After successfully updating my Ubuntu 10.04.2 64-bit VM with the usual:

sudo apt-get update && sudo apt-get upgrade

The message of the day (MOTD) did not reset the number of packages that required updating. Mine was stuck on:

30 packages can be updated.
23 updates are security updates.

To remedy this, clear out /etc/motd.tail with this command:

sudo sh -c 'cat /dev/null > /etc/motd.tail'

When you log in again, all will be right with your MOTD.

On my MacBook Air, the Ubuntu 10.04 VMware guest OS was losing time whenever the Air went to sleep. To keep time synchronized between the host and guest, install VMware Tools and enable time synchronization by running sudo /usr/bin/vmware-toolbox. Make sure the following checkbox is selected.
[Screenshot: VMware Tools properties dialog]

Since its purchase, my primary Windows 7 laptop has suffered from random Bluetooth disconnects affecting both the mouse and keyboard. Since the disconnects are almost always simultaneous, I suspected a problem in either the laptop’s Bluetooth module or the software driver.

After the latest frustrating disconnect, I once again sought an answer from the wisdom of Google. This time, however, my queries proved fruitful. Disabling the power management features of the Bluetooth module as detailed in this article solved my Bluetooth issues.

Reloading your development database from a copy on the staging or production server can be done with a single command. With the magic of ssh, key pair authentication, and .my.cnf files on the server and your development box, you can execute the following:

ssh -C username@remotehost.com 'mysqldump myapp_staging' | mysql myapp_development

Let’s examine the entire command in pieces. ssh followed by a command runs that command on the remote host and sends its output to stdout on the originating machine. mysqldump myapp_staging is executed on the remote server, and the -C flag instructs ssh to compress the data sent over the connection. Then, mysql runs locally and executes the SQL statements against the myapp_development database. Voilà, you have reloaded your development database with a copy from the staging server with a single command.
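One caveat with the one-liner: if the remote dump fails partway through, mysql still runs whatever arrived. Here is a more cautious sketch of the same idea that dumps to a local temporary file first and only loads it when the dump succeeded. The function name reload_dev_db and its arguments are my own invention for illustration; it still assumes key pair authentication and .my.cnf files on both ends.

```shell
# reload_dev_db REMOTE REMOTE_DB LOCAL_DB  (hypothetical helper)
reload_dev_db() {
    remote=$1 remote_db=$2 local_db=$3
    dump=$(mktemp) || return 1
    # Run mysqldump on the remote host; -C compresses the stream over ssh.
    if ssh -C "$remote" "mysqldump $remote_db" > "$dump"; then
        # Only touch the local database once the dump completed cleanly.
        mysql "$local_db" < "$dump"
        status=$?
    else
        echo "remote dump failed; $local_db left untouched" >&2
        status=1
    fi
    rm -f "$dump"
    return $status
}
```

Usage would look like: reload_dev_db username@remotehost.com myapp_staging myapp_development. The trade-off is that you need enough local disk space to hold the dump.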

Apparently, this only works on the OS X and Linux versions of Skype. If you want to edit the last chat message you sent, you can use this syntax:

s/mistaek/mistake/

Skype will then update the last message you sent, although the recipient will see a message indicating that it was edited. This is equivalent to right-clicking on the message and selecting Edit, but for someone who uses vim all day, this Skype feature is awesome.

I’m not sure why this setting is not enabled by default, but every MySQL installation should have this line in the [mysqld] section of its my.cnf file:

innodb_file_per_table

This setting instructs MySQL to create a separate file for each InnoDB table’s data. By default, MySQL stores all InnoDB data (from all databases) in a single file. Now recall that when you delete data from an InnoDB table, the actual disk storage is not recovered. Instead, MySQL marks the storage region so that when new data is added, that region can be re-used. That is a sensible strategy, but it fails miserably when you need to delete a lot of data and recover the corresponding disk storage.

With a single file for all InnoDB data, recovering disk storage amounts to backing up the database, dropping and re-creating the database, and then reloading the database from the backup. In the process, you have to stop your server and remove the /var/lib/mysql/ibdata1 file. This is rarely a viable strategy. If you are in desperate need to recover disk space on your server, it’s doubtful that you have room to store the database backup. Moreover, the backup and restore process is far from instantaneous when you’re dealing with hundreds of gigabytes of data. This all amounts to a lot of downtime.

Do things right from the start. Add innodb_file_per_table to your my.cnf file. You will have to restart MySQL for this change to take effect, and only tables created after the change get their own file. Now when you want to free up disk space, you can work at the granularity of a single table instead of all databases utilizing InnoDB.
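The setting only counts if it sits under [mysqld] and not some other section. As a rough sanity check, this sketch scans a config file and succeeds only when the line appears in the right section. The function name has_file_per_table is made up for this example, and it only understands the bare form plus the =1 and =ON spellings.

```shell
# Succeeds (exit 0) if innodb_file_per_table is set inside [mysqld] in the given file.
has_file_per_table() {
    awk '
        # Every section header updates whether we are inside [mysqld].
        /^[ \t]*\[/ { in_mysqld = ($0 ~ /^[ \t]*\[mysqld\]/) }
        # Accept "innodb_file_per_table", "...=1" and "...=ON".
        in_mysqld && /^[ \t]*innodb_file_per_table([ \t]*=[ \t]*(1|ON))?[ \t]*$/ { found = 1 }
        END { exit !found }
    ' "$1"
}
```

For example: has_file_per_table /etc/mysql/my.cnf && echo "looks good". After the restart, running SHOW VARIABLES LIKE 'innodb_file_per_table' in the mysql client should confirm what the server actually picked up.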

For example, if you want to clear out the majority of a fictitious table named really_big_table, start by running an INSERT ... SELECT statement to store a copy of the data you want to keep in a temporary table. Then DROP really_big_table and re-CREATE it. Now load the data from the temporary table back into really_big_table via another INSERT ... SELECT. Your disk storage has been recovered.
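The steps above could be sketched as a small helper that prints the SQL for you to review before piping it into mysql. Everything beyond really_big_table is an assumption on my part: the function name rebuild_sql, the holding table really_big_table_keep (a regular table rather than a true TEMPORARY one, so that CREATE TABLE ... LIKE can copy the schema back), and whatever WHERE condition you pass in.

```shell
# rebuild_sql KEEP_CONDITION  (hypothetical helper; prints SQL, runs nothing)
rebuild_sql() {
    keep_where=$1
    cat <<EOF
CREATE TABLE really_big_table_keep LIKE really_big_table;
INSERT INTO really_big_table_keep SELECT * FROM really_big_table WHERE $keep_where;
DROP TABLE really_big_table;
CREATE TABLE really_big_table LIKE really_big_table_keep;
INSERT INTO really_big_table SELECT * FROM really_big_table_keep;
DROP TABLE really_big_table_keep;
EOF
}
```

Something like rebuild_sql "created_at >= '2011-01-01'" | mysql myapp_development would then run it, where created_at is a made-up column. Note that nothing should be writing to the table while this runs, and anything that references the table (foreign keys, for instance) needs separate care.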

Once you have your development environment set up as a VMware VM, you will want to access it with applications running on your host OS. For example, you want to test your shiny new Ruby on Rails web application (running in your VM) with Internet Explorer on Windows or Safari on OS X.

There are three ways to configure the network connection for your VM:

  • host-only
  • bridged
  • NAT

Host-only is not interesting for web applications that interact with APIs and services on the internet.

With a bridged connection and DHCP, your VM will have a different IP address as you go from home to office to coffee shop.  It becomes annoying to open a terminal in your VM, run /sbin/ifconfig, copy the IP address, and then go back to the host OS and adjust your hosts file accordingly.  Do this a few times and you will clamor for a better way.

Enter NAT and a static IP address for your VM.  You will have a consistent IP address no matter which network your host is on. Enter this IP address into your hosts file once and you’re done.  Let’s walk through a real life example.

You have 3 in-progress client projects. You have configured your development VM with the same stack that’s powering your staging and production servers. In this example, you have a Ruby on Rails application running Apache, Ruby Enterprise Edition, Phusion Passenger, and MySQL. You configure each project with its own VirtualHost: project1.dev, project2.dev, and project3.dev. On your host OS, you have a hosts file entry like this:

192.168.154.100 project1.dev project2.dev project3.dev

Now whether you’re at home, at the office, or sipping some chai tea at a coffee shop, you can access your client projects the same way from a browser on the host OS: http://project1.dev, http://project2.dev, and http://project3.dev.

This seems simple enough, but it turns out that defining a static IP address in your guest Ubuntu VM and having that IP address accessible from the host is not trivial.  Start by running ipconfig (Windows) or ifconfig (OS X) on the host OS and look for the IPv4 Address for the VMnet8 adapter.  On my Windows machine, that address is 192.168.154.1.  On my OS X machine, that address is 172.16.252.1.

Now in your Ubuntu VM, configure it to use a static IP address. This example uses 192.168.154 for the first three octets. Substitute the first 3 octet values corresponding to your VMnet8 adapter. In Ubuntu, open System -> Preferences -> Network Connections. On the Wired tab, select the network adapter (usually Auto eth1) and click on Edit. Select the IPv4 Settings tab and enter/select these values:

  • Method: Manual
  • IP address: 192.168.154.100
  • Netmask: 255.255.255.0
  • Gateway: 192.168.154.2
  • DNS Servers: 8.8.8.8, 8.8.4.4

192.168.154.100 was chosen for the IP address because 100 is a) easy to remember and b) far enough away from the reserved low single digits to avoid conflicts.  The gateway value was the tricky one.  Most references instruct you to use the IP address, but substitute 1 for the final octet.  However, that IP address was already being used on the host for the VMnet8 adapter.  After much trial and error, it turned out that 192.168.154.2 was the ticket.  The DNS servers are Google’s.
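To make the octet substitution concrete, here is a tiny sketch that takes the host's VMnet8 address and prints the values to type into the dialog. The function name vm_net_settings is invented for this example, and it assumes the same conventions as above: .100 for the VM and .2 for the gateway, which is what worked on my setup and may differ on yours.

```shell
# vm_net_settings HOST_VMNET8_ADDRESS  (hypothetical helper)
vm_net_settings() {
    prefix=${1%.*}    # drop the last octet, e.g. 192.168.154.1 -> 192.168.154
    echo "IP address: $prefix.100"
    echo "Netmask: 255.255.255.0"
    echo "Gateway: $prefix.2"
}
```

Running vm_net_settings 172.16.252.1 for my OS X host prints an IP address of 172.16.252.100 and a gateway of 172.16.252.2.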

After saving the settings, restart your networking with sudo /etc/init.d/networking restart.

For many years, I developed software (mostly LAMP and Ruby on Rails) on a MacBook and MacBook Pro. It was always a struggle to have the right stack installed for various projects. MacPorts became my weapon of choice. Even then, there were enough differences between my development, staging, and production environments to cause unexpected problems. Managing different versions of databases, libraries, etc. was never simple.

At the beginning of 2010, I was fed up with Steve Jobs’s reality distortion field and defected back to the PC camp. Greeting me with open arms was Windows 7, an OS that I continue to praise. However, I longed for my bash prompt and open source stack.

The solution was to run a VMware VM with Ubuntu as the guest operating system. This solved many problems.

  • My development machine became consistent with my staging and production servers.
  • I can have a different VM for projects requiring different stacks. No more trying to make my laptop a superset of all environments.
  • When bringing on additional developers to the project, I simply hand them a copy of the VM. Voilà. Their entire development environment is ready to go. No more wasting an entire day on configuring your laptop.
  • I chose VMware because it is available on Windows, OS X, and Linux and the same VM can run unaltered on all three platforms.
  • VMware offers snapshots, so you can easily roll back to a known state if you really mess things up.

Now that it’s 2011, a client project requiring OS X has forced me back into Apple’s extortion pricing. This time, however, I will be using the VM solution. I highly recommend it.

I recently completed an optimization pass for a website that does reporting for its clients. A number of reports took more than 30 seconds to compute. As the data grew, performance continued to degrade. The main table in question consisted of tens of millions of rows.

Here was the strategy I employed:

  1. De-normalize the data
  2. Partition the tables
  3. Optimize the indexes on the table
  4. Optimize the queries

Step 1 is to avoid joins. Joins are expensive operations. If you can put all the necessary fields in a single table, your queries will also be much simpler and the database won’t have to perform complex operations. Of course, nothing comes for free. The cost is duplication of data. While space will rarely be your limiting factor, it’s important to take steps to avoid data inconsistencies.

Step 2 serves to divide your large table into smaller ones. One huge advantage is that the indexes for any of the smaller tables will be much smaller. If a table’s index can be loaded entirely into memory, you will notice huge speed improvements. For this particular application, the database is Postgres and I was able to utilize the inheritance feature to implement partitioning.
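For readers who have not seen inheritance-based partitioning, here is a rough sketch of what the DDL for one monthly child table can look like, generated by a small helper so the date boundaries stay consistent. This is not the schema from the project: the function name monthly_partition_ddl, the parent table name you pass in, and the recorded_at column are all placeholders.

```shell
# monthly_partition_ddl PARENT YEAR MONTH  (hypothetical helper; prints SQL, runs nothing)
# MONTH is a plain integer such as 3, not 03.
monthly_partition_ddl() {
    parent=$1 year=$2 month=$3
    next_year=$year
    next_month=$((month + 1))
    if [ "$next_month" -gt 12 ]; then
        next_month=1
        next_year=$((year + 1))
    fi
    from=$(printf '%04d-%02d-01' "$year" "$month")
    to=$(printf '%04d-%02d-01' "$next_year" "$next_month")
    child=$(printf '%s_%04d_%02d' "$parent" "$year" "$month")
    # The CHECK constraint describes which rows live in this child table.
    cat <<EOF
CREATE TABLE $child (
    CHECK (recorded_at >= DATE '$from' AND recorded_at < DATE '$to')
) INHERITS ($parent);
CREATE INDEX ${child}_recorded_at_idx ON $child (recorded_at);
EOF
}
```

You could pipe the output of monthly_partition_ddl report_rows 2010 12 into psql. Keep in mind that Postgres does not route inserts to child tables on its own with this approach, so the application or a trigger on the parent has to pick the right child.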

Steps 3 and 4 really go hand-in-hand. In an existing application, you have the luxury of examining the slow query logs to determine which queries to focus on. On Postgres, I recommend using pgFouine. As optimization is a never-ending pursuit, these two steps can take as little or as much time as you have. Set realistic goals for your optimization efforts. With Postgres, the EXPLAIN ANALYZE command will yield vast amounts of data about your queries and the indexes they use. Use it often.

Thanks to this optimization effort, reports that took in excess of 30 seconds now return in under a second.

git is the current hotness in source control. One of its main strengths is cheap and easy branching and merging. As a result, git users tend to use branches often. This is a good thing. However, when working on multiple repositories with multiple branches, it can be easy to lose track of your current branch.

While a simple git status will yield the current branch, it is often convenient to have the current branch displayed in your prompt. The magical incantation to add to your PS1 variable is $(__git_ps1). This function is defined in /etc/bash_completion.d/git. If you don’t have access to that file, you can find it in contrib/completion/git-completion.bash after cloning this repo:

git://git.kernel.org/pub/scm/git/git.git

Here is a snippet from a .bashrc file for displaying the current git branch in your bash prompt:

source /etc/bash_completion.d/git
export PS1='\w$(__git_ps1 "(%s)") > '
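If you can't get hold of the completion script at all, a bare-bones stand-in is possible with plain git plumbing. This is only a sketch: git_branch is a name I made up, and unlike __git_ps1 it shows nothing when HEAD is detached or you are mid-rebase.

```shell
# Print "(branch)" when inside a git work tree on a branch, otherwise print nothing.
git_branch() {
    ref=$(git symbolic-ref HEAD 2>/dev/null) || return 0
    echo "(${ref#refs/heads/})"
}
export PS1='\w$(git_branch) > '
```

The single quotes around the PS1 value matter here: they keep $(git_branch) from being evaluated once at assignment time, so bash re-runs it every time the prompt is drawn.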

© 2011 Technically Speaking