Monday, August 3, 2009

Imported All Legacy Email into Gmail

I switched to Gmail in 2006 but still had my earlier legacy email in three different places:
  • Maildir format (Courier server) since about 1998 (about 2 GB)
  • Netscape mbox format since about 1996
  • MH format from 1987 to 1996
I am just about done with the conversion. It required a tedious semi-automated process that took several days. I basically followed these steps:
  1. Convert MH mail to mbox using this script.
  2. Convert resulting mail and Netscape mbox mail to Maildir using the newest version of this script.
  3. For each set of desired Gmail labels: filter out all messages already in Gmail, import the corresponding Maildir folders using Scott Yang's excellent IMAP-based maildir2gmail.py tool, apply the desired labels to the imported messages, and finally add these labels to the filter. Repeat until done.
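For readers who would rather script the format conversion themselves, here is a minimal sketch using Python's standard mailbox module. It is not the script linked above, just an illustration of the same idea. The function name mbox_to_maildir and the skip_ids parameter are made up for this example, and treating the Message-ID header as the duplicate key is an assumption.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path, skip_ids=frozenset()):
    """Copy messages from an mbox file into a Maildir.

    Messages whose Message-ID is in skip_ids (a hypothetical set of IDs
    already present elsewhere) or was already copied are skipped.
    Returns the number of messages copied.
    """
    source = mailbox.mbox(mbox_path, create=False)
    target = mailbox.Maildir(maildir_path, create=True)
    seen = set(skip_ids)
    copied = 0
    try:
        for message in source:
            msg_id = message.get("Message-ID")
            key = msg_id.strip() if msg_id is not None else None
            if key is not None and key in seen:
                continue  # duplicate or already imported
            # MaildirMessage() carries over read/replied flags where it can.
            target.add(mailbox.MaildirMessage(message))
            if key is not None:
                seen.add(key)
            copied += 1
    finally:
        source.close()
        target.close()
    return copied
```

The same module also provides mailbox.MH, so an MH folder could probably be read the same way by swapping the source class.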
There was one minor complication: A colleague of mine used a misconfigured client that caused the year in his messages to appear as 100, 101, etc. instead of 2000, 2001, etc. maildir2gmail.py couldn't handle those messages, and I was too lazy to figure out how to fix them with sed, so I simply deleted them. I probably have most of their content quoted in my replies anyway.
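For anyone who wants to keep such messages instead, repairing the year is a small regular-expression job. The sketch below assumes the broken header looks like "Date: Mon, 3 Jan 100 12:00:00 -0500" (a month abbreviation, a three-digit year starting with 1, then the time); the real messages may have looked different. The name fix_y2k_date is made up for this example.

```python
import re

# Assumed layout: "Date: ... Jan 100 12:00:00 ...", with the year printed
# as years-since-1900 (100 for 2000, 101 for 2001, and so on).
_BROKEN_DATE = re.compile(
    r"^(Date:.*?\b[A-Za-z]{3} )(1\d\d)(?= \d\d:\d\d)", re.IGNORECASE
)


def fix_y2k_date(header_line):
    """Return the Date header line with a 1xx year rewritten as 20xx.

    Lines that do not match the assumed broken layout are returned unchanged.
    """
    return _BROKEN_DATE.sub(
        lambda m: m.group(1) + str(1900 + int(m.group(2))), header_line
    )
```

Only the Date line in the header block should be passed through this function, so that quoted dates in message bodies stay untouched.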

My recommendation is not to bother with these options:
  • An email GUI client won't be able to handle anything but tiny volumes of mail reliably.
  • Gmail Loader uses SMTP and seems to mess up the original message headers. It also seemed very slow.
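The reason an IMAP-based import can leave headers alone is that IMAP's APPEND command stores the raw message bytes as given, rather than sending them through mail delivery again. Here is a minimal sketch with Python's imaplib. It is not how maildir2gmail.py is necessarily written, and append_raw_message is a made-up name. The conn argument is assumed to be an already logged-in imaplib.IMAP4_SSL connection.

```python
import email
import email.utils
import imaplib
import time


def append_raw_message(conn, folder, raw_bytes):
    """Upload one raw RFC 822 message via IMAP APPEND.

    The message bytes are passed through unchanged. The server-side
    date is taken from the Date header when it parses, otherwise the
    current time is used.
    """
    msg = email.message_from_bytes(raw_bytes)
    parsed = email.utils.parsedate_tz(str(msg.get("Date", "")))
    timestamp = email.utils.mktime_tz(parsed) if parsed else time.time()
    internal_date = imaplib.Time2Internaldate(timestamp)
    # flags=None: no IMAP flags are set on the uploaded copy.
    return conn.append(folder, None, internal_date, raw_bytes)
```

Passing the connection in as a parameter keeps the function easy to try against a stand-in object before pointing it at a real account.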
Now I have over 90,000 messages taking up almost 4 GB, a bit more than half of the current limit.