Migrating a Mail Archive from Mbox to IMAP
I have stored all my email in Kmail for several years. It used to be Outlook back in 2000, but I soon got fed up and switched to a Linux box, where the .pst format was quite useless. I used a nifty utility to do the conversion on a Windows box, and everything worked just fine.
Now it’s time to move again. This server has a new IMAP implementation that polls my POP3 provider every minute, performs filtering, then allows me to connect from anywhere. That’s very useful, since I always run the email client at home and it would be quite annoying not to be able to read home email during the day.
Problem: None of the current email clients knows how to move a repository to IMAP. Oddly enough, Outlook can do it without a problem, while the open source world entirely ignores the issue. That’s quite silly, since a conversion utility is easy to write.
1. Converting Maildir to Mbox
I had chosen a very unfortunate setup in which maildir and mbox folders were mixed in colorful harmony. I didn’t want to convert a mixed system in one go, so I set out to convert everything to a single format first.
NOTE: This step is entirely optional, as the routines used later to convert from Mbox to IMAP work in principle just as well with maildir folders.
There are several utilities to convert files from the Maildir to the Mbox format. There are many, many more that convert from Mbox to Maildir, and that might have been the better option; but I stuck with the original idea.
The script can be found here. maildir2mbox works very well, with one major caveat: it creates all mboxes in the root mail folder instead of the original location. This is annoying because if you happen to have two maildir folders in different locations, one will convert on top of the other and you lose all information. Additionally, you’d rather have the mbox in the original location, anyway.
I slightly modified the script to write each mbox to the original location of its maildir instead, which was quite simple.
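As a rough illustration of that behaviour (this is not the maildir2mbox script itself, which is not reproduced here), the following is a minimal Python sketch using the standard-library mailbox module. The function name maildir_to_mbox and the explicit target path are assumptions made for this example. The point is only that the caller decides where each mbox lands, so two maildirs in different locations cannot be written on top of each other.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Copy all messages from a maildir folder into an mbox file.

    Hypothetical helper for illustration; returns the number of
    messages copied.
    """
    # create=False makes a wrong source path fail instead of silently
    # creating an empty maildir.
    source = mailbox.Maildir(maildir_path, factory=None, create=False)
    target = mailbox.mbox(mbox_path, create=True)
    copied = 0
    try:
        for message in source:
            # Converting to mboxMessage carries the maildir flags (seen,
            # replied, ...) over into the Status/X-Status headers.
            target.add(mailbox.mboxMessage(message))
            copied += 1
        target.flush()
    finally:
        target.close()
        source.close()
    return copied
```

For a maildir at Mail/lists (a made-up path), calling maildir_to_mbox("Mail/lists", "Mail/lists.mbox") would write the mbox right next to its source. Renaming it to the original folder name is best left until the maildir has been checked and moved out of the way.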
2. Creating the Folder Structure
After converting everything to mbox, your folder structure will look a little odd. Each mbox is a file with the name of the folder. Subfolders are located in a folder named after the main folder, but with a single dot ‘.’ in front and the extension ‘.directory’. So if you are looking for the folder ‘SUB’ contained in the folder ‘ROOT’, you’ll find a file ‘ROOT’ and a file ‘.ROOT.directory/SUB’.
To be able to create a directory structure, you have to read in all the directory names and munge them to remove the initial ‘.’ and the extension.
Then there is the problem that IMAP uses dots ‘.’ to separate folders, and most IMAP servers correspondingly don’t allow you to have dots in a folder name. I needed to substitute those away, too.
I wrote a little utility that creates a list of folders given a mail store. Then it performs the substitutions above, and everybody’s happy.
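The utility itself is not shown here, but the two substitutions can be sketched in a few lines of Python. Everything named below (imap_folder_name, list_folders, the underscore as replacement character) is an assumption for illustration, as is the decision to skip hidden files on the guess that they are index or bookkeeping files rather than mail folders.

```python
import os
import re

# Matches subfolder containers such as ".ROOT.directory" and captures "ROOT".
DIRECTORY_RE = re.compile(r"^\.(.+)\.directory$")


def imap_folder_name(relative_path, separator=".", replacement="_"):
    """Turn a path like '.ROOT.directory/SUB' into 'ROOT.SUB'."""
    parts = []
    for component in relative_path.split("/"):
        match = DIRECTORY_RE.match(component)
        name = match.group(1) if match else component
        # The separator cannot appear inside a single folder name.
        parts.append(name.replace(separator, replacement))
    return separator.join(parts)


def list_folders(mail_root):
    """Return sorted (relative mbox path, IMAP folder name) pairs."""
    folders = []
    for dirpath, dirnames, filenames in os.walk(mail_root):
        # Descend only into '.NAME.directory' containers.
        dirnames[:] = [d for d in dirnames if DIRECTORY_RE.match(d)]
        for filename in filenames:
            if filename.startswith("."):
                continue  # assumed to be index files, not folders
            full_path = os.path.join(dirpath, filename)
            relative = os.path.relpath(full_path, mail_root)
            relative = relative.replace(os.sep, "/")
            folders.append((relative, imap_folder_name(relative)))
    return sorted(folders)
```

One thing this sketch does not handle: two folders such as ‘a.b’ and ‘a_b’ would end up with the same IMAP name, so it is worth checking the generated list for duplicates before creating anything on the server.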
3. Converting the Mbox Files
Here things became a little dicier. There are several languages that allow for that kind of manipulation, but Perl seemed to be the natural fit. Once I started looking at it, I found that it had all the necessary handling for maildir and mailbox files, as well as everything needed to generate the folder structure, especially the regular expressions.
I had everything converted already, but I knew I could do better next time. So I devised a scheme that would let anyone else start from scratch, while letting me start from my unfinished product.
You will need the following CPAN packages:
- Mail::IMAPClient to generate the folders and add the messages
- Mail::Box::Mbox to read the messages from the mbox files
- Mail::Message::Convert::MailInternet to get the Mail::Message objects into RFC 822 format
- File::Find to make it easy to traverse the whole mail store
Here is the script for download.
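For readers who just want the shape of the conversion, here is a minimal sketch of the upload step. It is written in Python with the standard-library mailbox module rather than the Perl modules listed above, so it is a stand-in and not the script itself. The names upload_mbox, message_timestamp and FLAG_MAP are made up for this example, and the connection argument is assumed to behave like a logged-in imaplib.IMAP4 object.

```python
import email.utils
import mailbox

# mbox status flag -> IMAP system flag (only the common ones)
FLAG_MAP = {"R": "\\Seen", "A": "\\Answered", "F": "\\Flagged"}


def message_timestamp(message):
    """Return the Date header as epoch seconds, or None if unusable."""
    header = message.get("Date")
    if not header:
        return None
    parsed = email.utils.parsedate_tz(str(header))
    if parsed is None:
        return None
    try:
        return email.utils.mktime_tz(parsed)
    except (ValueError, OverflowError):
        return None


def upload_mbox(connection, mbox_path, imap_folder):
    """Append every message of one mbox file to one IMAP folder.

    Returns the number of messages the server accepted.
    """
    # CREATE answers NO if the folder already exists; ignored here.
    connection.create(imap_folder)
    box = mailbox.mbox(mbox_path, create=False)
    uploaded = 0
    try:
        for message in box:
            flags = " ".join(
                FLAG_MAP[flag] for flag in message.get_flags()
                if flag in FLAG_MAP
            )
            status, _ = connection.append(
                imap_folder,
                "(%s)" % flags if flags else None,
                # None lets the server use the time of upload instead.
                message_timestamp(message),
                message.as_bytes(),
            )
            if status == "OK":
                uploaded += 1
    finally:
        box.close()
    return uploaded
```

With a real server, the connection would come from something like imaplib.IMAP4_SSL(host) followed by login(user, password), and upload_mbox would be called once per mbox file with the IMAP folder name generated in the previous step. Passing the original Date along keeps the archive sorted by when the mail was sent rather than by when it was uploaded.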
4. Converting Filters
This one stumped me for a while. I had a collection of about 100 filters that I use for two main purposes:
- Move emails to destination folder for archival before I have seen them
- Move emails to destination folder after I have seen them
This is no trivial difference. The former category covers all the emails from mailing lists I am not currently interested in. The latter covers all other emails.
Oddly enough, Kmail doesn’t apply filters to IMAP email. That suits me just fine, since I wanted server-side filtering so that I could access my relevant email remotely without having to wade through a ton of junk.
Now the good news is that this is about the most frequently heard complaint about Kmail/IMAP, so it’s a high priority for the next KDE release (3.5). Converting the filters automatically is not efficient, because of the nature of the two systems.
Hence I came up with a different path: I figured I had filters for the following occasions:
- I got an email from a mailing list, like Perl-Beginners; file it to a folder
- I got an email from a vendor, like Amazon; ditto