GNU JavaMail
Here's how I started using GNU JavaMail to read mbox files.
Download archives
See http://www.gnu.org/software/classpathx/javamail/javamail.html#download
- http://ftp.gnu.org/gnu/classpathx/mail-1.1.2.tar.gz
- http://ftp.gnu.org/gnu/classpathx/activation-1.1.1.tar.gz
- http://ftp.gnu.org/gnu/classpath/inetlib-1.1.1.tar.gz
Extract the archives.
Build activation
cd activation-1.1.1
./configure && make && make javadoc
Build inetlib
cd inetlib-1.1.1
./configure && make && make javadoc
Build mail
First I copied the two dependent jars into the mail directory. This makes it easy to know which versions were used for the build.
cp activation-1.1.1/activation.jar mail-1.1.2/
cp inetlib-1.1.1/netlib.jar mail-1.1.2/

cd mail-1.1.2/
./configure --with-activation-jar=./ --with-inetlib-jar=./
make && make javadoc
Using it
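import java.util.Properties;
import javax.mail.Folder;
import javax.mail.Provider;
import javax.mail.Session;
import javax.mail.Store;
import gnu.mail.providers.mbox.MboxStore;

public class MboxExample {
    public static void main(String[] args) throws Exception {
        final Properties properties = new Properties();
        // Directory that contains the mbox files
        properties.put("mail.mbox.mailhome", "/home/nwightma/java/mboxparser/");
        properties.put("mail.mbox.inbox", "");
        final Session session = Session.getInstance(properties);
        // Register the provider by hand, matching this javamail.providers entry:
        // protocol=mbox;
        // type=store;
        // class=gnu.mail.providers.mbox.MboxStore;
        // vendor=dog@gnu.org;
        session.addProvider(new Provider(Provider.Type.STORE, "mbox",
                "gnu.mail.providers.mbox.MboxStore", "dog@gnu.org", "1"));
        final Store store = session.getStore("mbox");
        if (store instanceof MboxStore) {
            store.connect();
            final Folder folder = store.getFolder("test.mbox");
            folder.open(Folder.READ_ONLY);
            System.out.println(folder.getFullName());
            // Message numbers are 1-based in JavaMail
            //final int[] msgs = { 1 };
            //System.out.println(folder.getMessages(msgs));
            folder.close(false);
            store.close();
        }
    }
}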
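The commented lines in the snippet above mirror a `javamail.providers` entry, the semicolon-separated `key=value` format JavaMail normally reads from `META-INF/javamail.providers` to discover providers; `addProvider` just registers the same fields by hand. As a rough, stdlib-only sketch (the `ProviderEntry` class and `parse` method are hypothetical, not part of GNU JavaMail), this splits such an entry into the values that end up in the `Provider` constructor:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits a javamail.providers-style entry such as
// "protocol=mbox; type=store; class=...; vendor=dog@gnu.org;" into key/value pairs.
public class ProviderEntry {
    static Map<String, String> parse(String entry) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : entry.split(";")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate empty pieces, e.g. from ";;"
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in: " + trimmed);
            }
            fields.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse(
            "protocol=mbox; type=store; class=gnu.mail.providers.mbox.MboxStore; vendor=dog@gnu.org;");
        // These are the values passed to new Provider(Provider.Type.STORE, ...) in the snippet.
        System.out.println(fields.get("protocol") + " " + fields.get("class") + " " + fields.get("vendor"));
        // prints "mbox gnu.mail.providers.mbox.MboxStore dog@gnu.org"
    }
}
```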
Opinion
First off, the GNU JavaMail mbox provider does work. Just don't try to use it on large mbox files.
In my case, opening an mbox with 3597 emails took over 6 minutes just to return the message count. It appears that the code loads all the messages into memory, so even if you only want to process a couple of them you pay the cost of loading them all.
Counting the messages with grep, by comparison, takes:
time grep -c "^From " test.mbox
3597

real    0m0.542s
user    0m0.403s
sys     0m0.138s
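For a rough Java equivalent of that grep, here is a minimal sketch (the `MboxCount` class and `countMessages` method are hypothetical) that streams the file line by line and counts lines beginning with `From `, so only one line is held in memory at a time. Like the grep, it assumes that body lines starting with `From ` have been escaped (typically as `>From `); unescaped ones would be miscounted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class MboxCount {
    // Counts messages the way `grep -c "^From "` does: one per line starting with "From ".
    // Lines are streamed lazily rather than loaded all at once.
    static long countMessages(Path mbox) throws IOException {
        // ISO-8859-1 maps every byte to a char, so arbitrary mail bytes never fail to decode.
        try (Stream<String> lines = Files.lines(mbox, StandardCharsets.ISO_8859_1)) {
            return lines.filter(line -> line.startsWith("From ")).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path mbox = Paths.get(args.length > 0 ? args[0] : "test.mbox");
        System.out.println(countMessages(mbox));
    }
}
```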
It appears that folder.open(Folder.READ_ONLY) is the part that takes forever: to be exact, the open call takes 359.952 seconds (about 6 minutes). After that, though, the calls to get messages are very fast. I believe it's loading the complete mbox into memory when you open the folder.
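A minimal sketch of timing a single call such as `folder.open(Folder.READ_ONLY)` with `System.nanoTime()`. The `Timing` class and its `Action` interface are hypothetical, and `main` times a `Thread.sleep` stand-in so the sketch runs without the mail jars.

```java
import java.util.concurrent.TimeUnit;

public class Timing {
    // Hypothetical functional interface so the timed action may throw checked
    // exceptions (folder.open(...) declares MessagingException).
    @FunctionalInterface
    interface Action {
        void run() throws Exception;
    }

    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Action action) throws Exception {
        long start = System.nanoTime();
        action.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for: Timing.timeMillis(() -> folder.open(Folder.READ_ONLY));
        long elapsed = timeMillis(() -> Thread.sleep(50));
        System.out.println("took " + elapsed + " ms");
    }
}
```

`System.nanoTime()` is used rather than `System.currentTimeMillis()` because it is meant for measuring elapsed time and is not affected by wall-clock adjustments.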
I believe the GNU JavaMail mbox provider is fine if you want to process every part of the mbox, but if you only need some of the messages, or some parts of them, the overhead is simply not worth it.
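One way to pay only for the messages you actually use, sketched here with plain java.io and not part of GNU JavaMail, is to scan the file once, recording the byte offset of every `From ` line (the same lines the grep counts), and then read individual messages on demand with `RandomAccessFile`. The `MboxIndex` class is hypothetical; it makes the same escaping assumption as grep and returns raw message text rather than parsed `javax.mail` messages.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical lazy mbox reader: one pass to index where each message starts,
// then individual messages are read only when requested.
public class MboxIndex implements AutoCloseable {
    private static final byte[] FROM = "From ".getBytes(StandardCharsets.US_ASCII);

    private final RandomAccessFile file;
    private final List<Long> starts = new ArrayList<>();

    public MboxIndex(String path) throws IOException {
        // Buffered scan; only '\n' ends a line, as with grep. Message bodies are not kept.
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            long pos = 0;        // number of bytes read so far
            long lineStart = 0;  // offset where the current line began
            int matched = 0;     // bytes of "From " matched on this line, -1 on mismatch
            int b;
            while ((b = in.read()) != -1) {
                if (matched >= 0 && matched < FROM.length) {
                    if (b == FROM[matched]) {
                        matched++;
                        if (matched == FROM.length) {
                            starts.add(lineStart);
                        }
                    } else {
                        matched = -1;
                    }
                }
                pos++;
                if (b == '\n') {
                    lineStart = pos;
                    matched = 0;
                }
            }
        }
        file = new RandomAccessFile(path, "r");
    }

    public int size() {
        return starts.size();
    }

    // Returns the raw text of message i (0-based here, unlike JavaMail's 1-based numbers),
    // including its "From " line. Assumes a single message is smaller than 2 GB.
    public String read(int i) throws IOException {
        long start = starts.get(i);
        long end = (i + 1 < starts.size()) ? starts.get(i + 1) : file.length();
        byte[] buf = new byte[(int) (end - start)];
        file.seek(start);
        file.readFully(buf);
        return new String(buf, StandardCharsets.ISO_8859_1);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

The index scan still reads the whole file once, much like the grep, but afterwards only a list of offsets stays in memory. Turning a raw message into a `javax.mail` object (for example with `MimeMessage`'s `InputStream` constructor) is left out of this sketch.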