StokeBloke.com

GNU JavaMail

Here's how I started using GNU JavaMail to read mbox files.

Download archives

See http://www.gnu.org/software/classpathx/javamail/javamail.html#download

Extract the archives.

Build activation

cd activation-1.1.1
./configure && make && make javadoc

Build inetlib

cd inetlib-1.1.1
./configure && make && make javadoc

Build mail

First I copied the two dependent jars into the mail directory. This makes it easy to know which versions were used for the build.

cp activation-1.1.1/activation.jar mail-1.1.2/
cp inetlib-1.1.1/netlib.jar mail-1.1.2/
cd mail-1.1.2/
./configure --with-activation-jar=./ --with-inetlib-jar=./ && make && make javadoc
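Before writing any mail code, it can help to confirm that the freshly built classes are actually visible on the classpath. This is a minimal sketch using only the JDK; the class name ClasspathCheck is hypothetical, and the exact jar names to pass to -cp depend on what your builds produced. On JDK 8 and earlier, javax.activation ships with the JDK itself, so that particular check can pass even without activation.jar.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {
  // Returns the class names that cannot be found on the current classpath.
  static List<String> missingClasses(String... classNames) {
    List<String> missing = new ArrayList<>();
    for (String name : classNames) {
      try {
        // initialize=false: only look the class up, don't run its static initializers
        Class.forName(name, false, ClasspathCheck.class.getClassLoader());
      } catch (ClassNotFoundException e) {
        missing.add(name);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    List<String> missing = missingClasses(
        "javax.activation.DataHandler",       // activation
        "javax.mail.Session",                 // mail API
        "gnu.mail.providers.mbox.MboxStore"); // GNU mbox provider
    if (missing.isEmpty()) {
      System.out.println("All classes found");
    } else {
      System.out.println("Missing: " + missing);
    }
  }
}
```

Run it with the jars from the three builds listed on -cp; any class it reports as missing points at the jar that didn't make it onto the classpath.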

Using it

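import java.util.Properties;

import javax.mail.Folder;
import javax.mail.MessagingException;
import javax.mail.Provider;
import javax.mail.Session;
import javax.mail.Store;

import gnu.mail.providers.mbox.MboxStore;

public class MboxExample {
  public static void main(String[] args) throws MessagingException {
    final Properties properties = new Properties();
    // mailhome: the directory that holds test.mbox
    properties.put("mail.mbox.mailhome", "/home/nwightma/java/mboxparser/");
    properties.put("mail.mbox.inbox", "");

    final Session session = Session.getInstance(properties);
    // Register the GNU mbox store; equivalent to the javamail.providers entry
    // protocol=mbox; type=store; class=gnu.mail.providers.mbox.MboxStore; vendor=dog@gnu.org;
    session.addProvider(new Provider(Provider.Type.STORE, "mbox",
        "gnu.mail.providers.mbox.MboxStore", "dog@gnu.org", "1"));

    final Store store = session.getStore("mbox");
    if (store instanceof MboxStore) {
      store.connect();
      try {
        final Folder folder = store.getFolder("test.mbox");
        folder.open(Folder.READ_ONLY);
        System.out.println(folder.getFullName());
        // JavaMail message numbers start at 1, so getMessages(new int[] { 0 }) is invalid
        System.out.println(folder.getMessageCount());
        folder.close(false);
      } finally {
        store.close();
      }
    }
  }
}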

Opinion

Well, firstly, the GNU JavaMail mbox provider does work. Just don't attempt to use it on large mbox files.

In my case, when I tried to open an mbox with 3597 emails, it took over 6 minutes just to return the message count. It appears that the code loads all the messages into memory, so if you only want to process a couple of them you still pay the cost of loading them all.

Finding the number of messages with grep takes about half a second:

time grep -c "^From " test.mbox
3597

real	0m0.542s
user	0m0.403s
sys	0m0.138s
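For comparison, the same count can be done in plain Java without JavaMail. This minimal sketch applies exactly the grep heuristic, counting lines that begin with "From ", so it shares grep's assumption that body lines starting with "From " were escaped (usually as ">From ") when the mbox was written. The class name MboxCount is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MboxCount {
  // Counts lines starting with "From ", the same test as grep -c "^From ".
  static long countFromLines(BufferedReader reader) throws IOException {
    long count = 0;
    String line;
    while ((line = reader.readLine()) != null) {
      if (line.startsWith("From ")) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    // ISO-8859-1 maps every byte to a char, so 8-bit mail content can't cause decoding errors
    try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
      System.out.println(countFromLines(in));
    }
  }
}
```

Run as `java MboxCount test.mbox`. The file is streamed line by line, so memory use stays flat regardless of mbox size. One small difference from grep: readLine also treats a lone carriage return as a line end, which only matters for unusual files.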

It appears that folder.open(Folder.READ_ONLY) is the part that takes forever; to be exact, the open call takes 359.952 seconds (about 6 minutes). After that, though, the calls to get messages are very fast. I believe it loads the complete mbox into memory when you open the folder.

I believe the mbox provider from GNU JavaMail is fine if you want to process every part of the mbox, but if you only want some messages, or some parts of them, the overhead is simply not worth it.
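If you only need a few messages, one alternative (not how the GNU mbox provider works) is to scan the file once, keep only the byte offset where each message starts, and read individual messages on demand. This is a minimal plain-JDK sketch under the same assumption as the grep count, namely that every message begins with an unescaped "From " line; MboxIndex, messageCount and readMessage are hypothetical names. It returns raw message text, so parsing headers is still up to you.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GNU JavaMail: remembers where each message
// starts in the file instead of holding message contents in memory.
public class MboxIndex {
  private static final byte[] FROM = "From ".getBytes(StandardCharsets.US_ASCII);

  private final Path file;
  private final long length;
  private final List<Long> offsets = new ArrayList<>();

  public MboxIndex(Path file) throws IOException {
    this.file = file;
    this.length = Files.size(file);
    // One sequential pass: record the offset of every line that starts with "From "
    try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
      long pos = 0;            // number of bytes read so far
      long lineStart = 0;      // offset of the first byte of the current line
      int matched = 0;         // bytes of "From " matched at the start of this line
      boolean matching = true; // false once this line can no longer match
      int b;
      while ((b = in.read()) != -1) {
        if (matching) {
          if (b == FROM[matched]) {
            matched++;
            if (matched == FROM.length) {
              offsets.add(lineStart);
              matching = false;
            }
          } else {
            matching = false;
          }
        }
        pos++;
        if (b == '\n') {
          // The next byte begins a new line
          lineStart = pos;
          matched = 0;
          matching = true;
        }
      }
    }
  }

  public int messageCount() {
    return offsets.size();
  }

  // Reads one raw message, including its "From " line. Indexes are 0-based here,
  // unlike JavaMail message numbers, which start at 1.
  public String readMessage(int index) throws IOException {
    long start = offsets.get(index);
    long end = index + 1 < offsets.size() ? offsets.get(index + 1) : length;
    byte[] buf = new byte[Math.toIntExact(end - start)];
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
      raf.seek(start);
      raf.readFully(buf);
    }
    // ISO-8859-1 maps each byte to one char, so nothing is lost in decoding
    return new String(buf, StandardCharsets.ISO_8859_1);
  }

  public static void main(String[] args) throws IOException {
    MboxIndex index = new MboxIndex(Paths.get(args[0]));
    System.out.println(index.messageCount());
    if (index.messageCount() > 0) {
      System.out.print(index.readMessage(0));
    }
  }
}
```

The index costs one pass over the file plus one long per message; after that, reading message N only touches the bytes of message N.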
