spamassassin-users December 2011 archive
spamassassin-users: Re: Bayes and MySQL - does it actually work?

From: David F. Skoll <dfs_at_nospam>
Date: Fri Dec 23 2011 - 14:25:00 GMT

I don't believe any kind of SQL database is the best choice for Bayes
(which involves simple keyed lookups). We use Dan Bernstein's "cdb"
file format with great success. Each user has his or her own CDB file
as well as a sitewide file containing 5.7 million tokens.

The CDB software uses mmap() to map the CDB file into memory. As long
as your server has lots of memory, the OS's memory management system
keeps heavily-used CDB files in memory... no arcane tuning required.
[Actually, this is the key for any kind of fast Bayes lookup: Build a
server with huge gobs of memory. :)]
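For readers unfamiliar with the format, here is a minimal pure-Python sketch of a cdb writer and an mmap()-based reader. It follows the layout in Bernstein's cdb specification: a 2048-byte header of 256 (position, slot count) pairs, then the key/value records, then 256 half-full open-addressing hash tables. It is an illustration only, not a description of any particular production setup. The names `cdb_write` and `CdbReader` are made up for this example, and a real deployment would more likely use an existing cdb library. Offsets are 32-bit, so a cdb file is limited to 4 GB, and the sketch does not check that.

```python
import mmap
import os
import struct
import tempfile
from typing import Dict, Optional

HEADER_SIZE = 2048  # 256 table pointers, each a (position, slot count) pair of 32-bit ints


def cdb_hash(key: bytes) -> int:
    """The cdb hash: start at 5381, then h = ((h << 5) + h) ^ byte, kept to 32 bits."""
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF
    return h


def cdb_write(path: str, items: Dict[bytes, bytes]) -> None:
    """Write all key/value pairs to a new cdb file (the whole file, every time)."""
    buckets = [[] for _ in range(256)]  # per hash table: list of (hash, record offset)
    with open(path, "wb") as f:
        f.write(b"\0" * HEADER_SIZE)  # placeholder header, filled in at the end
        pos = HEADER_SIZE
        for key, value in items.items():
            h = cdb_hash(key)
            buckets[h & 0xFF].append((h, pos))
            f.write(struct.pack("<II", len(key), len(value)) + key + value)
            pos += 8 + len(key) + len(value)
        header = []
        for entries in buckets:
            nslots = 2 * len(entries)  # half-full tables keep probe chains short
            slots = [(0, 0)] * nslots  # record offset 0 marks an empty slot
            for h, rpos in entries:
                i = (h >> 8) % nslots
                while slots[i][1] != 0:  # linear probing on collision
                    i = (i + 1) % nslots
                slots[i] = (h, rpos)
            header.append((pos, nslots))
            for h, rpos in slots:
                f.write(struct.pack("<II", h, rpos))
            pos += 8 * nslots
        f.seek(0)
        for tpos, nslots in header:
            f.write(struct.pack("<II", tpos, nslots))


class CdbReader:
    """Read-only lookups against an mmap()ed cdb file."""

    def __init__(self, path: str) -> None:
        self._file = open(path, "rb")
        # The kernel pages the file in on demand and keeps hot pages cached.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def get(self, key: bytes) -> Optional[bytes]:
        h = cdb_hash(key)
        tpos, nslots = struct.unpack_from("<II", self._map, (h & 0xFF) * 8)
        if nslots == 0:
            return None
        slot = (h >> 8) % nslots
        for _ in range(nslots):
            shash, rpos = struct.unpack_from("<II", self._map, tpos + 8 * slot)
            if rpos == 0:  # empty slot: the key is absent
                return None
            if shash == h:
                klen, dlen = struct.unpack_from("<II", self._map, rpos)
                start = rpos + 8
                if self._map[start:start + klen] == key:
                    return self._map[start + klen:start + klen + dlen]
            slot = (slot + 1) % nslots
        return None

    def close(self) -> None:
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    # Token names and values are invented for illustration.
    path = os.path.join(tempfile.mkdtemp(), "tokens.cdb")
    cdb_write(path, {b"viagra": b"0.99", b"meeting": b"0.01"})
    db = CdbReader(path)
    print(db.get(b"viagra"), db.get(b"unseen"))  # prints: b'0.99' None
    db.close()
```

A lookup is one header read plus a short probe sequence in a half-full table, all served from cached pages once the file is hot, which is why memory matters more than tuning.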

I realize SpamAssassin does not use CDB files for Bayes. But if the
developers are looking for a new back-end, I highly recommend CDB
for its excellent performance.

The only downside to CDB is that incremental updates are not possible.
To train, you need to rebuild the entire CDB file. For us, that's
an acceptable tradeoff, but YMMV.
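Because a cdb file cannot be modified in place, training amounts to computing the complete new token table and writing a fresh file. A common way to do that safely is to write to a temporary file in the same directory and then rename it over the old one. On POSIX filesystems the rename is atomic, so readers see either the old file or the new one. The sketch below assumes, hypothetically, that each token maps to a (spam count, ham count) pair stored as text. `train_counts`, `rebuild`, and the `write_db` callback are illustrative names, not part of any real tool. `write_db` stands for any function that writes a complete cdb file to a given path.

```python
import os
import tempfile
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical table layout: token -> (spam_count, ham_count)
Counts = Dict[bytes, Tuple[int, int]]


def train_counts(old: Counts, tokens: Iterable[bytes], is_spam: bool) -> Counts:
    """Return a new table with one message's tokens added (each token counted once)."""
    new = dict(old)
    for tok in set(tokens):
        spam, ham = new.get(tok, (0, 0))
        new[tok] = (spam + 1, ham) if is_spam else (spam, ham + 1)
    return new


def rebuild(path: str, counts: Counts,
            write_db: Callable[[str, Dict[bytes, bytes]], None]) -> None:
    """Write the full table to a temp file, then atomically replace `path`."""
    items = {tok: b"%d %d" % (spam, ham) for tok, (spam, ham) in counts.items()}
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    # Note: mkstemp creates it with mode 0600; adjust if other users must read it.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write_db(tmp, items)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # never leave a half-written file behind
        raise
```

Processes that already have the old file mapped keep reading it until they reopen the path, since on POSIX systems a rename does not invalidate an existing mapping.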