spamassassin-users June 2011 archive

Re: High Performance Bayes Database Configuration?

From: Yet Another Ninja <axb.lists_at_nospam>
Date: Tue Jun 21 2011 - 14:44:36 GMT

On 2011-06-21 16:30, Marc Perkel wrote:
>
>
> On 6/21/2011 7:23 AM, David F. Skoll wrote:
>> On Tue, 21 Jun 2011 07:06:11 -0700
>> Marc Perkel<support@junkemailfilter.com> wrote:
>>
>>> Trying to get MySQL Bayes working in a high volume environment.
>>> Dedicated MySQL server with SSD drives. Can someone send me a sample
>>> my.cnf file and make other suggestions to keep it running without
>>> database corruption and other MySQL "features"? Or - should I be
>>> using some other DB?
>> We've tried various ways of storing Bayes data (we have our own Bayes
>> implementation, so this discussion may not correspond exactly with the
>> SA implementation.) After trying Berkeley DB files and PostgreSQL---we
>> would never use MySQL for any data we care about---we finally settled
>> on Dan Bernstein's CDB format. It has by far the best performance.
>> See: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/
>> Take a look at the "Random Reads" timings. CDB is 6 times faster than
>> Berkeley DB!
>>
>> CDB is read-only, which means when you want to do Bayes training, you
>> have to rewrite the entire database. This is not an issue for our
>> system because of how we do Bayes training, but it may be an issue
>> with the standard sa-learn.
> Thanks, David, but I need real-time updating and it's spread across
> multiple servers. So I need PostgreSQL or MySQL.

I settled on per-server SDBM.
Under high traffic, MySQL produced too much lag, no matter how fast the
DB server was.
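
As a rough sketch of what a per-server SDBM setup could look like in
SpamAssassin's local.cf: this assumes SpamAssassin 3.x, where the
Mail::SpamAssassin::BayesStore::SDBM module ships with the distribution.
The bayes_path value is a hypothetical location, not something taken from
this thread.

```
# local.cf on each scanning server (illustrative sketch, not a tuned config)
use_bayes 1

# Keep Bayes tokens in local SDBM files instead of the default DBM backend
bayes_store_module Mail::SpamAssassin::BayesStore::SDBM

# Hypothetical path prefix; SpamAssassin appends _toks, _seen, etc. to it
bayes_path /var/spool/spamassassin/bayes
```

With this layout each server keeps its own Bayes files, so training done on
one machine is not seen by the others unless the files are copied around
separately.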