spamassassin-users December 2011 archive
spamassassin-users: Re: Bayes and MySQL - does it actually work?

From: Henrik K <hege_at_nospam>
Date: Fri Dec 23 2011 - 11:29:00 GMT
To: users@spamassassin.apache.org

On Wed, Dec 21, 2011 at 01:10:27PM -0500, Kris Deugau wrote:
> Marc Perkel wrote:
> >I've been trying for a long time to get bayes/mysql to actually work.
> >Running a dedicated server with MySQL. Several servers running SA
> >configured to talk to it.
> >
> >I'm running big servers with lots of ram and raid 0 flash drives for
> >speed. Also using InnoDB. I'm beginning to wonder if it is ever going to
> >work and if someone is going to fix it?
>
> I'm not sure what official testing has been done, but some testing I
> did about a year ago when upgrading the SA cluster here showed
> pretty much the same IO load for a global Bayes no matter what
> combination of MyISAM, InnoDB, generic SQL, or MySQL-specific SA
> modules I used.
>
> Enabling MySQL replication also bogged things down pretty badly.
>
> Performance with the database on physical disks simply wasn't
> keeping up with more than about double the average message rate (if
> that...), so I fell back to the "good enough" setup of putting the
> SA database on a RAMdisk, and tweaking the MySQL init script to
> reload the database on startup. A database dump is done once a day,
> about a half-hour after a Bayes expiry run.
>
> This is handling ~250K messages/day, although with some tweaks to
> serialize mail delivery a little more to level off the extreme peaks
> in messages/second it should probably be able to handle a lot more
> volume.

I guess it still boils down to basics. No matter what the database server is
used for, the same principles apply. If you have slow disks, things are
going to be slow.

Ideally you should compile the newest MySQL by hand. Older versions don't use
the new, faster InnoDB Plugin codebase.
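
A quick way to check what a running server uses, as a rough sketch. It
assumes the innodb_version variable is only present when the newer InnoDB
codebase is loaded, so verify that against the docs for your version:

```sql
-- Server version string.
SELECT VERSION();

-- Returns a row with the InnoDB version when the newer codebase is in use.
-- An empty result suggests the old built-in InnoDB (assumption, see above).
SHOW VARIABLES LIKE 'innodb_version';
```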

Disk I/O, and fsync() in particular, is almost always the bottleneck. If you
don't have critical data in the same database, look at all the relevant
options (innodb_flush_log_at_trx_commit=0, sync_binlog=0, etc.). You could
even run a separate instance for SA only, with all the fastest options.
Similar speed-vs-reliability options probably exist for replication, but I
have no experience with that.
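
As a minimal sketch, the two options named above can be tried on a running
server like this. It assumes an account with enough privileges to set global
variables, and that losing the last second or so of Bayes updates in a crash
is acceptable:

```sql
-- Write and flush the InnoDB log about once per second instead of at every
-- commit. A crash can lose roughly the last second of transactions.
SET GLOBAL innodb_flush_log_at_trx_commit = 0;

-- Do not force the binary log to disk; leave flushing to the OS.
SET GLOBAL sync_binlog = 0;

-- Confirm the values now in effect.
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW GLOBAL VARIABLES LIKE 'sync_binlog';
```

SET GLOBAL does not survive a restart. To make the settings permanent, put
the same names and values under [mysqld] in my.cnf.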

You can also tune the default schema. Drop the atime index; it's pointless
when using manual expiry. If you have a simple global Bayes, change the "id"
column to tinyint; it will cut your database size in half. I've also changed
spam_count and ham_count to smallint, since I don't have that much traffic.
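
Roughly, those schema changes could look like the sketch below. The table
name bayes_token and the index name bayes_token_idx1 are assumptions based on
the stock SA schema, so check SHOW INDEX output first and take a dump before
altering anything:

```sql
-- List the existing indexes to find the one that covers atime.
SHOW INDEX FROM bayes_token;

-- Drop the atime index (index name is an assumption; use the one listed above).
ALTER TABLE bayes_token DROP INDEX bayes_token_idx1;

-- Shrink the columns. Signed TINYINT holds up to 127 and signed SMALLINT up to
-- 32767, so this only suits a single global user and modest token counts.
ALTER TABLE bayes_token
  MODIFY id TINYINT NOT NULL DEFAULT 0,
  MODIFY spam_count SMALLINT NOT NULL DEFAULT 0,
  MODIFY ham_count SMALLINT NOT NULL DEFAULT 0;
```

If any existing count is already above 32767, the smallint change will either
fail or clamp the value depending on the server's SQL mode, so check the
current maximums before running it.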

Since these issues pop up here every now and then, I guess SA needs its own
tutorial/howto for MySQL tuning.