spamassassin-users December 2011 archive

Re: Bayes and MySQL - does it actually work?

From: Henrik K <hege_at_nospam>
Date: Fri Dec 23 2011 - 15:15:09 GMT
To: users@spamassassin.apache.org

On Fri, Dec 23, 2011 at 03:03:09PM +0000, spamassassin@lists.grepular.com wrote:
> On 23/12/11 14:20, Henrik K wrote:
>
> >> As I understand it, if the MySQL query cache is tuned appropriately,
> >> then most of the queries should not be touching disk anyway?
> >
> > Enabling query cache will probably (marginally) slow things down. Bayes
> > queries are extremely random, so there's nothing to cache. Any write to the
> > table will invalidate caches anyway. And those writes happen every time a
> > token is read (atime is updated).
>
> To stop the query cache being invalidated, it would probably be better
> if the writes were queued and then done in batches. Can SpamAssassin
> handle this sort of queue internally, or would some sort of additional
> technology be required?

Keep in mind that tokens are looked up in batches of 50 or so (token in
('token1','token2','token3'...)). Since MySQL caches/hashes the query
_exactly_ as written, it's unlikely you'll ever get two identical SQL
statements.
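
To illustrate the point about exact-text matching, here is a rough Python
sketch. The query text, the table and column names, and the helpers
build_query and cache_key are assumptions for illustration only; this is not
SpamAssassin's actual SQL or MySQL's real hashing. It only shows that a cache
keyed on the literal query string treats nearly identical token batches as
different queries.

```python
import hashlib


def build_query(tokens):
    # Illustrative query text only; not the exact SQL SpamAssassin issues.
    in_list = ",".join("'%s'" % t for t in tokens)
    return (
        "SELECT token, spam_count, ham_count, atime "
        "FROM bayes_token WHERE token IN (%s)" % in_list
    )


def cache_key(query):
    # Stand-in for a cache that is keyed on the exact query text.
    return hashlib.sha1(query.encode("utf-8")).hexdigest()


batch_a = ["tok%02d" % i for i in range(50)]
batch_b = batch_a[:49] + ["other"]   # 49 of the 50 tokens are shared
batch_c = list(reversed(batch_a))    # same tokens, different order

key_a = cache_key(build_query(batch_a))
key_b = cache_key(build_query(batch_b))
key_c = cache_key(build_query(batch_c))

# Only a byte-for-byte identical query maps to the same key.
same_again = cache_key(build_query(list(batch_a)))
```

With 50 fairly random tokens per batch, a repeat of the exact same string is
very unlikely, so nearly every lookup would be a cache miss.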

> I don't know what the point of the atime data is, but is there any need
> to update the atime on every read? Could that write be skipped if the
> atime is already within a certain period of time? Ie, if the atime has
> already been updated in the last 5 minutes, is there any point in doing
> it again?

That's a question worth entering into bugzilla. I doubt it would even make a
difference if the time frame were 1 day. After all, the only point of atime
is to expire very old unused tokens. Would be fun to benchmark if I had time.
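
For the record, the atime idea above could look roughly like this in Python.
This is only a sketch of the proposal, not how SpamAssassin behaves today; the
names ATIME_THRESHOLD, needs_atime_update and tokens_to_touch are made up, and
the one-day threshold is just the figure mentioned above.

```python
ATIME_THRESHOLD = 24 * 60 * 60  # hypothetical threshold: one day, in seconds


def needs_atime_update(stored_atime, now, threshold=ATIME_THRESHOLD):
    # Write a new atime only if the stored one is older than the threshold.
    return now - stored_atime >= threshold


def tokens_to_touch(token_atimes, now, threshold=ATIME_THRESHOLD):
    # token_atimes maps token -> stored atime (Unix seconds).
    # Returns only the tokens whose atime would actually be rewritten.
    return [
        tok
        for tok, atime in token_atimes.items()
        if needs_atime_update(atime, now, threshold)
    ]


now = 1324653309  # fixed example timestamp, so the result is reproducible
seen = {
    "recent": now - 300,           # read 5 minutes ago: skip the write
    "stale": now - 3 * 24 * 3600,  # read 3 days ago: update atime
}
to_update = tokens_to_touch(seen, now)
```

Since expiry only cares about very old tokens, losing up to a day of atime
precision should not matter, and most reads would then cause no write at all.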