
Re: [AMaViS-user] Recipient-domain-specific SA bayes db [was 'no subject']

From: Yassen Damyanov <yassen_tis_at_nospam>
Date: Thu Sep 16 2010 - 05:33:04 GMT


First of all, thank you so much for the amazingly thoughtful and in-depth reply!

--- On Wed, 9/15/10, Mark Martinec wrote:

> Yassen,
>> I want amavisd-new/spamassassin use a different Spamassassin Bayes
>> database for each separate domain hosted on my mail server. That
>> is, if the first "To:" recipient is, then
>> I want Bayes tests (and learning) to be done against SA Bayes database
>> #1; if the first "To:" recipient is, then
>> Bayes tests (and learning) should be done against SA Bayes database
>> #2, and so on. If there is no "To:" recipient, Bayes tests (and
>> learning) should be done against a default Bayes database.
> This is a wrong approach for anything but a toy or a SOHO setup.

After your explanation, I see that clearly, no question...

>> [...] Then I can use policy banks to tune amavisd-new the way I want
>> it tuned for that specific domain,
> Policy banks apply to an entire message. They are an inappropriate
> mechanism for controlling per-recipient behaviour. Policy banks are
> typically associated with a sender or their IP address or authenticity,
> and not associated with recipients (one policy bank, multiple recipients).

Let me give some short background on my problem: I host email for half a dozen domains, and amavisd-new does a great job filtering the mail using clamav, SA, pyzor, razor and bayes (via SA). Bayes is a VERY helpful addition to the other tests and greatly improves the spam filtering success rate.

What I noticed is that within a domain bayes works great, probably because legitimate mail within a domain tends to have a lot in common (and spam tends to have things in common, too). The opposite is true when I compare different domains with each other: users of different domains use different languages, not to mention other differences (I have English-, German- and Bulgarian-speaking domains). This is why I am looking for a way to make the bayes database work "per domain" instead of being a global one for the whole install. I guess the perfect solution would be to maintain a separate bayes db for each user, but the very good results from installations with a single db for a whole domain make me believe that a per-domain db is a good approach that will be a lot simpler and still retain good quality.

(Suggestions for different approaches are welcome.)

>> but I still don't know how to get it to tell SA to look for its bayes db
>> at a domain-specific location. Anyone's help is highly appreciated.
>> My current plan is to introduce $sa_bayes_path in amavisd-new config
>> file(s), have amavisd-new patched to honor that argument when calling SA,
>> and also have it listen on a separate port for each domain. I will then
>> use policy banks to tune that same $sa_bayes_path argument differently for
>> each of the different ports (=domains).

This didn't work for me; I guess that is because amavisd-new passes parameters to SA only when instantiating it, that is, at startup time.

So what I did was essentially what Vernon advised: (thanks, Vernon!)

--- On Sat, 9/11/10, Vernon A. Fort wrote:

> how about running each amavis with a different user account
> with each having a different home directory. each home
> directory would have a separate .spamassassin/bayes*

The only difference is that I do not employ different Unix users; instead I rely on the amavisd-new config files, basically having several configs that differ only in $MYHOME and $inet_socket_port. My postfix setup uses
   smtpd_recipient_restrictions = ..., check_recipient_access

I haven't tested this thoroughly, so it is not yet in production. Comments are welcome.
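
To make the layout concrete, here is a minimal sketch in Python that prints the two settings that differ between the per-domain config files, plus matching lines for a Postfix access table consulted by check_recipient_access. The domain names, port numbers, base directory and the "smtp-amavis" transport name are all assumptions made up for illustration, not my real values.

```python
# Hypothetical layout: one amavisd-new instance (and listening port) per domain.
# Domain names, ports and BASE are illustrative assumptions only.
DOMAINS = {"example.com": 10024, "example.de": 10026, "example.bg": 10028}
BASE = "/var/lib/amavis"


def amavisd_overrides(domain, port, base=BASE):
    """Return the two settings that differ between the per-domain config files."""
    return (
        f"$MYHOME = '{base}/{domain}';\n"
        f"$inet_socket_port = {port};\n"
    )


def recipient_access(domains, transport="smtp-amavis", host="127.0.0.1"):
    """Return access-table lines that send each recipient domain to its own port.

    'transport' is an assumed name for a transport defined in master.cf.
    """
    return "".join(
        f"{domain}\tFILTER {transport}:[{host}]:{port}\n"
        for domain, port in sorted(domains.items())
    )


if __name__ == "__main__":
    for domain, port in sorted(DOMAINS.items()):
        print(f"# overrides for {domain}")
        print(amavisd_overrides(domain, port))
    print(recipient_access(DOMAINS))
```

One caveat with this kind of setup: a Postfix FILTER action applies to the whole message, so a message addressed to recipients in two different domains still goes through only one instance.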

> The 2.7.0-pre7 has a new infrastructure in place which makes it possible
> to call SpamAssassin more than once per message, and even to load
> different SpamAssassin config files based on a recipient address (or domain),
> or based on a policy bank. It provides all the necessary internal support
> for per-recipient SpamAssassin processing. If you are doing any work
> in this area, the 2.7.0 is the codebase on which to ground any
> development work.

Sounds great (thank you for an amazingly useful piece of software!); I will download it and look at it as soon as I can. (It sounds like it won't need any hacking to do what I need.)

> As it happens, the switching of SpamAssassin configurations between
> messages (or even within a processing of a single mail message with
> multiple recipients) is a rather costly operation. For the purpose
> of switching a username used for Bayes SQL lookups it suffices to
> tell SpamAssassin to switch a username without loading his preferences
> config file. Such username switching is a fairly inexpensive operation.

So I should consider using an SQL-based bayes database, correct?

> What remains to be done is to map a recipient address to a (virtual)
> username, then to group recipients (of a multirecipient mail) into
> sets of recipients with a common username (such as his domain name),

the domain name is what groups them according to my theory, yes.

> then call SpamAssassin once for each username, and distribute
> resulting scores back to each recipient as appropriate.

Sounds like exactly what I am trying to achieve!
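
Just to check my understanding of the steps described above (map each recipient to a virtual username, group, scan once per group, hand the score back), here is a rough Python sketch. It is not amavisd-new code; the function names are hypothetical, and `scan` is a stand-in for whatever actually calls SpamAssassin under a given username.

```python
from collections import defaultdict


def username_for(recipient):
    """Map a recipient address to a virtual username: its lowercased domain.

    Addresses without a usable domain fall back to an assumed 'default' user.
    """
    _, sep, domain = recipient.rpartition("@")
    return domain.lower() if sep and domain else "default"


def group_by_username(recipients):
    """Group the recipients of one message by their virtual username."""
    groups = defaultdict(list)
    for rcpt in recipients:
        groups[username_for(rcpt)].append(rcpt)
    return dict(groups)


def score_per_recipient(recipients, scan):
    """Call scan(username) once per group and give its score to each member.

    'scan' is a hypothetical callable standing in for one SpamAssassin pass.
    """
    scores = {}
    for username, members in group_by_username(recipients).items():
        score = scan(username)
        for rcpt in members:
            scores[rcpt] = score
    return scores
```

With three recipients spread over two domains, `scan` would run twice rather than three times, which is the saving I understand the grouping is meant to provide.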

> This is a fairly straightforward change from the current 2.7.0-pre7,
> based on all the already laid-down supporting mechanisms, and I guess
> I can make it into 2.7.0-pre8 without too much trouble, if someone
> is interested.

I am one (obviously); anyone else voting here?

Thanks again for all your effort, Mark!
Cheers, Yassen

