amavis-user September 2010 archive

Re: [AMaViS-user] Recipient-domain-specific SA bayes db [was 'no subject']

From: Yassen Damyanov <yassen_tis_at_nospam>
Date: Thu Sep 16 2010 - 05:33:04 GMT
To: amavis-user@lists.sourceforge.net

Mark,

First of all, thank you so much for the amazingly thoughtful and in-depth reply!

--- On Wed, 9/15/10, Mark Martinec wrote:

> Yassen,
>
>> I want amavisd-new/SpamAssassin to use a different SpamAssassin Bayes
>> database for each separate domain hosted on my mail server. That
>> is, if the first "To:" recipient is userX@firstdomain.com, then
>> I want Bayes tests (and learning) to be done against SA Bayes database
>> #1; if the first "To:" recipient is userZ@seconddomain.com, then
>> Bayes tests (and learning) should be done against SA Bayes database
>> #2, and so on. If there is no "To:" recipient, Bayes tests (and
>> learning) should be done against a default Bayes database.
>
> This is a wrong approach for anything but a toy or a SOHO setup.

After your explanation, I see that clearly, no question...
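To make the problem concrete for myself, here is a minimal Python sketch of the first-"To:" rule I originally asked for. The domain names and database paths are hypothetical, and this is not anything amavisd-new actually does. It picks one database for the whole message, so a mail addressed to recipients in two domains is judged entirely against the first domain's database, which is exactly the multi-recipient problem.

```python
from email.utils import getaddresses

# Hypothetical per-domain Bayes database paths (illustration only).
BAYES_DB_BY_DOMAIN = {
    "firstdomain.com": "/var/bayes/firstdomain",
    "seconddomain.com": "/var/bayes/seconddomain",
}
DEFAULT_BAYES_DB = "/var/bayes/default"


def bayes_db_for_message(to_header):
    """Pick ONE Bayes DB for the whole message from the first To: address."""
    addresses = [addr for _name, addr in getaddresses([to_header]) if addr]
    if not addresses:
        # No "To:" recipient: fall back to the default database.
        return DEFAULT_BAYES_DB
    domain = addresses[0].rpartition("@")[2].lower()
    return BAYES_DB_BY_DOMAIN.get(domain, DEFAULT_BAYES_DB)


# Both recipients get the first domain's database; the second domain is ignored.
print(bayes_db_for_message("userX@firstdomain.com, userZ@seconddomain.com"))
# prints /var/bayes/firstdomain
```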
   

>> [...] Then I can use policy banks to tune amavisd-new the way I want
>> it tuned for that specific domain,
>
> Policy banks apply to an entire message. They are an inappropriate
> mechanism for controlling per-recipient behaviour. Policy banks are
> typically associated with a sender or their IP address or authenticity,
> and not associated with recipients (one policy bank, multiple recipients).

Let me give some short background on my problem: I host email for half a dozen domains, and amavisd-new does a great job filtering the mail using clamav, SA, pyzor, razor and Bayes (via SA). Bayes is a VERY helpful addition to the other tests and greatly improves spam detection.

What I noticed was that within a domain Bayes works great, probably because
legitimate mail within a domain tends to have a lot in common (and so does
spam). The opposite is true when I compare different domains with each other: users of different domains write in different languages, not to mention other differences (I have English-, German- and Bulgarian-speaking domains). This is why I am looking for a way to have a separate Bayes database per domain rather than one global database for the whole installation. I guess the perfect solution would be a separate Bayes db for each user, but the very good results of installations that use a single db for a whole domain make me believe that per-domain databases are a much simpler approach that still retains good quality.

(Suggestions for different approaches are welcome.)

>> but I still don't know how to get it to tell SA to look for its Bayes db
>> at a domain-specific location. Anyone's help is highly appreciated.
>>
>> My current plan is to introduce $sa_bayes_path in amavisd-new config
>> file(s), have amavisd-new patched to honor that argument when calling SA,
>> and also have it listen on a separate port for each domain. I will then
>> use policy banks to tune that same $sa_bayes_path argument differently for
>> each of the different ports (=domains).

This didn't work for me; I guess that is because amavisd-new passes parameters to SA only when instantiating it, that is, at startup time.

So what I did was essentially what Vernon advised (thanks, Vernon!):

--- On Sat, 9/11/10, Vernon A. Fort wrote:

> How about running each amavis with a different user account,
> each having a different home directory? Each home
> directory would have a separate .spamassassin/bayes*

except that I do not use different Unix users; instead I use several amavisd-new config files that differ only in $MYHOME and $inet_socket_port. My Postfix setup uses
   smtpd_recipient_restrictions = ..., check_recipient_access

I haven't tested this thoroughly, so it is not in production yet. Comments are welcome.
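For anyone wanting to try the same thing, here is a rough Python sketch that generates the per-domain settings: one amavisd-new config fragment per domain, differing only in $MYHOME and $inet_socket_port. The domain list, the /var/amavis-<domain> directory layout and the starting port 10024 are my assumptions for illustration. The Postfix access map that routes each domain to its port is not shown here and is not generated.

```python
# Hypothetical domains and layout; adjust to your own installation.
DOMAINS = ["firstdomain.com", "seconddomain.com", "thirddomain.com"]
BASE_PORT = 10024  # assumed first listening port, one port per domain


def instance_settings(domains, base_port):
    """Assign each domain its own amavisd-new home directory and port."""
    settings = {}
    # Sorting makes the port assignment independent of list order.
    for offset, domain in enumerate(sorted(domains)):
        settings[domain] = {
            "MYHOME": f"/var/amavis-{domain}",
            "inet_socket_port": base_port + offset,
        }
    return settings


def config_fragment(values):
    """Render the two lines in which the per-domain configs differ (Perl syntax)."""
    return (
        f"$MYHOME = '{values['MYHOME']}';\n"
        f"$inet_socket_port = {values['inet_socket_port']};\n"
    )


for domain, values in instance_settings(DOMAINS, BASE_PORT).items():
    print(f"# amavisd-new config fragment for {domain}")
    print(config_fragment(values))
```

Note that adding a domain that sorts early shifts the ports of the domains after it, so in a real setup a fixed, hand-maintained domain-to-port table may be safer.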

 
> The 2.7.0-pre7 has a new infrastructure in place which makes it possible
> to call SpamAssassin more than once per message, and even to load
> different SpamAssassin config files based on a recipient address (or domain),
> or based on a policy bank. It provides all the necessary internal support
> for per-recipient SpamAssassin processing. If you are doing any work
> in this area, the 2.7.0 is the codebase on which to ground any
> development work.

Sounds great (thank you for an amazingly useful piece of software!); I will download it and look at it as soon as I can. (It sounds like it won't need any hacking to do what I need.)

 
> As it happens, the switching of SpamAssassin configurations between
> messages (or even within a processing of a single mail message with
> multiple recipients) is a rather costly operation. For the purpose
> of switching a username used for Bayes SQL lookups it suffices to
> tell SpamAssassin to switch a username without loading his preferences
> config file. Such username switching is a fairly inexpensive operation.

So I should consider using an SQL-based bayes database, correct?
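To check my understanding of why switching the username is cheap: with an SQL-backed store, the Bayes data of every (virtual) user lives in the same tables, so changing users is just a different key in the lookup rather than a new configuration load. Below is a toy illustration using Python's sqlite3 with a made-up token table; this is NOT SpamAssassin's real Bayes SQL schema, and real Bayes scoring is far more involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up schema for illustration: one row per (username, token) pair.
conn.execute(
    "CREATE TABLE tokens (username TEXT, token TEXT, spam_count INTEGER,"
    " ham_count INTEGER, PRIMARY KEY (username, token))"
)
conn.executemany(
    "INSERT INTO tokens VALUES (?, ?, ?, ?)",
    [
        ("firstdomain.com", "lottery", 40, 1),
        ("firstdomain.com", "invoice", 2, 30),
        ("seconddomain.com", "invoice", 25, 3),  # same word, different history
    ],
)


def token_counts(conn, username, token):
    """Look up (spam, ham) counts for a token in one user's Bayes data."""
    row = conn.execute(
        "SELECT spam_count, ham_count FROM tokens WHERE username = ? AND token = ?",
        (username, token),
    ).fetchone()
    return row if row is not None else (0, 0)


# "Switching users" is only a different bound parameter on the same query.
for user in ("firstdomain.com", "seconddomain.com"):
    print(user, token_counts(conn, user, "invoice"))
```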

 
> What remains to be done is to map a recipient address to a (virtual)
> username, then to group recipients (of a multirecipient mail) into
> sets of recipients with a common username (such as his domain name),

the domain name is what groups them according to my theory, yes.

> then call SpamAssassin once for each username, and distribute
> resulting scores back to each recipient as appropriate.

That sounds like exactly what I am trying to achieve!
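Here is how I understand the steps Mark describes, as a minimal Python sketch: map each recipient to a (virtual) username, here simply its domain as in my scheme; group recipients by username; score the message once per group; then copy each group's score back to its recipients. `score_message` is a hypothetical stand-in for a SpamAssassin call with fake scores, not an amavisd-new or SpamAssassin API.

```python
from collections import defaultdict

CALLS = []  # records which usernames were scored, to show one call per group


def username_for(recipient):
    """Map a recipient address to a virtual username (here: its domain)."""
    return recipient.rpartition("@")[2].lower()


def score_message(message, username):
    """Hypothetical stand-in for one SpamAssassin call using `username`'s Bayes data."""
    CALLS.append(username)
    # Fake scores so the example runs; a real setup would call SpamAssassin here.
    return {"firstdomain.com": 7.5, "seconddomain.com": 1.2}.get(username, 0.0)


def per_recipient_scores(message, recipients):
    # 1. Group recipients that share a username.
    groups = defaultdict(list)
    for rcpt in recipients:
        groups[username_for(rcpt)].append(rcpt)
    # 2. Score the message once per username, 3. distribute to each recipient.
    scores = {}
    for username, members in groups.items():
        score = score_message(message, username)
        for rcpt in members:
            scores[rcpt] = score
    return scores


message = "Subject: test\n\nhello"
rcpts = ["a@firstdomain.com", "b@FirstDomain.com", "c@seconddomain.com"]
print(per_recipient_scores(message, rcpts))
# prints {'a@firstdomain.com': 7.5, 'b@FirstDomain.com': 7.5, 'c@seconddomain.com': 1.2}
```

Three recipients in two domains cost two scoring calls instead of three, which is the saving the grouping step is meant to provide.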

 
> This is a fairly straightforward change from the current 2.7.0-pre7,
> based on all the already laid-down supporting mechanisms, and I guess
> I can make it into 2.7.0-pre8 without too much trouble, if someone
> is interested.

I am one (obviously); anyone else voting here?

Thanks again for all your effort, Mark!
Cheers, Yassen

      

_______________________________________________
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user
 Please visit http://www.ijs.si/software/amavisd/ regularly
 For administrativa requests please send email to rainer at openantivirus dot org