amavis-user June 2010 archive

Re: [AMaViS-user] about amavis spamassassin spam scores

From: Mark Martinec <Mark.Martinec+amavis_at_nospam>
Date: Wed Jun 02 2010 - 14:03:36 GMT


> What I am asking is how had amavis reached on the conclusion that for a
> spam score of '6.9', amavis should start spam evasive action. What is
> the reasoning behind the spam score assumptions.

Andy Dills writes:
> Bottom line? The values they provide for default are very conservative,
> and there's no "right" number. Like Michael said, it's very much
> environment-dependant. Start with the defaults, look at how your mail gets
> scored, and adjust from there.

Exactly. These values are just a conservative starting point (which happens
to be close to what we are using at our site).

One methodology for determining the 'best' thresholds takes into account
the cost of false positives (ham classified as spam) and the cost of
false negatives (spam classified as ham).

If you just tag spam and deliver it to the user's mailbox, then the cost
of a false positive is small: the user receives the mail anyway and can
read it immediately if he decides it may be interesting, despite it
being labeled as spam. In such a case one can use more aggressive
(lower) thresholds for the tag2 (kill) level.

If, on the other hand, you do not deliver spam to the user's mailbox but
instead quarantine it (or reject it), then the recipient needs more
effort to check the quarantine and retrieve a FP message from there (or
the sender may have to re-send it, in the case of rejection). This may
even involve assistance from an administrator or help desk. In this case
the cost of a FP is high, and one should use conservative (higher) values
for the tag2 and kill levels.
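
To make the role of the levels concrete, here is a minimal sketch in
Python (not amavisd code) of how a score maps to an action when a level
takes effect at or above its value. The names Levels and action_for are
made up for illustration; 6.9 is the kill level from the question, the
other numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Levels:
    tag: float   # at or above: add informational spam headers
    tag2: float  # at or above: mark the message as spam
    kill: float  # at or above: evasive action (quarantine, reject, ...)


def action_for(score: float, levels: Levels) -> str:
    """Return the strongest action whose level the score reaches."""
    if score >= levels.kill:
        return "kill"
    if score >= levels.tag2:
        return "tag2"
    if score >= levels.tag:
        return "tag"
    return "clean"


# Hypothetical site that quarantines: a FP is costly, so levels stay high.
quarantine_site = Levels(tag=2.0, tag2=6.2, kill=6.9)

# Hypothetical site that only tags and delivers: a FP is cheap, so the
# spam-marking level can be lower.
tag_only_site = Levels(tag=2.0, tag2=5.0, kill=5.0)

for score in (1.0, 5.5, 6.5, 6.9):
    print(score, action_for(score, quarantine_site),
          action_for(score, tag_only_site))
```

The same message scoring 5.5 is only tagged with headers at the first
site but treated as spam at the second, which is the trade-off described
above.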

Something similar goes for false negatives (spam delivered as clean,
untagged). Here the cost is a user glancing at the message and hitting
delete. This cost is typically much lower than the cost of a FP, but
still, too many delivered spam messages add up.
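
The cost-based methodology can be sketched roughly as follows, assuming
you have scores for a sample of hand-classified ham and spam from your
own site. The function names, the cost values and the sample scores are
all invented for illustration; this is not how amavisd or SpamAssassin
picks anything, just a way to look at your own mail.

```python
def total_cost(threshold, ham_scores, spam_scores, fp_cost, fn_cost):
    """Cost of using `threshold`: ham at or above it is a false positive,
    spam below it is a false negative."""
    false_positives = sum(1 for s in ham_scores if s >= threshold)
    false_negatives = sum(1 for s in spam_scores if s < threshold)
    return false_positives * fp_cost + false_negatives * fn_cost


def best_threshold(ham_scores, spam_scores, fp_cost, fn_cost, candidates):
    """Return the candidate with the lowest total cost (lowest one on ties)."""
    return min(
        sorted(candidates),
        key=lambda t: total_cost(t, ham_scores, spam_scores, fp_cost, fn_cost),
    )


# Invented sample scores, for illustration only.
ham = [0.1, 1.2, 2.5, 4.8, 6.0]
spam = [4.0, 5.5, 7.2, 9.0, 12.3]
candidates = [x / 2 for x in range(4, 21)]  # 2.0, 2.5, ..., 10.0

# Tag-and-deliver: a FP costs about as much as a FN.
cheap_fp = best_threshold(ham, spam, fp_cost=1, fn_cost=1,
                          candidates=candidates)

# Quarantine or reject: a FP is assumed to cost 20 times a FN.
costly_fp = best_threshold(ham, spam, fp_cost=20, fn_cost=1,
                           candidates=candidates)

print(cheap_fp, costly_fp)
```

With a costly FP the chosen threshold moves up, at the price of letting
more spam through.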

The SpamAssassin default configuration is geared towards a ham/spam
threshold of 5 points. A genetic algorithm (GA) is used to adjust the
scores of individual rules based on a corpus of known spam and known ham.
Here too the cost of a wrong decision is taken into account.
It is worth mentioning that the cost of a FP was considered 5 times
the cost of a FN in the SA scoring runs for versions 3.2.*, but for
version 3.3 this ratio was 20. This means that SA 3.3 is more
conservative in proclaiming something to be spam than SA 3.2 was.
Consequently, amavisd thresholds can be reduced somewhat when switching
from SA 3.2 to 3.3 for similar results.

Also, the better your rules are tuned and tweaked in response to
current trends in spam and ham, the better you train Bayes,
and the more high-quality rules you add (good DNS RBLs,
reputation scores, ...), the tighter your thresholds can be.


