spamassassin-users December 2011 archive

Re: SA Sorbs Usage/Rules

From: <darxus_at_nospam>
Date: Fri Dec 16 2011 - 18:57:42 GMT
To: users@spamassassin.apache.org

On 12/16, Lutz Petersen wrote:
>
> I know some of the discussions in the past about usage of Sorbs RBLs
> in Spamassassin. The scores today are as follows:
>
> score RCVD_IN_SORBS_BLOCK 0 # n=0 n=1 n=2 n=3
> score RCVD_IN_SORBS_DUL 0 0.001 0 0.001 # n=0 n=2
> score RCVD_IN_SORBS_HTTP 0 2.499 0 0.001 # n=0 n=2
> score RCVD_IN_SORBS_MISC 0 # n=0 n=1 n=2 n=3
> score RCVD_IN_SORBS_SMTP 0 # n=0 n=1 n=2 n=3
> score RCVD_IN_SORBS_SOCKS 0 2.443 0 1.927 # n=0 n=2
> score RCVD_IN_SORBS_WEB 0 0.614 0 0.770 # n=0 n=2
> score RCVD_IN_SORBS_ZOMBIE 0 # n=0 n=1 n=2 n=3
>
> The 0-Scores for DUL was done because lot of people thought there
> were too much false positives within that (I dont see so, but ok).
> Another Argument for 0-Scoring or not using sorbs was that the rbl
> contains a lot of old (meaning not actual) entries in the spam
> section (in mind of the dislist policy). Ok.
>
> But today I take a deeper look at the sorbs rbls and found, that
> there is a very simple misconfiguration in the SA rules. The rbl
> check is done against the big 'dnsbl.sorbs.net' zone:
> eval:check_rbl('sorbs', 'dnsbl.sorbs.net.')
>
> And _that_ in my opinion is wrong. The rbl lookup should be done
> against the rbl 'safe.dnsbl.sorbs.net' instead. This rbl is a
> compilation of most of the sorbs partial lists as dnsbl.sorbs.net
> but with a simple difference: In opposite to dnsbl.sorbs.net it
> does not contain the 'recent.spam' and the 'old.spam' partial
> lists, which are contained in 'dnsbl.sorbs.net'. The only spam
> listed in this 'safe.dnsbl.sorbs.net' contains spam of the last
> 24 hours, so the arguments against using sorbs especially because
> of its spam delisting policy do not exist. One could simply change
> the rbl lookup to the right zone and so also score spams within
> that rbl (low).
>
> Description of the different sorbs partial-zones as of the
> aggregate zones here: https://www.sorbs.net/using.shtml

After digging into this a bit, I believe your entire objection is to the
default rule set not handling the 127.0.0.6 return code, used by the
following lists?

      new.spam.dnsbl.sorbs.net 127.0.0.6
   recent.spam.dnsbl.sorbs.net 127.0.0.6
      old.spam.dnsbl.sorbs.net 127.0.0.6
          spam.dnsbl.sorbs.net 127.0.0.6
   escalations.dnsbl.sorbs.net 127.0.0.6

The rule for that return code is commented out in the default rule set with
this comment:

# delist: $50 fee for RCVD_IN_SORBS_SPAM, others have free retest on request

Which seems likely to have resulted from this bug:

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=2221

Lists returning the 127.0.0.6 code in the safe.dnsbl.sorbs.net aggregate
zone are:

new.spam.dnsbl.sorbs.net
recent.spam.dnsbl.sorbs.net
escalations.dnsbl.sorbs.net

new.spam is only hosts from the last 48 hours.
recent.spam is hosts from the last 28 days.
escalations doesn't seem to have a time limit.

So it seems your statement that "The only spam listed in this
'safe.dnsbl.sorbs.net' contains spam of the last 24 hours" is incorrect.
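
For anyone who wants to check which return codes a given address actually
gets from each zone, here is a minimal Python sketch of the usual DNSBL
query convention (reverse the IPv4 octets, append the zone, look up the A
records). The function names and the 192.0.2.99 example address are mine,
purely for illustration; the lookup itself needs network access and a
resolver that the list will answer.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL query name: reversed IPv4 octets + '.' + zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.strip(".")


def dnsbl_codes(ip, zone):
    """Return the set of A-record answers (e.g. '127.0.0.6') for ip in zone.

    An empty set means the name did not resolve, i.e. the address is not
    listed (or the lookup failed). Requires network access.
    """
    try:
        _, _, addrs = socket.gethostbyname_ex(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return set()
    return set(addrs)


# Hypothetical example address; building the name needs no network.
print(dnsbl_query_name("192.0.2.99", "safe.dnsbl.sorbs.net."))
```

If the zone descriptions are accurate, comparing
dnsbl_codes(ip, "dnsbl.sorbs.net") with
dnsbl_codes(ip, "safe.dnsbl.sorbs.net") for an address that is only on
old.spam should show 127.0.0.6 in the first result and not in the second.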

Basically, without evidence that no money is charged for delisting from any
of those three lists, they're going to stay out of the default rule set.

With the currently enabled default rules, there would be *no* difference
if you changed from dnsbl.sorbs.net to safe.dnsbl.sorbs.net, because we're
not using the lists as an aggregate (we don't have just a single
RCVD_IN_SORBS rule); we have a separate rule for each return code. And
there is no difference in which lists provide which return codes between
those two aggregate zones, other than the 127.0.0.6 (spam) value (whose
rule is disabled).
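
To make that concrete, here is a small Python sketch of per-return-code
matching. Only the 127.0.0.6 / RCVD_IN_SORBS_SPAM pairing comes from this
thread; the codes shown for the HTTP and DUL rules are from my memory of
the stock rule file and should be treated as assumptions, and the DNS
answers are made up for an address that is on old.spam and on the DUL list.

```python
# Rule name -> return code it matches.
# Only the 127.0.0.6 entry is stated in this thread; the other two values
# are assumptions and should be checked against your own rule files.
SUB_RULES = {
    "RCVD_IN_SORBS_HTTP": "127.0.0.2",
    "RCVD_IN_SORBS_SPAM": "127.0.0.6",
    "RCVD_IN_SORBS_DUL": "127.0.0.10",
}

# The spam rule is commented out in the default rule set.
DISABLED = {"RCVD_IN_SORBS_SPAM"}


def rules_hit(answers):
    """Return the enabled rule names whose return code is in the answers."""
    return sorted(
        name
        for name, code in SUB_RULES.items()
        if name not in DISABLED and code in answers
    )


# Hypothetical answers for one address listed in old.spam and DUL:
# the full aggregate includes old.spam, the "safe" aggregate does not.
full_zone_answers = {"127.0.0.6", "127.0.0.10"}
safe_zone_answers = {"127.0.0.10"}

print(rules_hit(full_zone_answers))
print(rules_hit(safe_zone_answers))
```

Both calls report the same enabled rules, because the only answer that
differs between the two zones belongs to a rule that is switched off.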

Also, I wouldn't say the 0 scores were done "because lot of people thought
there were too much false positives". The scores are flagged
as mutable, meaning optimal scores are generated daily
using masscheck data. Related statistics can be seen here:
http://ruleqa.spamassassin.org/?daterev=20111210&rule=%2Fsorbs
RCVD_IN_SORBS_DUL seems to have a decent hit rate for both spam and ham, so
somehow the score generator just decided the most spams would be caught
without exceeding 1 false positive in 2500 hams with that score. It's not
always clear what exactly it's thinking. It could be, for example,
that almost all of the spam hits from RCVD_IN_SORBS_DUL overlapped with
another blacklist, and the SORBS_DUL list caused more false positives
than that other blacklist, so that other blacklist got a decent score,
and SORBS_DUL didn't. But these scores do not come from the whims
of humans.
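
As an illustration of that overlap effect, here is a toy Python sketch.
It is *not* SpamAssassin's score generator: it is a brute-force search
over a tiny made-up corpus, with a "zero false positives" constraint
standing in for the 1-in-2500 target. Rule A plays the part of the other
blacklist and rule B the part of SORBS_DUL; every number is hypothetical.

```python
from itertools import product

THRESHOLD = 5.0  # message is flagged as spam at or above this total

# Made-up corpus: (score from all other rules, hits rule A, hits rule B)
SPAM = [(2.0, True, True), (2.5, True, True), (2.0, True, False)]
HAM = [(4.5, False, True), (1.0, True, False), (0.5, False, False)]


def total(msg, a, b):
    """Total score of one message given scores a and b for rules A and B."""
    base, hits_a, hits_b = msg
    return base + (a if hits_a else 0) + (b if hits_b else 0)


def evaluate(a, b):
    """Return (spam caught, false positives) for the score pair (a, b)."""
    caught = sum(total(m, a, b) >= THRESHOLD for m in SPAM)
    false_pos = sum(total(m, a, b) >= THRESHOLD for m in HAM)
    return caught, false_pos


def best_scores(grid=(0, 1, 2, 3)):
    """Pick the (a, b) that catches the most spam with no false positives."""
    candidates = [
        (a, b) for a, b in product(grid, repeat=2) if evaluate(a, b)[1] == 0
    ]
    return max(candidates, key=lambda ab: evaluate(*ab)[0])


print(best_scores())
```

In this toy corpus every spam that B hits is also hit by A, while B alone
hits a ham that is already close to the threshold, so the search gives A
the useful score and leaves B at 0.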

-- 
"Anarchy is based on the observation that since few are fit to rule
themselves, even fewer are fit to rule others." -Edward Abbey
http://www.ChaosReigns.com