spamassassin-users December 2011 archive

Adding a blacklist via sa-update - would you mind?

From: <darxus_at_nospam>
Date: Thu Dec 01 2011 - 17:58:13 GMT
To: users@spamassassin.apache.org

There is some question among spamassassin developers* on whether or not
it is acceptable to increase the network load of spamassassin, by one
DNS query per email, for existing releases (version 3.3.x), by adding
one DNS blacklist to the rule set via sa-update.

This Mailspike blacklist has proved useful and reliable in (ruleqa) testing
for the last two years.
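
For readers unfamiliar with what the extra query looks like, here is a
minimal sketch of how a conventional IPv4 DNSBL lookup name is built: the
octets of the relay IP are reversed and the blacklist zone is appended. The
function name dnsbl_query_name and the zone "dnsbl.example" are placeholders
for illustration, not Mailspike's actual zone or spamassassin's own code.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL client looks up for an IPv4 address."""
    # Validate the input; raises ValueError if it is not a valid IPv4 address.
    addr = ipaddress.IPv4Address(ip)
    # DNSBL convention: reverse the octets, then append the list's zone.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# "dnsbl.example" is a hypothetical zone used only for this example.
print(dnsbl_query_name("192.0.2.25", "dnsbl.example"))
```

The resulting name is resolved with an ordinary A-record query; that lookup
is the one additional DNS query per email discussed here.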

Responses that would be particularly useful:

1) I run spamassassin on a very high volume of email, and the addition
   of 1 DNS query per email would cause me more problems than the
   increased accuracy would be worth. And I don't already have all
   network or DNS tests disabled.

2) I run spamassassin on a very high volume of email, and the addition
   of 1 DNS query per email would not bother me.

If it does get added, that will happen only after a post to this list. If
you want to ensure you do not use this blacklist if it does get added, you
can preemptively disable it with:

score RCVD_IN_MSPIKE_ZBI 0
score RCVD_IN_MSPIKE_L5 0
score RCVD_IN_MSPIKE_L4 0
score RCVD_IN_MSPIKE_L3 0
score RCVD_IN_MSPIKE_L2 0
score RCVD_IN_MSPIKE_H2 0
score RCVD_IN_MSPIKE_H3 0
score RCVD_IN_MSPIKE_H4 0
score RCVD_IN_MSPIKE_H5 0
score RCVD_IN_MSPIKE_BL 0
score RCVD_IN_MSPIKE_WL 0
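
If you manage configuration with scripts, the lines above can be generated
rather than typed. This is only a sketch: the helper name disable_lines is
made up for illustration, and where you put the output (for example your
site-wide local.cf) depends on your installation.

```python
# Rule name suffixes exactly as listed in this message.
MSPIKE_SUFFIXES = ["ZBI", "L5", "L4", "L3", "L2", "H2", "H3", "H4", "H5", "BL", "WL"]


def disable_lines(suffixes):
    """Return one 'score <rule> 0' line per Mailspike rule suffix."""
    return "\n".join(f"score RCVD_IN_MSPIKE_{s} 0" for s in suffixes)


# Print the block so it can be pasted or redirected into a config file.
print(disable_lines(MSPIKE_SUFFIXES))
```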

Latest RuleQA results:
http://ruleqa.spamassassin.org/?daterev=20111126&rule=%2Fmspike
The rule I'm most interested in is RCVD_IN_MSPIKE_BL (T_RCVD_IN_MSPIKE_BL
in the table below), hitting 74.6% of spam and 0.0067% of ham (9 out of
135,096). That places it as the 6th highest ranked rule, which is very
impressive, although these rules are not using "reuse", which causes some
bias.

  MSECS    SPAM%    HAM%    S/O  RANK  SCORE  NAME                       WHO/AGE
      0  82.8451  0.0044  1.000  1.00   0.00  RCVD_IN_XBL
      0  76.6509  0.0007  1.000  1.00   0.00  URIBL_AB_SURBL
      0  82.8057  0.0044  1.000  1.00   0.00  RAZOR2_CF_RANGE_E8_51_100
      0  82.8057  0.0096  1.000  0.99   0.00  RAZOR2_CF_RANGE_51_100
      0  73.0507  0.0037  1.000  0.99   0.00  DIGEST_MULTIPLE
      0  74.6530  0.0067  1.000  0.99   0.00  T_RCVD_IN_MSPIKE_BL
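
As a quick check of the arithmetic above, this sketch recomputes the ham
hit rate from the raw counts. It also computes an S/O value under the
assumption that the column means SPAM% / (SPAM% + HAM%); that formula is my
reading of the column, not something confirmed in this message. The function
names are made up for the example.

```python
def percent(hits: int, total: int) -> float:
    """Hit rate as a percentage of the corpus."""
    return 100.0 * hits / total


def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """Assumed S/O definition: share of all hits that were on spam."""
    return spam_pct / (spam_pct + ham_pct)


ham_pct = percent(9, 135096)                 # 9 ham hits out of 135,096
so = spam_over_overall(74.6530, ham_pct)     # SPAM% taken from the table

print(round(ham_pct, 4))  # 0.0067
print(round(so, 3))       # 1.0
```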

Mailspike has assured us that they will not cause false positives due to
detecting high hit rates, and that they have "a DNSBL mirror network,
which is on standby and ready to kick in, in case we start getting too
much traffic."

So worst case, for some unforeseen reason Mailspike's network
catastrophically fails, and you get no DNS responses, resulting in no
impact on your spamassassin scores. And we remove it via sa-update 1 day
later.

One of the problems with this question is that the answers we can get on
this mailing list really don't matter. If you're reading this email, you
can disable the rule before it's added, and it won't affect you. So I can
only hope that you can provide some idea of how people not reading this
list or the dev list would feel. Are there people processing so much email
that they could be negatively impacted by 1 additional DNS query per email
who are not actively reading either of these lists?

We would prefer to only add this blacklist to future releases, version
3.4.x+. But we currently do not have a way to maintain two separate sets
of rule updates. So our options are:

1) Never add new blacklists. Not actually an option.
2) Add new blacklists only to 3.4.x+ releases, only calculate optimal
   scores for 3.4.x+, and just exclude the new rules from 3.3.x releases,
   resulting in suboptimal 3.3.x scores. This seems to be the prevailing
   option. It is theoretically possible that this could cause a
   significant increase in missed spam; the actual likelihood is unknown.
3) Add new blacklists uniformly to all 3.3.x+ releases.
4) Develop a way to maintain two separate rule sets. May, realistically,
   be impossible.
5) Add new blacklists to 3.4.x rules, and stop providing 3.3.x updates. We
   can't expect people to just cut over suddenly like that.

Bug discussing adding this DNSBL:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

If you're wondering "what about releases older than 3.3.x?", the answer is:
We stopped expecting releases that old to be useful a while ago. Please
upgrade.

* This might sound like an implication that I'm a spamassassin developer.
  I'm not. I do not have commit access.

-- "We will be dead soon. Is this how we want to live?" http://www.ChaosReigns.com