spamassassin-dev December 2011 archive
spamassassin-dev: [Bug 6400] GA feedback for Mailspike DNSBL

From: <bugzilla-daemon_at_nospam>
Date: Tue Dec 06 2011 - 09:09:33 GMT
To: dev@spamassassin.apache.org

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6400

--- Comment #27 from Warren Togami <wtogami@gmail.com> 2011-12-06 09:09:33 UTC ---
> +1 for adding Mailspike to 3.4
>
> -1 on adding any new DNSBL via sa-update to existing releases, but strictly
> limit this on actual releases with README and release notes

+1 to adding Mailspike to 3.4. We just need to be VERY CAREFUL about exactly
which rules we add and how their scores are set.

1) I strongly warn against letting the individual L# rules float with the
rescorer. We will NOT see a logical linear progression of higher scores for
_L3 _L4 and _L5. I would recommend letting only the aggregate _BL float with
GA rescoring. Include _L3, _L4, _L5 and ZBI only as informational rules to aid
in future statistical analysis.

2) BE VERY CAREFUL DURING GA RESCORING!
"reuse" is the best way to ensure we are doing a proper apples-to-apples
comparison of MSPIKE and the other DNSBLs, as their combination mutually
rebalances their scores. However, if we enable "reuse" for MSPIKE, we must be
*CERTAIN* that all masscheck participants have their spam tagged using MSPIKE
rules. If not, their untagged spam will artificially count as non-hits and
throw off the statistics, potentially fatally.
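
To make the risk in point 2 concrete, here is a minimal sketch of the
arithmetic. The corpus sizes and hit counts are made-up numbers, and the
function is hypothetical; it is not part of the masscheck or rescoring tools.
It only assumes that, under "reuse", a corpus without stored MSPIKE tags
contributes zero hits for the rule.

```python
# Illustrative only: hypothetical corpora, not real masscheck data.

def observed_hit_rate(corpora):
    """Spam hit rate for a rule as seen across all submitted corpora.

    Each corpus is a tuple (spam_count, true_hits, has_reuse_tags).
    A corpus without stored tags contributes its spam to the total
    but none of its hits, which is the failure mode described above.
    """
    total_spam = sum(spam for spam, _, _ in corpora)
    counted_hits = sum(hits for _, hits, tagged in corpora if tagged)
    return counted_hits / total_spam


# Two participants whose spam the rule really hits 60% of the time.
all_tagged = [(10_000, 6_000, True), (10_000, 6_000, True)]
half_tagged = [(10_000, 6_000, True), (10_000, 6_000, False)]

rate_all = observed_hit_rate(all_tagged)
rate_half = observed_hit_rate(half_tagged)
print(f"all tagged:  {rate_all:.2f}")
print(f"half tagged: {rate_half:.2f}")
```

In this toy case the rule hits the same share of spam in both corpora, yet
the measured rate halves as soon as one participant's mail lacks the tags, so
the rescorer would undervalue the rule relative to DNSBLs that were tagged
everywhere.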

3) Please add _H whitelist rules only as informational.

http://www.mail-archive.com/users@spamassassin.apache.org/msg69546.html
As noted in this earlier analysis, our existing whitelists DO NOT IMPROVE the
results of SpamAssassin. Weekly masscheck results have consistently indicated
moderately poor performance of the existing whitelists, so whitelists may even
be making things slightly worse.

The following belongs in a separate discussion, but I mention it here because
it is related.

 * We would be better off again reducing the existing whitelist scores. In
particular, DNSWL_LOW and IADB whitelist rules are consistently demonstrating
problems, probably due to poor enforcement.

 * We should artificially set all Whitelist rules to -0.01 during any future GA
rescoring. Why? We are testing the efficacy of the spam detection rules, and
the two are mutually independent. Zeroing out the effect of whitelists during
score generation ensures that whitelists are not improperly affecting the score
setting of spam detection rules.
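
The idea in the last bullet can be sketched as follows. The rule names and
scores here are placeholders invented for illustration, and the helper
functions are hypothetical; this is not how the rescoring tooling is
implemented. It only shows what pinning whitelist scores to -0.01 does to the
totals that the spam rules are judged against.

```python
# Illustrative only: placeholder rule names and scores.

def pin_whitelists(scores, whitelist_rules, pinned=-0.01):
    """Return a copy of `scores` with every whitelist rule fixed at `pinned`."""
    return {
        rule: (pinned if rule in whitelist_rules else score)
        for rule, score in scores.items()
    }


def message_score(hits, scores):
    """Total score of a message, given the names of the rules it hit."""
    return sum(scores[rule] for rule in hits)


scores = {"SPAM_RULE_A": 2.5, "SPAM_RULE_B": 1.5, "WHITELIST_X": -4.0}
whitelist_rules = {"WHITELIST_X"}

# A spam message that hits both spam rules and a poorly enforced whitelist.
hits = ["SPAM_RULE_A", "SPAM_RULE_B", "WHITELIST_X"]

before = message_score(hits, scores)
after = message_score(hits, pin_whitelists(scores, whitelist_rules))
print(before, after)
```

With the whitelist at its normal score the message totals 0.0, and the spam
rules look as if they failed; with the whitelist pinned the total is roughly
3.99, so the spam rules are evaluated on their own merits.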
