spamassassin-dev November 2011 archive

[Bug 6386] Limit corpora network test age in score generation

From: <bugzilla-daemon_at_nospam>
Date: Tue Nov 08 2011 - 17:53:46 GMT
To: dev@spamassassin.apache.org

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6386

Darxus <Darxus@ChaosReigns.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|major                       |critical

--- Comment #4 from Darxus <Darxus@ChaosReigns.com> 2011-11-08 17:53:46 UTC ---
Can I get some other opinions on what the ham age limit should be?

There's a nice illustration of the problem in this graph:
http://www.chaosreigns.com/dnswl/ham.svg

See that big hump on the right at the top, the light blue "At least None" line?
Where it goes from ~50, up to 60-62 for a while, then back down to ~47? That
29% drop at the end was due to JM's corpora being added back. His ham corpus is
mostly 3 to 4 years old, and it makes up 30% of the ham we use for rescoring.
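To see why one old corpus can move the overall figure so much, treat the overall hit rate as a weighted average of the recent ham and the older ham. This is a back-of-the-envelope sketch: it assumes the rest of the ham kept hitting DNSWL at the same rate when JM's corpus was added back, which the comment does not state, and the function names are hypothetical.

```python
def blended_rate(p_recent: float, p_old: float, w_old: float) -> float:
    """Overall DNSWL hit rate (%) when a fraction w_old of the ham is older mail."""
    if not 0.0 <= w_old <= 1.0:
        raise ValueError("w_old must be between 0 and 1")
    # Weighted average of the two corpora's hit rates.
    return (1.0 - w_old) * p_recent + w_old * p_old


def implied_old_rate(p_before: float, p_after: float, w_old: float) -> float:
    """Solve blended_rate(p_before, x, w_old) == p_after for x.

    Assumes the recent ham keeps hitting at p_before once the old corpus is mixed in.
    """
    if not 0.0 < w_old <= 1.0:
        raise ValueError("w_old must be in (0, 1]")
    return (p_after - (1.0 - w_old) * p_before) / w_old


if __name__ == "__main__":
    # Rounded figures from the comment: ~61% before, ~47% after, old ham ~30% of the total.
    old = implied_old_rate(61.0, 47.0, 0.30)
    print(f"implied hit rate of the older ham: {old:.1f}%")
    print(f"check: {blended_rate(61.0, old, 0.30):.1f}%")
```

With those rounded inputs the implied rate for the older ham comes out around 14%, which, under the stated assumption, would mean old ham hits DNSWL far less often than current ham does.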

That "At least None" line is the percentage of ham that hits any rank of
DNSWL.org. And it shows that using this much old data is really skewing how
accurately we measure the performance of things like whitelists.
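The "At least None" figure can be sketched as the share of ham messages that hit any of the DNSWL rank rules. This assumes each ham message's result is already available as a set of rule names (a hypothetical input format; real mass-check output would need parsing first). The rule names are SpamAssassin's DNSWL rank rules.

```python
from typing import Iterable, Set

# DNSWL.org rank rules; "At least None" means hitting any one of them.
DNSWL_RULES = frozenset({
    "RCVD_IN_DNSWL_NONE",
    "RCVD_IN_DNSWL_LOW",
    "RCVD_IN_DNSWL_MED",
    "RCVD_IN_DNSWL_HI",
})


def at_least_none_percent(ham_hits: Iterable[Set[str]]) -> float:
    """Percent of ham messages that hit at least one DNSWL rank rule."""
    total = 0
    hits = 0
    for rules in ham_hits:
        total += 1
        if DNSWL_RULES & rules:  # non-empty intersection: some DNSWL rank matched
            hits += 1
    if total == 0:
        raise ValueError("no ham messages supplied")
    return 100.0 * hits / total


if __name__ == "__main__":
    sample = [
        {"RCVD_IN_DNSWL_LOW", "BAYES_00"},
        {"BAYES_00"},
        {"RCVD_IN_DNSWL_NONE"},
        set(),
    ]
    print(f"{at_least_none_percent(sample):.1f}% of ham hits some DNSWL rank")
```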

Date      At least None (%)  Note
20110806  50.6
20110813  50.3545            bb present
20110820  50.5765

20110910  62.304
20110917  62.406
20110924  61.4487
20111001  60.9607            bb missing
20111008  60.9483
20111015  60.5923
20111022  61.6126

20111029  47.4826            bb present
20111105  47.6509
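The size of the final drop can be checked directly from the weekly values above. This sketch treats the listed rows as consecutive, skipping over the missing weeks, and reports the biggest fall both as percentage points and as a share of the before and after values. The "29%" in the comment appears to match the drop measured against the later, lower value; measured against the week before, it is closer to 23%.

```python
# Weekly "At least None" values (percent of ham) from the table above.
# Rows are taken in order; the missing weeks are simply skipped over.
ROWS = [
    ("20110806", 50.6),
    ("20110813", 50.3545),
    ("20110820", 50.5765),
    ("20110910", 62.304),
    ("20110917", 62.406),
    ("20110924", 61.4487),
    ("20111001", 60.9607),
    ("20111008", 60.9483),
    ("20111015", 60.5923),
    ("20111022", 61.6126),
    ("20111029", 47.4826),
    ("20111105", 47.6509),
]


def largest_drop(rows):
    """Find the biggest fall between consecutive rows.

    Returns (from_date, to_date, drop_in_points,
             drop_as_pct_of_before, drop_as_pct_of_after).
    """
    best = None
    for (d0, v0), (d1, v1) in zip(rows, rows[1:]):
        points = v0 - v1
        if best is None or points > best[2]:
            best = (d0, d1, points, 100.0 * points / v0, 100.0 * points / v1)
    return best


if __name__ == "__main__":
    d0, d1, pts, pct_before, pct_after = largest_drop(ROWS)
    print(f"{d0} -> {d1}: {pts:.2f} points, {pct_before:.1f}% of the earlier value, "
          f"{pct_after:.1f}% of the later value")
```

Run as a script, it reports the 20111022 to 20111029 step: about 14.13 points, roughly 22.9% of the earlier value or 29.8% of the later one.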

I realize this problem is closely tied to fixing our ability to add new
masscheck accounts, but I'd like to reach consensus on what the ham age
limit should be changed to.
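A ham age limit amounts to dropping ham older than some cutoff before it feeds into rescoring. The sketch below assumes a hypothetical per-message record carrying the message's receive date; the `HamResult` type and `apply_ham_age_limit` function are illustrative names, and the two-year limit in the usage is only an example, not a proposed value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class HamResult:
    """One ham message's mass-check result (hypothetical record layout)."""
    received: datetime                   # when the message was originally received
    rules: FrozenSet[str] = frozenset()  # names of the rules it hit


def apply_ham_age_limit(results: Iterable[HamResult], now: datetime,
                        max_age: timedelta) -> List[HamResult]:
    """Keep only ham received within max_age of `now`; older messages are dropped."""
    cutoff = now - max_age
    return [r for r in results if r.received >= cutoff]


if __name__ == "__main__":
    now = datetime(2011, 11, 8)
    corpus = [
        HamResult(datetime(2011, 6, 1), frozenset({"RCVD_IN_DNSWL_LOW"})),
        HamResult(datetime(2008, 1, 1)),  # roughly 3-4 years old at the time of the comment
        HamResult(datetime(2010, 1, 1)),
    ]
    # Illustrative two-year limit only.
    kept = apply_ham_age_limit(corpus, now, timedelta(days=730))
    print(len(kept), "of", len(corpus), "messages kept")
```

Keeping the limit as a `timedelta` parameter makes it easy to compare hit rates under several candidate limits before settling on one.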
