spamassassin-users December 2011 archive

Re: sa-learn and modern spam sizes

From: Duane Hill <duihi77_at_nospam>
Date: Tue Dec 20 2011 - 15:38:58 GMT

On Tuesday, December 20, 2011 at 15:26:06 UTC, confabulated:

> On 12/19/2011 4:03 AM, Jonas wrote:
>>>> I've never seen spam larger than 3 MB.
>>> which is much bigger than the 256 kB limit in sa-learn that the OP is having a
>>> problem with.
>> Indeed, and of course I agree the average spam size is much, much lower.
>> But a lot of the "manual" spam, typically originating in Asia, where people send out spam through Hotmail/Gmail, can be 1-3 MB in size. Most of these are electronics- or textile-oriented "business" offers.
>> And my problem remains: our setup is based on MailScanner (a daemon like amavisd-new), which doesn't use spamc/spamd, so I'm unable to train my Bayes database on these 1 MB+ spams.
>> So can I conclude that there's no real solution to this besides a code change?
>> Should I open a bug about it?

> If you're using MailScanner, why not ask on the MailScanner list?


I don't believe this is a MailScanner issue. I have also found that
sa-learn doesn't learn from anything over a certain size. It's been a
while (a month) since I've seen any large spam, so I never bothered to
look into it.

mailhost# ls -l notspam.msg
-rw-r--r-- 1 duane duane 363516 Dec 20 15:35 notspam.msg

mailhost# sa-learn --ham notspam.msg
Learned tokens from 0 message(s) (0 message(s) examined)

If I chop everything out except the message headers:

mailhost# sa-learn --ham notspam.msg
Learned tokens from 1 message(s) (1 message(s) examined)
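
For anyone wanting to repeat the test above, here is a rough sketch of the two steps as shell functions. The 262144-byte threshold is only my reading of the "256 kB limit" quoted earlier in this thread and may not match your sa-learn version; the function names and the SA_LEARN_LIMIT variable are made up for illustration, and the header split assumes LF line endings.

```shell
#!/bin/sh
# Assumed threshold: the "256 kB limit" mentioned in this thread,
# taken here as 256 * 1024 bytes. Check your own sa-learn version.
SA_LEARN_LIMIT=262144

# Hypothetical helper: exit status 0 if the file is larger than
# the assumed limit, non-zero otherwise.
is_oversized() {
    # Arithmetic expansion strips any padding wc adds to the count.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -gt "$SA_LEARN_LIMIT" ]
}

# Hypothetical helper: print the header block only, i.e. everything
# up to and including the first empty line (assumes LF line endings).
headers_only() {
    sed '/^$/q' "$1"
}
```

Usage would be something like `is_oversized notspam.msg && headers_only notspam.msg > headers.msg`, followed by `sa-learn --ham headers.msg`. Keep in mind this only trains on the header tokens, so it is a workaround for the test above and not a substitute for learning the whole message.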

-- If at first you don't succeed... so much for skydiving.