spamassassin-users archives
On Wed, 2011-12-14 at 23:36 -0500, firstname.lastname@example.org wrote:
> On 12/15, Martin Gregorie wrote:
> > Could somebody with access to the SA Bugzilla kindly add a comment to
> > bug 4078 saying that this is also an issue with Cyrillic encoded in
> > UTF-8? I'm asking because at present #4078 only mentions Windows code
> > pages and koi8. There is nothing to indicate that this is also a problem
> > with UTF-8.
> As Karsten pointed out, though, bug 4078 isn't actually
> related, since that bug concerns character sets used primarily for
> another language, which UTF-8 is not. Bug 6364 is probably exactly the
> same as your issue, just in a different language - needing TextCat fixed /
The actual problem is that bug 4078 is over-restrictive in its
applicability: it merely says that CHARSET_FARAWAY_HEADER isn't returned
if a message body is in Hebrew.
The problem that needs addressing is that the ok_locales configuration
parameter doesn't work as intended. This appears to be because it treats
the sender's choice of character encoding (in Windows terms, the code
page) as a reliable indication of the sender's locale. I accept that
this used to work, but since the widespread adoption of UTF-8 and other
Unicode encodings, any such assumption is deeply flawed: UTF-8 can
encode Latin, Cyrillic, CJK and every other script equally well, so a
message declared as UTF-8 says nothing about the language it is written
in.
The same comments also apply to textcat (bug 6364).
There are really only two possibilities for resolving these bugs:
1) Fix bug 6364 by rewriting the code textcat uses to recognise the
predominant language used in body text. Fix bug 4078 by rationalising
ok_locales to use the revised textcat code to determine the locale
used by the sender before comparing this with the list of acceptable
locales.
2) Declare textcat and ok_locales to be irretrievably broken and
remove them from future versions of SA.
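As a rough illustration of suggestion 1, the sketch below infers a locale from the decoded body text instead of from the charset, then checks it against an ok_locales-style list. It is an assumption-laden stand-in, not SpamAssassin code: it counts letters per Unicode script (via Python's `unicodedata.name`) rather than doing textcat's n-gram language identification, the script-to-locale table and the function names are hypothetical, and the locale codes simply mirror those documented for ok_locales (en, ja, ko, ru, th, zh, all).

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from the script prefix of a Unicode character name
# to the locale codes that ok_locales uses.
SCRIPT_LOCALE = {
    "LATIN": "en",
    "CYRILLIC": "ru",
    "HIRAGANA": "ja",
    "KATAKANA": "ja",
    "HANGUL": "ko",
    "THAI": "th",
    "CJK": "zh",
}


def dominant_locale(text):
    """Return the locale whose script covers the most letters in text.

    Letters in scripts not listed above are ignored; if no listed script
    is found at all, fall back to 'en' (a crude default).
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace, combining marks
        name = unicodedata.name(ch, "")
        for prefix, locale in SCRIPT_LOCALE.items():
            if name.startswith(prefix):
                counts[locale] += 1
                break
    return counts.most_common(1)[0][0] if counts else "en"


def body_locale_ok(text, ok_locales):
    """Mimic the intent of ok_locales using the body text, not the charset.

    'all' accepts everything; otherwise the detected locale must be listed.
    """
    if "all" in ok_locales:
        return True
    return dominant_locale(text) in ok_locales
```

For example, `body_locale_ok("Привет, мир", ["en"])` returns `False` regardless of whether the message arrived as KOI8-R or UTF-8. Script counting is deliberately crude: it cannot tell Russian from Ukrainian, and kanji-heavy Japanese could be misread as Chinese, which is why proper language identification (a fixed textcat) would still be needed.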
That said, I'm happy to become a Bugzilla user, but before I add
anything to it, I'd like to know whether you'd prefer me to add comments
to 4078 and/or 6364, or whether it would be best to raise a new bug containing my
suggestion #1. I've kept an example message that I can provide as