Spam sorting (was Re: Install assistance needed in Berkeley area)
Tony Godshall
togo@of.net
Thu, 8 Jan 2004 11:14:30 -0800
According to Nick Moffitt,
> begin Tony Godshall quotation:
> > Unfortunately, Bayesian filtering has become a bit less
> > effective lately (more spam gets through) as spammers are
> > using random misspellings and garbage words to evade it. But
> > hopefully someone will come up with a fix for that (perhaps
> > grouping rarely-seen words together rather than ranking them
> > separately).
>
> Nope. You just don't get enough spam. Bayesian filtering
> eventually flags those nonsense words as spammy words, and you don't
> need to worry. Remember, and repeat after me: The more spam you GET,
> the less you have to READ!
>
> > I only use manual training. Supposedly it's best to use
> > approximately equal volumes of spam and non-spam in the
> > training; SpamAssassin's FAQ (I think) recommends training
> > with 1000+ messages of each (spam and non-spam). Also, it's
> > easy to reverse a mistake, since sa-learn --ham reverses the
> > effect of sa-learn --spam and vice versa.
>
> That's why your bayesian filter isn't adapting. You're not
> letting it learn.
>
> If it makes a mistake, correct it. But don't keep it in the
> dark all the time!
Sure, more data is always good. But what about the imbalance
issue? Are you saying it's not a problem? And why should I take
your opinion on this seriously if you don't offer a rational
argument against what the SA docs say?
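
For what it's worth, here's a toy Graham-style per-token calculation
(Python 3; not SpamAssassin's actual implementation, and the counts
below are made up) showing why the spam/ham ratio in the training
corpus matters less if the filter normalizes each token count by the
size of its own corpus:

    def token_spam_prob(spam_hits, ham_hits, n_spam_msgs, n_ham_msgs):
        # Estimate P(spam | token) from training counts.
        # Dividing each hit count by its own corpus size is what keeps
        # an imbalanced corpus (say, 10x more spam than ham) from
        # skewing the score by itself; a filter that skipped this step
        # would be far more sensitive to the ratio the SA docs warn
        # about.
        spam_freq = spam_hits / n_spam_msgs
        ham_freq = ham_hits / n_ham_msgs
        return spam_freq / (spam_freq + ham_freq)

    # A token seen in 200 of 10000 spams but only 1 of 1000 hams:
    print(token_spam_prob(200, 1, 10000, 1000))   # ~0.95, very spammy
    # A token seen about equally often in both corpora:
    print(token_spam_prob(500, 50, 10000, 1000))  # 0.50, neutral

This is only a sketch; SpamAssassin's actual Bayes code combines
token probabilities differently (following Gary Robinson's scheme, as
I understand it), but the per-corpus normalization is the part that
bears on the imbalance question.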