Spam sorting (was Re: Install assistance needed in Berkeley area)

Nick Moffitt nick@zork.net
Fri, 2 Jan 2004 17:29:13 -0800


begin  Tim Freeman  quotation:
> From: Nick Moffitt <nick@zork.net>
> >	You ought to consider using something else, like a bayesian
> >filter.  I get 25 megabytes of spam a day, and I only ever have to
> >read one or two a month.
> 
> I use bogofilter and I would not get results that good with just
> bogofilter.  I'd like to figure out what we're doing different.  Here
> are some alternatives I can see:
> 
> 1. Is there a bayesian filter noticeably better than bogofilter?  I
>    ran some head-to-head tests a while ago before selecting
>    bogofilter.  I hear spambayes (sp?) is good, but I don't see a
>    Debian package for it.

	Spambayes will kick bogofilter's ass, especially since it's
not written by a self-obsessed lunatic who changes the meaning of
command-line switches to their exact opposite in between MINOR
RELEASES.  Bogofilter works great for me, though.

> 2. Do you let it automatically train on the spams and non-spams as it
>    sorts them, or do you only do manual training?  I only do manual
>    training because I don't want bogofilter to develop
>    self-reinforcing bad habits of dumping my good emails in the spam
>    pile.

	This is the big one.  Yes, I do self-reinforcement!  I also go
through and correct errors.  That is the only way to fly!  Without
self-reinforcement, you cannot take full advantage of the statistical
analyses!
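
	To make "self-reinforcement" concrete, here is a minimal sketch
in Python.  It is a toy and not bogofilter's actual algorithm: the
class name, the whitespace tokenizer, the add-one smoothing and the 0.5
cutoff are all assumptions chosen for illustration.  The point is only
the last method, where the filter trains on its own verdict.

```python
import math
from collections import Counter


class TinyFilter:
    """Toy word-count filter. NOT bogofilter's real algorithm."""

    def __init__(self):
        # For each label: how many messages contained each word,
        # and how many messages were registered in total.
        self.words = {"spam": Counter(), "ham": Counter()}
        self.msgs = {"spam": 0, "ham": 0}

    def register(self, label, text):
        """Record one message under the given label."""
        self.words[label].update(set(text.lower().split()))
        self.msgs[label] += 1

    def spamminess(self, text):
        """Return a value between 0 and 1; higher means more spam-like."""
        score = 0.0
        for word in set(text.lower().split()):
            # Add-one smoothing keeps unseen words from zeroing things out.
            p_spam = (self.words["spam"][word] + 1) / (self.msgs["spam"] + 2)
            p_ham = (self.words["ham"][word] + 1) / (self.msgs["ham"] + 2)
            score += math.log(p_spam / p_ham)
        score = max(-50.0, min(50.0, score))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-score))

    def classify_and_learn(self, text, cutoff=0.5):
        """Classify a message, then train on that same verdict."""
        label = "spam" if self.spamminess(text) > cutoff else "ham"
        self.register(label, text)  # the self-reinforcement step
        return label
```

	Seed it with a few hand-sorted mails via register(), then let
classify_and_learn() handle the rest.  Wrong verdicts still have to be
corrected by hand, or they feed on themselves.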

	The big thing I did that cut down greatly on my spam reading
was welcoming in more spam.  I opened up all sorts of old aliases that
I had shut down as too waterlogged to be any good, and started using my
main address in more publicly visible places.  This is the key to
Bayesian filtering:  The more spam you *get*, the less you have to
*read*!

	Of course there are also some UI tricks you can do to make
classification happen largely automatically.  I use mutt, and I have
the following macros set:

----8<----
macro index s "<enter-command>unset wait_key\n<tag-prefix><pipe-entry>bogofilter -MSn\n<enter-command>set wait_key\n<tag-prefix><save-entry>"
macro pager s "<enter-command>unset wait_key\n<pipe-entry>bogofilter -MSn\n<enter-command>set wait_key\n<save-entry>"
----8<----

	With this pair, all I need to do to correct a false positive
is save the mail out of my .trash folder and into the appropriate
place: on the way, the macro has bogofilter take the mail back out of
the spam list and register it as non-spam.  This neatly solves the
problem of reinforcing bad behavior.  Also, since there's no reason
I'd ever SAVE spam, the semantics work for normal use.
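
	As a rough picture of what that correction does to the word
lists, here is a sketch in Python.  The function names and the
whitespace tokenizer are hypothetical, and this is my reading of the
-S and -n switches on the release in use here; check the man page for
your own version before trusting it.

```python
from collections import Counter


def unregister(wordlist, text):
    """Take one message's words back out of a word list."""
    for word in set(text.lower().split()):
        if wordlist[word] > 1:
            wordlist[word] -= 1
        else:
            # Drop the entry instead of leaving a zero or negative count.
            wordlist.pop(word, None)


def register(wordlist, text):
    """Add one message's words to a word list."""
    wordlist.update(set(text.lower().split()))


def correct_false_positive(spam_words, ham_words, text):
    """Undo a wrong 'spam' verdict: out of the spam list, into non-spam."""
    unregister(spam_words, text)
    register(ham_words, text)
```

	The delete-as-spam direction is the mirror image: unregister
from the non-spam list, then register as spam.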

----8<----
macro index r "<enter-command>unset wait_key\n<tag-prefix><pipe-entry>bogofilter -Mn\n<enter-command>set wait_key\n<tag-prefix><reply>"
macro pager r "<enter-command>unset wait_key\n<pipe-entry>bogofilter -Mn\n<enter-command>set wait_key\n<reply>"

macro index g "<enter-command>unset wait_key\n<tag-prefix><pipe-entry>bogofilter -Mn\n<enter-command>set wait_key\n<tag-prefix><group-reply>"
macro pager g "<enter-command>unset wait_key\n<pipe-entry>bogofilter -Mn\n<enter-command>set wait_key\n<group-reply>"

macro index l "<enter-command>unset wait_key\n<tag-prefix><pipe-entry>bogofilter -Mn\n<enter-command>set wait_key\n<tag-prefix><list-reply>"
macro pager l "<enter-command>unset wait_key\n<pipe-entry>bogofilter -Mn\n<enter-command>set wait_key\n<list-reply>"
----8<----

	These are strictly semantic juice.  There's no reason I'd ever
*reply* to spam either, so I run the mails I'm replying to through the
double-un-spam filter.

----8<----
macro index X "<enter-command>unset wait_key\n<tag-prefix><pipe-entry>bogofilter -MNs\n<enter-command>set wait_key\n<tag-prefix><delete-message>"
macro pager X "<enter-command>unset wait_key\n<pipe-entry>bogofilter -MNs\n<enter-command>set wait_key\n<delete-message>"
----8<----

	Since no such guarantees can be made about the mails I delete
(I delete non-spams as well as spam), I have a separate key, X, for
marking a mail as spam and deleting it.

	Note that mutt lacks a "propagate earlier tag-prefix state"
function, so I just let mutt beep when I'm not doing a tagged command
in the index.
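
	Because every binding has to be written twice (once for the
index with <tag-prefix>, once for the pager without), the pairs can be
generated instead of typed.  A sketch in Python; macro_pair and the
BINDINGS table are names made up for this example, and the output is
meant to match the macros above character for character.

```python
# Raw strings keep the backslash-n that mutt expects inside a macro.
PRE = r"<enter-command>unset wait_key\n"
POST = r"<enter-command>set wait_key\n"


def macro_pair(key, flags, action):
    """Return the (index, pager) macro lines for one key binding."""
    pipe = rf"<pipe-entry>bogofilter {flags}\n"
    index = (f'macro index {key} "{PRE}<tag-prefix>{pipe}'
             f'{POST}<tag-prefix><{action}>"')
    pager = f'macro pager {key} "{PRE}{pipe}{POST}<{action}>"'
    return index, pager


# (key, bogofilter switches, mutt function) as used in this mail.
BINDINGS = [
    ("s", "-MSn", "save-entry"),
    ("r", "-Mn", "reply"),
    ("g", "-Mn", "group-reply"),
    ("l", "-Mn", "list-reply"),
    ("X", "-MNs", "delete-message"),
]

if __name__ == "__main__":
    for binding in BINDINGS:
        print("\n".join(macro_pair(*binding)))
```

	Redirect the output into a file and source it from your
muttrc.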

-- 
"Forget the damned motor car and build cities for lovers and friends."
	-- Lewis Mumford

end