Spam sorting (was Re: Install assistance needed in Berkeley area)

Tim Freeman tim@fungible.com
Fri, 2 Jan 2004 21:03:37 -0700


begin  Nick Moffitt  quotation:
> This is the key to bayesian filtering: The more spam you *get*, the
> less you have to *read*!

So how big is your training corpus by now?  I have about 1000 spams
and 1000 hams.

Do you ever expire old messages from the training corpus?  I don't.
I'm thinking of it, though, since my older training hams have a
different average age from my older training spams, and bogofilter
seems to learn from inessential things like dates and the side
effects of changes in how my incoming mail is set up.  However, in
tests I was never able to get better sorting by leaving out the old
skewed emails, probably because I have relatively little training
data.
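
For anyone who wants to try the same expiry experiment, here is a
minimal sketch of the splitting step in Python.  It is not tied to
bogofilter; the function name split_by_age and the choice to keep
undated messages in the "recent" pile are my own assumptions, and the
messages are assumed to be available as raw text strings.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def split_by_age(raw_messages, cutoff):
    """Partition raw messages into (old, recent) around `cutoff`.

    `cutoff` must be a timezone-aware datetime.  Messages whose Date
    header is missing or unparseable go into `recent`, so nothing is
    silently dropped from training (an assumption, not a rule).
    """
    old, recent = [], []
    for raw in raw_messages:
        date_header = message_from_string(raw)["Date"]
        sent = None
        if date_header is not None:
            try:
                sent = parsedate_to_datetime(date_header)
            except (TypeError, ValueError):
                sent = None
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        if sent is not None and sent.tzinfo is None:
            sent = sent.replace(tzinfo=timezone.utc)
        if sent is not None and sent < cutoff:
            old.append(raw)
        else:
            recent.append(raw)
    return old, recent
```

One would then train once on old + recent and once on recent alone,
and compare how each sorts the same held-out messages.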

Do you retain all messages that you've trained your spam filter on?  I
retain them in case I want to change spam filters some day, and so I
can run experiments, such as finding the best threshold to use for
bogofilter, or the one I mentioned in the previous paragraph of
leaving out old training data to see if it improves things.
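
The threshold experiment could look roughly like the sketch below.  It
assumes you have already collected a spam score between 0 and 1 for
each held-out ham and spam by whatever means your filter offers, and
that a message counts as spam when its score is at or above the
threshold.  The names sweep_thresholds, best_threshold, and fp_cost
are made up for illustration, and the 10:1 penalty for misfiled ham is
an arbitrary choice.

```python
def sweep_thresholds(ham_scores, spam_scores, thresholds):
    """Return (threshold, false_positives, false_negatives) tuples.

    A false positive is a ham scored at or above the threshold; a
    false negative is a spam scored below it.
    """
    results = []
    for t in thresholds:
        false_pos = sum(1 for s in ham_scores if s >= t)
        false_neg = sum(1 for s in spam_scores if s < t)
        results.append((t, false_pos, false_neg))
    return results


def best_threshold(results, fp_cost=10.0):
    """Pick the threshold minimizing fp_cost * false_pos + false_neg."""
    return min(results, key=lambda r: fp_cost * r[1] + r[2])[0]
```

Losing a real message is usually worse than reading one spam, which is
why false positives are weighted more heavily here.  With only about
1000 messages of each kind, the counts near the best threshold will be
small, so nearby thresholds may be indistinguishable.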

From: "Sean 'Shaleh' Perry" <shaleh@speakeasy.net>

>I use spambayes, it is a good filter.  However, it has some serious
>drawbacks: 1) it is single user.  This is the biggest reason there is
>no package for it I bet.

There are lots of single-user programs, like emacs, that get packages.
You know this, so I must not have understood you properly.  Which
single-user programs do you think won't get packages?

All it takes is one enthusiastic and competent person to make a
package, so I think it's just bad luck that this person hasn't shown
up for spambayes.

-- 
Tim Freeman                                                  tim@fungible.com
I xeroxed a mirror. Now I have an extra xerox machine. -- Steven Wright