Spam sorting (was Re: Install assistance needed in Berkeley area)

Tony Godshall togo@of.net
Sun, 11 Jan 2004 14:33:40 -0800


According to Claude Rubinson,
> On Sun, Jan 11, 2004 at 12:22:10PM -0800, Tony Godshall wrote:
> > Say, any quick way to count the number of messages in a mbox file?
> > Is "grep -c '^From' file" sufficient?
> 
> What's "sufficient" depends upon your demands.  At a minimum you want
> to be searching for "\nFrom ".  But it's not really that simple.  See
> http://www.jwz.org/doc/content-length.html.

Yeah, well, all I want to do is verify that I'm training ham
vs. spam without too much imbalance.  With your comment in
mind, I can add a trailing space and get a 'good enough for
my purposes' count with
  grep -c '^From '

Oh, wait.  Just found a better one.
$ mboxgrep -c . misc-200312
890
$ grep -c '^From ' misc-200312
890
$ mboxgrep -c '^From ' misc-200312
890

It's in package mboxgrep...

: Package: mboxgrep
: Priority: optional
: Section: mail
: Installed-Size: 92
: Maintainer: Tollef Fog Heen <tfheen@debian.org>
: Architecture: i386
: Version: 0.7.8-1
: Depends: libc6 (>= 2.3.1-1), libpcre3, zlib1g (>= 1:1.1.4)
: Filename: pool/main/m/mboxgrep/mboxgrep_0.7.8-1_i386.deb
: Size: 23652
: MD5sum: 6d5526b6803fb9760bdac3a9e36cd327
: Description: Grep through mailboxes
:  mboxgrep is a small utility that scans either standard Unix
:  mailboxes, Gnus nnml or nnmh mailboxes, MH mailboxes or Maildirs,
:  and displays messages matching a basic, extended, or
:  Perl-compatible regular expression.
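
For anyone who wants a count slightly stricter than the plain
grep, here is a rough Python sketch along the lines of Claude's
"\nFrom " remark.  It assumes the common convention that a
message separator is a "From " line at the very start of the
file or right after a blank line; the function name
count_mbox_messages is made up for illustration, and it makes no
attempt to handle mailboxes that rely on Content-Length headers.

```python
def count_mbox_messages(path):
    """Count "From " lines that start the file or follow a blank line.

    Assumption: separators are always preceded by a blank line
    (except the first one). This is a heuristic, not a full mbox parser.
    """
    count = 0
    prev_blank = True  # treat the start of the file as a boundary
    with open(path, "rb") as fh:
        for line in fh:
            if prev_blank and line.startswith(b"From "):
                count += 1
            # Remember whether this line was empty (handles \n and \r\n)
            prev_blank = line.strip() == b""
    return count


if __name__ == "__main__":
    import sys

    # Usage: python count_mbox.py misc-200312 [more mbox files...]
    for name in sys.argv[1:]:
        print(name, count_mbox_messages(name))
```

A body line that happens to begin with "From " in the middle of
a paragraph is counted by grep -c '^From ' but skipped here, so
the two numbers can differ on mailboxes where such lines were
never quoted.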

Thanks, Claude, for the pointer.

Tony