Spam sorting (was Re: Install assistance needed in Berkeley area)
Tony Godshall
togo@of.net
Sun, 11 Jan 2004 14:33:40 -0800
According to Claude Rubinson,
> On Sun, Jan 11, 2004 at 12:22:10PM -0800, Tony Godshall wrote:
> > Say, any quick way to count the number of messages in a mbox file?
> > Is "grep -c '^From' file" sufficient?
>
> What's "sufficient" depends upon your demands. At a minimum you want
> to be searching for "\nFrom ". But it's not really that simple. See
> http://www.jwz.org/doc/content-length.html.
Yeah, well, all I want to do is verify that I'm training on ham
vs. spam without too much imbalance. With your comment in
mind, I can add a trailing space and get a count that's good
enough for my purposes with
grep -c '^From '
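To illustrate why the trailing space matters, here is a minimal sketch. It assumes a tiny two-message mbox built on the spot; the addresses and contents are made-up sample data, not from any real mailbox. Without the space, every "From:" header line is counted as well.

```shell
# Build a tiny two-message mbox in a temporary file (made-up sample data).
mbox=$(mktemp)
printf '%s\n' \
  'From alice@example.com Sun Jan 11 12:00:00 2004' \
  'From: Alice <alice@example.com>' \
  'Subject: first' \
  '' \
  '>From here on this body line is quoted, so neither pattern matches it.' \
  '' \
  'From bob@example.com Sun Jan 11 12:05:00 2004' \
  'From: Bob <bob@example.com>' \
  'Subject: second' \
  '' \
  'Plain body text.' \
  > "$mbox"

loose=$(grep -c '^From' "$mbox")    # matches separator lines AND "From:" headers
strict=$(grep -c '^From ' "$mbox")  # matches only lines starting with "From "
echo "loose=$loose strict=$strict"
rm -f "$mbox"
```

The stricter pattern can still overcount if a message body contains an unquoted line starting with "From ", which is the problem the jwz page describes.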
Oh, wait. Just found a better one.
$ mboxgrep -c . misc-200312
890
$ grep -c '^From ' misc-200312
890
$ mboxgrep -c '^From ' misc-200312
890
It's in package mboxgrep...
: Package: mboxgrep
: Priority: optional
: Section: mail
: Installed-Size: 92
: Maintainer: Tollef Fog Heen <tfheen@debian.org>
: Architecture: i386
: Version: 0.7.8-1
: Depends: libc6 (>= 2.3.1-1), libpcre3, zlib1g (>= 1:1.1.4)
: Filename: pool/main/m/mboxgrep/mboxgrep_0.7.8-1_i386.deb
: Size: 23652
: MD5sum: 6d5526b6803fb9760bdac3a9e36cd327
: Description: Grep through mailboxes
: mboxgrep is a small utility that scans either standard Unix
: mailboxes, Gnus nnml or nnmh mailboxes, MH mailboxes or Maildirs,
: and displays messages matching a basic, extended, or
: Perl-compatible regular expression.
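For the ham-vs-spam balance check itself, something along these lines might do. This is only a sketch: count_msgs is a hypothetical helper, the two sample files are generated here so the snippet runs on its own, and the "more than twice as many" threshold is an arbitrary assumption, not a recommendation from any spam filter.

```shell
# count_msgs: hypothetical helper giving an approximate message count for an mbox.
count_msgs() {
  grep -c '^From ' "$1"
}

# Generate small sample corpora (made-up data) so the sketch is self-contained.
dir=$(mktemp -d)
for i in 1 2 3; do
  printf 'From ham%s@example.com Sun Jan 11 12:00:00 2004\nSubject: ham %s\n\nbody\n\n' "$i" "$i"
done > "$dir/ham"
for i in 1 2 3 4 5 6 7; do
  printf 'From spam%s@example.com Sun Jan 11 12:00:00 2004\nSubject: spam %s\n\nbody\n\n' "$i" "$i"
done > "$dir/spam"

ham=$(count_msgs "$dir/ham")
spam=$(count_msgs "$dir/spam")
echo "ham=$ham spam=$spam"

# Assumed threshold: call it unbalanced if one corpus is more than twice the other.
if [ "$spam" -gt $((ham * 2)) ] || [ "$ham" -gt $((spam * 2)) ]; then
  balance=unbalanced
else
  balance=ok
fi
echo "balance=$balance"
rm -rf "$dir"
```

On real mailboxes, point count_msgs at the actual ham and spam files instead of the generated ones.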
Thanks, Claude, for the pointer.
Tony