August 12, 2003

A Worrisome Trend...

In the past week, three messages that I did want to read got filtered into the "spam" mailbox:

  • Lalith Vipulananthan - Friday, 10:04 AM +0100 - Re: hot stuff
  • Alison Oaxaca - 8/6/03, 12:09 PM -0400 - NSF Deadline
  • Jim DeLong - 8/4/03, 9:23 AM -0400 - Tabloid Headline of the Week

You may say that anybody who puts "hot stuff" or "tabloid" in the subject line of an email message deserves what he gets (even if he is my father)--or that it is Eudora's fault for not having a "whitelist" of senders--and that spam-control technology will stay ahead of spamsters' ingenuity. But I find this worrisome. I don't need to add "check my spam mailbox for non-spam messages" to my list of daily tasks...

Posted by DeLong at August 12, 2003 09:23 PM

Comments

Apple's MacOSX Mail app is not perfect but I've been pleasantly surprised at its Bayesian spam filtering after I spent a month or so training it.
I've never had a valid message misclassified as spam, and I only get about a message or two a day of spam that is misclassified as valid.

Posted by: Maynard Handley on August 13, 2003 02:21 AM

The only solution to your problem, though, if you are not even willing to grant whitelists, is human-level intelligence filtering.

Posted by: theCoach on August 13, 2003 04:05 AM

But I've watched enough movies to know that creating a human-level AI and then giving it a boring job *never* works...

Posted by: Brad DeLong on August 13, 2003 06:29 AM

I use a Bayesian filter myself - SpamBayes - and after feeding it ~100 examples of "good" and "bad" email each, I NEVER get my email misclassified. It has a provision for a "doubtfuls" category, but the fact is that I haven't had any such messages so far. Who needs AI when naive Bayesian classification works so well?

Posted by: Abiola Lapite on August 13, 2003 06:40 AM
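
[Editor's illustration] For readers curious what "naive Bayesian classification" with a "doubtfuls" category looks like in practice, here is a minimal sketch. It is not SpamBayes's actual algorithm, nor that of any mail client named in this thread: the class name, the thresholds (0.2 and 0.9), the tokenizer, and the sample messages are all assumptions made up for illustration, and the scoring is a simplified variant that looks only at the words present in a message.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy spam filter: counts how many messages of each label contain each word."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one hand-labelled example; label is 'spam' or 'ham'."""
        self.message_counts[label] += 1
        # set() so a word counts once per message, however often it repeats
        self.word_counts[label].update(set(tokenize(text)))

    def spam_probability(self, text):
        """Estimate P(spam | words present), treating words as independent."""
        total = sum(self.message_counts.values())
        words = set(tokenize(text))
        scores = {}
        for label in ("spam", "ham"):
            n = self.message_counts[label]
            # Smoothed prior, then smoothed per-word likelihoods, in log space
            score = math.log((n + 1) / (total + 2))
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / (n + 2))
            scores[label] = score
        # Clamp the log-odds so math.exp cannot overflow on long messages
        diff = max(-700.0, min(700.0, scores["ham"] - scores["spam"]))
        return 1.0 / (1.0 + math.exp(diff))

    def classify(self, text, low=0.2, high=0.9):
        """Return 'spam', 'ham', or 'unsure' (the 'doubtfuls' bucket)."""
        p = self.spam_probability(text)
        if p >= high:
            return "spam"
        if p <= low:
            return "ham"
        return "unsure"


# Hypothetical training data, far smaller than the ~100 examples mentioned above
spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("NSF deadline for the grant proposal", "ham")
spam_filter.train("lunch meeting about the grant", "ham")

print(spam_filter.classify("buy cheap pills"))
print(spam_filter.classify("grant deadline"))
print(spam_filter.classify("hello there"))
```

The asymmetric thresholds reflect the complaint in the original post: a wanted message landing in the spam mailbox costs more than a spam message landing in the inbox, so the bar for calling something spam is set higher than the bar for calling it legitimate.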

Recent versions of Mozilla / Netscape mail also have a version of "Bayesian" spam filtering that works quite well after you have trained it. I don't quite believe that these are true "Bayesian" algorithms, but it has learned, for example, to distinguish which of the various Economic Research Network announcements I actually want to see. Also, there is an option to never filter out mail from an address in your address book -- that alone would have solved some of your problems. (I presume that at least two of those correspondents are in your address book.)

Posted by: Steve B on August 13, 2003 07:34 AM

Eudora 5.2.1 has filter terms that match against address book entries. I spent about an hour getting a whitelist set up; my in-box is a lot quieter now :-)

Posted by: landond dyer on August 13, 2003 08:17 AM
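
[Editor's illustration] The address-book whitelist idea raised in the last two comments can be sketched in a few lines. This is not how Eudora or Mozilla implement it; the function names, the `address_book` set, and the example.com addresses are hypothetical. The only library call used is `email.utils.parseaddr` from the Python standard library.

```python
from email.utils import parseaddr


def is_whitelisted(from_header, address_book):
    """True if the sender's address appears in the address book (case-insensitive)."""
    _, addr = parseaddr(from_header)
    known = {entry.lower() for entry in address_book}
    return bool(addr) and addr.lower() in known


def route(from_header, filter_verdict, address_book):
    """Pick a mailbox: whitelisted senders bypass the spam filter's verdict."""
    if is_whitelisted(from_header, address_book):
        return "inbox"
    return "spam" if filter_verdict == "spam" else "inbox"


# Hypothetical address book and messages
address_book = {"dad@example.com", "colleague@example.com"}

print(route("Dad <Dad@Example.com>", "spam", address_book))
print(route("Stranger <offers@example.net>", "spam", address_book))
```

Checking the whitelist before consulting the filter means a subject line like "hot stuff" from a known correspondent can no longer send the message to the spam mailbox, at the price of trusting a From header that spammers can forge.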

"Re: hot stuff" suggests you were lucky to even get a response.

Posted by: markus on August 13, 2003 03:49 PM