July 19, 2007
Study: Like It or Not, Challenge/Response Anti-Spam Works Best
Consulting firm Brockmann & Company released a study on Tuesday that compared the efficacy of assorted anti-spam technologies, as well as overall user satisfaction with each. The study uses a number it calls the "spam index" to make comparisons.
In short, the spam index is a measure of how much time you waste going through your spam folders, how many false positives you find when you do that, and how often you have to ask someone to resend a message because, perhaps, your spam filter ate it. For now we'll set aside the notion that, with that last item in the list, the index might more properly be named "The Spam and Weasel Spam-Filter-Blaming Coworker Index."
Brockmann provides a calculator if you're curious about your own spam index.
By comparing spam indexes for each type of solution, the study concluded:
- Your ISP's spam filter is as bad as you think it is.
- Your SpamAssassin installation isn't much better, and is generally a little worse than everything else.
- Challenge-response (c-r) systems suck about half as much as anything else on the market.
We all know challenge-response systems work pretty well. That's never been in question. The trouble is that they can cause problems for people who are not us.
I once interviewed a company in the c-r business along with a fairly vociferous detractor of the approach, and found the debate hadn't shifted much since it first started: c-r advocates say they've got the key to eradicating spam in our lifetime, at the cost of occasional one-time inconvenience. C-r haters say the price for the solution is the inconvenience c-r advocates admit to, plus a raft of secondary bad effects including "blizzards" of bogus c-r responses for anyone who falls victim to a joe job.
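For anyone who hasn't run into c-r, here's a minimal sketch of the general idea, and of why the haters worry about joe jobs. It's an illustration only, not any vendor's implementation: the class name `ChallengeResponseFilter` and its methods are hypothetical, and a real system would mail out challenges rather than append addresses to a list.

```python
class ChallengeResponseFilter:
    """Toy model of one recipient's challenge-response mailbox (hypothetical)."""

    def __init__(self, whitelist=None):
        self.whitelist = set(whitelist or [])
        self.quarantine = {}        # sender address -> list of held messages
        self.challenges_sent = []   # addresses we would have mailed a challenge to

    def receive(self, sender, message):
        # Known senders go straight through.
        if sender in self.whitelist:
            return "delivered"
        # Unknown senders are held, and challenged once.
        first_time = sender not in self.quarantine
        self.quarantine.setdefault(sender, []).append(message)
        if first_time:
            # The challenge goes to whatever address the message claims to
            # be from, whether or not that person actually sent it.
            self.challenges_sent.append(sender)
        return "held"

    def confirm(self, sender):
        # The sender answered the challenge: whitelist them, release held mail.
        self.whitelist.add(sender)
        return self.quarantine.pop(sender, [])


# A joe job in miniature: a spammer forges one victim's address on mail
# to many c-r users, and every one of them challenges the victim.
victim = "victim@example.org"
mailboxes = [ChallengeResponseFilter() for _ in range(100)]
for box in mailboxes:
    box.receive(victim, "forged spam")
blizzard = sum(box.challenges_sent.count(victim) for box in mailboxes)
```

The one-time inconvenience the advocates admit to is the `confirm` step. The "blizzard" the detractors talk about is the loop at the bottom: the victim never sent a thing and still ends up with one challenge per protected mailbox.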
The study also concluded that c-r users are much happier with their solution than anyone else: Sixty-seven percent reported they were satisfied with a c-r system. People who used "open source filters" were the least satisfied, at 16 percent of respondents.
I don't think I'd ever go with c-r for my own mail setup, but some days I think Jupitermedia (my employer, this blog's publisher) could go c-r and I'd breathe a sigh of relief. The sort of legitimate mail that comes in from the PR industry seems to confuse Bayesian learners enough that there's a lot of overlap between marginal spam and mass-mailed press releases. If our mail admins wanted to shoulder the karmic burden the c-r haters say would be their due, I'd let them and enjoy a much less spammed existence.
As it is now, I've got two layers in my anti-spam setup:
- A SpamAssassin installation with a decent whitelist does Bayesian filtering over everything when it comes through.
- On my computer, I have SpamSieve to sweep up what got past SpamAssassin. I used to rely on Apple Mail's Bayes learner, but it collapsed under the weight of its own corpus.
SpamAssassin can be trusted to find the really spammy stuff ... things that score over eight points on its scale. At about four points, it's seldom producing false positives in the midst of a lot of spam. The mail it scores below a four is all over the place ... could be spam, could be ham, and I don't want to sit down and take an average to find the exact sweet spot. That's where SpamSieve comes in and handles the rest. It's easier to fine-tune and train on the client end than to pop open an ssh session and fiddle with SpamAssassin, or to bother with server-side scripting to process false positive/false negative folders.
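To make the tiers concrete, here's a rough sketch of that routing in Python. It assumes SpamAssassin has already stamped each message with its usual `X-Spam-Status` header containing `score=N`. The thresholds are the ones I described above; the function names and the three destination labels are made up for illustration and aren't part of SpamAssassin or SpamSieve.

```python
import re
from email import message_from_string

# Thresholds from the setup described above; tune to taste.
DEFINITE_SPAM = 8.0   # over this, it's the really spammy stuff
PROBABLE_SPAM = 4.0   # from here up, false positives are rare

_SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def spamassassin_score(raw_message):
    """Return the score from X-Spam-Status, or None if the header is missing."""
    msg = message_from_string(raw_message)
    match = _SCORE_RE.search(msg.get("X-Spam-Status", ""))
    return float(match.group(1)) if match else None


def route(raw_message):
    """Pick a (hypothetical) destination based on the server-side score."""
    score = spamassassin_score(raw_message)
    if score is None or score < PROBABLE_SPAM:
        # Could be spam, could be ham: let the client-side filter decide.
        return "client-filter"
    if score > DEFINITE_SPAM:
        return "definite-spam"
    return "probable-spam"


sample = (
    "From: someone@example.com\n"
    "X-Spam-Status: No, score=2.1 required=5.0 tests=HTML_MESSAGE\n"
    "\n"
    "Press release attached.\n"
)
destination = route(sample)
```

Anything that lands in `client-filter` is exactly the murky below-four mail that gets handed to the second layer.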
Posted by mhall at 7:44 PM