Spam Getting More Personal?

The Viagra and Cialis knock-offs being pushed in so much of the spam I get may be directed at things the recipients feel very personally about, but the message itself has never been personal. Well, it had never seemed personal to me, anyway, until now.

Clay Shirky pointed out what I’ve started to see, and wonder about, myself: many of the subject lines in the spam I’ve received recently sound familiar, and plausible as real messages. So plausible, in fact, that my spam filters can’t trap them, and even I can’t spot them until after I’ve read them. Some of the subject lines that surprised him by turning out to be spam:

  • Subject: definition of what “free software” means. Outgrowing its
  • Subject: What makes it particularly interesting to private users is that there has been much activity to bring free UNIXoid operating systems to the PC,
  • Subject: and so have been long-haul links using public telephone lines. A rapidly growing conglomerate of world-wide networks has, however, made joining the globa

Here’s what he had to say about it:

Can it be that spammers are starting to associate context with individual email addresses, in an effort to evade Bayesian filters? (If you wanted to make sure a message got to my inbox, references to free software, open source, and telecom networks would be a pretty good way to do it. I mean, what are the chances?) Some of this stuff is so close to my interests that I thought I’d written some of the subject lines and was receiving this as a reply. Or is this just general Bayes-busting that happens to overlap with my interests?

If it’s the former, then Teilhard de Chardin is laughing it up in some odd corner of the noosphere, as our public expressions are being reflected back to us as a come-on. History repeats itself, first as self-expression, then as ad copy…

And all of this has me thinking about two things.

First, despite the SEC crackdown on stock spam (and even a lawsuit in 2006), market-related spam probably isn’t going away, in part because the nature of the market makes it easy for spammers to profit.

“All products for your health” spams require that somewhere, somehow the spammer collect money from the stooge, but stock spams wash that money through the market. Sure, statistical evaluation might reveal the spammers, but how many?

Second, a possible explanation Shirky didn’t discuss is that spammers might be applying tracking and statistical tools to profile us. Who knows how many messages get trapped, but if the spammers track those few that get through (including the claimed sender, subject, and other text), then what’s to stop them from exploiting that hole in our filters? How many such messages would have to sneak through before spammers got a pretty good profile of us and our filters?
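To make that hole concrete, here is a minimal sketch of how a naive Bayes filter might score a subject line, and why padding a pitch with words from a recipient's legitimate mail drags the score toward "not spam." Everything in it is an assumption for illustration: the four-message training corpus is invented, the `train` and `spam_log_odds` names are hypothetical, and real filters tokenize and weight far more carefully than this.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals

def spam_log_odds(text, counts, totals):
    """Log-odds that text is spam: naive Bayes with add-one smoothing.

    Assumes the training set holds at least one message of each class.
    """
    vocab_size = len(set(counts[True]) | set(counts[False]))
    n_spam = sum(counts[True].values())
    n_ham = sum(counts[False].values())
    score = math.log(totals[True] / totals[False])  # class prior
    for word in text.lower().split():
        p_spam = (counts[True][word] + 1) / (n_spam + vocab_size)
        p_ham = (counts[False][word] + 1) / (n_ham + vocab_size)
        score += math.log(p_spam / p_ham)
    return score

# Invented toy corpus: two spam messages, two legitimate ones.
training = [
    ("cheap pills buy now", True),
    ("hot stock tip buy now", True),
    ("free software licensing discussion", False),
    ("open source telecom networks meeting", False),
]
counts, totals = train(training)

plain = "buy cheap pills"
padded = "buy cheap pills free software open source networks"

plain_score = spam_log_odds(plain, counts, totals)    # above zero: leans spam
padded_score = spam_log_odds(padded, counts, totals)  # below zero: leans legitimate
print(plain_score, padded_score)
```

On this toy data the plain pitch scores above zero and the padded one below it, even though the pitch itself hasn't changed. A spammer who learns which words show up in the mail I actually read would not need to know anything else about my filter.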

Update: here’s one that arrived just now:

  • What’s Google up to 1st of April This Year?