Why my website logs were making me suspicious
My SFragments website (where I write about science fiction conventions I've been to and authors I've met) runs the Geeklog content management system (though calling it a content management system may be too pompous; it's more like a group blogging system). Every article, or blog entry if you will, has a "Mail story to a friend" button. Clicking this button takes the user to a page where they enter their own and the recipient's email addresses and an optional "personalized" message, then click a "Send message" button, which emails the article to the recipient.
Every time a user clicks on something on my website, the HTTP request sent by the user's browser is recorded in the website's logs. So when a user goes to the "Mail story to a friend" page, it shows up in the logs. And for a few days last week the logs showed quite a few attempts to email some of my articles -- over 10 attempts per article, about 50 attempts a day. No way is my website so popular that people would be emailing my articles by the dozen. Any time I see high numbers of anything in my logs, my first thought is: spammers. More so because I've already had a problem in the past with my website being flooded with comment spam, which I wrote about in this post. I disabled comments, but I think spammers are ingenious enough to find other avenues of abuse.
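For anyone who wants to run the same kind of check on their own logs, here is a minimal sketch of the counting involved. It is not the tool I used, just an illustration. It assumes the access log is in the usual Apache/NCSA common or combined format, and it assumes the "Mail story to a friend" page can be recognized by a marker in the URL; the marker `what=emailstory`, the `sid` parameter, and the threshold of 20 are placeholders I picked for the example, so check them against what your own logs actually show.

```python
import re
from collections import Counter

# Start of an Apache/NCSA-style log line:
# host ident user [day:time zone] "METHOD path PROTOCOL" ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*"'
)

# ASSUMPTION: a substring that identifies the mail-a-story page in the URL.
MAIL_PAGE_MARKER = "what=emailstory"
# ASSUMPTION: the article is identified by a "sid" query parameter.
STORY_ID = re.compile(r"[?&]sid=([^&\s]+)")


def count_mail_requests(lines, marker=MAIL_PAGE_MARKER):
    """Count requests for the mail-a-story page per day and per article."""
    per_day = Counter()
    per_story = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or marker not in m.group("path"):
            continue  # unparseable line, or not the page we care about
        per_day[m.group("day")] += 1
        sid = STORY_ID.search(m.group("path"))
        per_story[sid.group(1) if sid else "(unknown)"] += 1
    return per_day, per_story


def suspicious_days(per_day, threshold=20):
    """Return the days with at least `threshold` mail-a-story requests."""
    return sorted(day for day, n in per_day.items() if n >= threshold)
```

To use it, pass an open log file to `count_mail_requests` (for example `with open("access.log") as f: per_day, per_story = count_mail_requests(f)`), then look at `per_story.most_common(10)` and `suspicious_days(per_day)`. Where to set the threshold is a judgment call: it only needs to sit well above what a site's real readers would plausibly generate.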
Here is what I suspect spammers were doing
My guess is that they were running a program that requests the "Mail story to a friend" page, fills in the sender and recipient addresses, puts spam content in the personalized message field, and submits the form as if clicking the "Send message" button. It can all be done by a script, with no human effort. That way they may be tricking my Geeklog system into sending out spam. This method has two advantages: (1) the spammers don't risk being shut down by their ISP, because the mail does not originate at their ISP; and (2) since the spam message is attached to my article, it has a chance of fooling spam filters, because the bulk of the text -- my article -- would not look like spam to them. I've heard that one of the ways to trick Bayesian filters is to mix spam in with huge chunks of "legitimate" text.
I wasn't sure, and still am not sure, that that's what was really happening, because the volume of emailing -- about 50 messages a day -- was a bit too low for what you would expect from a spammer. AFAIK, spammers send out ~ 5 MILLION messages a day. At the same time, the volume I was seeing was a bit too high for "legitimate" users. That's what I said on my other blog.
Then a blogger, Zerolove, left this comment:
Could be using multiple Geeklogs!
I use Razor2 and DCC (Distributed CheckSum) to find spam that passes Bayesian filtering. If I see that your post was to x number of other users I would block it. So by using multiple Geeklogs it would be coming from multiple IPs, therefore not only passing Bayes but also passing Razor and DCC. So if they use 100 websites x 50 emails each it would add up!
That's an even better reason to disable the "Mail story to a friend" feature, and I did so. To be fair, the flurry of emailed stories in the logs was a one-time occurrence: I watched the logs for about a week before disabling the feature, and didn't see it happen again. But I decided to stay on the safe side.
In any case, I can't think of a more useless feature for a content management system to have. Does anyone ever use it? I mean, not on my site, but in general? When people want to forward me something interesting, they just copy and paste the link into an email message. I have never had a story forwarded to me by way of a "Mail this story" feature. So I conclude it is one of those things that perfectly illustrate the law of unintended (and in this case, ironic) consequences: a thing that's only marginally useful to its intended users is wide open to abuse.