December 14, 2004
Major Combat Operations Against Comment Spam Are Completed--Not!
The Nielsen Haydens report:
Making Light: Smokin' spam: We’ve been hit hard by comment spam this weekend. I’m talking 480 spams in ten minutes on Saturday morning. None of it has gotten past the combination of MT Blacklist plus the latest version of Movable Type, and Patrick hasn’t had to devote undue time or trouble to killing it. I’d be interested in knowing whether anyone else got hit. In the meantime, if you’re having comment spam problems, consider upgrading to these fine, fine software products.
Two comments. First, Jay Allen's MT Blacklist is an amazing program. Everyone who uses Movable Type and suffers from comment spam should install it. And everyone who installs it should mosey on over to PayPal and pony up:
MT-Blacklist v1.6.5 User Guide: As much as my altruistic side would love to continue to develop MT-Blacklist and maintain the Comment Spam Database for free, it puts a tremendous strain on me in terms of time and energy. Every minute I spend responding to technical support requests, poring over Clearinghouse spam submissions and upgrading the program itself is a minute stolen from my freelance work projects. Because I currently have a large level of personal debt, I do not have the luxury to spend those precious minutes on things which do not somehow contribute to its reduction. Under the current licensing terms, MT-Blacklist is free. However, if you would like to see the program developed for future versions of Movable Type or with additional capabilities, you may want to consider donating. Think of it this way: How much time and energy has MT-Blacklist saved you? You decide how much that is worth.
Second... Perhaps there are enough softer targets out there that the comment spammers will not devote the time and energy to crack the defenses of the well-armored... But perhaps not. For the weblogging world as a whole, major combat operations against comment spam are not over. As long as Google (and Microsoft, and the others) default to indexing weblog comments without a specific declaration at the head of the file that comments should be indexed, spamming weblogs with irrelevant comments is a good way to boost your website's salience in the search databases. And with idle CPU power and connectivity hanging around, there will be people--lots of people--who will think, "Why not?" for they believe that do what thou wilt be the whole of the law.
In this one area, GOOGLE IS EVIL!!!
The solution needs to come at the Google level. As long as Google provides a search engine that rewards comment spammers, there will be comment spammers, and they will spend more and more time and ingenuity figuring out how to crack the defenses set up by Jay Allen and others.
Posted by DeLong at December 14, 2004 06:14 PM
Trackback Pings
TrackBack URL for this entry:
http://www.j-bradford-delong.net/cgi-bin/mt_2005/mt-tb.cgi/11
Listed below are links to weblogs that reference Major Combat Operations Against Comment Spam Are Completed--Not!:
» Fighting comment spam from Ed Bott - Windows (and Office) Expertise
I allow comments on this Web site. In fact, I encourage them. In the past, I've had to shut down comments for fairly long periods of time because of "comment spam," automated attacks that fill the comments section with plugs for whatever sleazy product...
Tracked on December 15, 2004 05:50 PM
» Brad figures out MT-Blacklist is a godsend from B12 Partners
In all fairness though, Six Apart has put a hell of a lot of work into a seriously powerful and extensible API framework with which a plugin author could do anything they want including making a killer anti-spam plugin (*ahem* :-). If you think about i...
Tracked on December 15, 2004 10:35 PM
» Google rules: report spam ! from Idealistic but not naive, realistic but not cynical
It is no fun anymore. I managed to see some irony in comment-spam before. I lost links between my own posts in the fight against comment-spam but now I...
Tracked on December 16, 2004 06:50 AM
Comments
Would requiring commenters to preview their comments before submission cut down on the spam?
This can be done automatically: Melanie's blog (Just Another Bump in the Beltway, www.node707.com) is set up that way, and she also uses Movable Type (v. 2.64).
Posted by: RT at December 14, 2004 06:32 PM
Why not build an e-mail confirmation into comment posting? When a user with a certain email address posts their first comment to your weblog, their comment wouldn't show up until they'd clicked on a unique link sent to their e-mail address.
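A minimal sketch of the confirm-by-e-mail idea above, in Python. Everything here is an assumption for illustration: the secret key, the `example.com` confirmation URL, and the in-memory `pending`/`confirmed` stores stand in for whatever a real weblog would use; this is not how Movable Type works.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical server-side secret; a real site would keep this out of the code.
SECRET_KEY = b"replace-with-a-long-random-secret"

confirmed = set()  # addresses whose owners have clicked their link
pending = {}       # address -> comments held until confirmation


def _normalize(email: str) -> str:
    return email.strip().lower()


def confirmation_token(email: str) -> str:
    """HMAC of the address: unguessable without the key, no per-user storage."""
    return hmac.new(SECRET_KEY, _normalize(email).encode("utf-8"),
                    hashlib.sha256).hexdigest()


def confirmation_link(email: str) -> str:
    # example.com stands in for the weblog's own (hypothetical) confirm script.
    query = urlencode({"email": _normalize(email),
                       "token": confirmation_token(email)})
    return "https://example.com/confirm?" + query


def submit_comment(email: str, text: str) -> str:
    """Publish comments from confirmed addresses; hold everything else."""
    key = _normalize(email)
    if key in confirmed:
        return "published"
    pending.setdefault(key, []).append(text)
    # A real system would now e-mail confirmation_link(email) to the commenter.
    return "held"


def confirm(email: str, token: str) -> list:
    """Handle a click on the link; release held comments if the token matches."""
    key = _normalize(email)
    if not hmac.compare_digest(confirmation_token(key), token):
        return []
    confirmed.add(key)
    return pending.pop(key, [])
```

Because the token is an HMAC of the address, the server can verify a click without storing a token per user; a bot posting under a throwaway or forged address never receives the link, so its comments stay held.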
Posted by: Scott Teresi at December 14, 2004 06:57 PM
I have moderate comment spam over at Pom D'or. I haven't done anything to combat it. I'm not sure how to armor a WordPress blog anyway. But I have so few readers and so few linkers that nobody is much served by spamming my weblog, so the damage is limited.
Posted by: Ben at December 14, 2004 07:13 PM
Blacklist is great. But hey, the little text came back again briefly, and I thought this big keep-it-between-the-lines-children print would be saved for the day when we're all completely nearsighted from our excessive reading habits... but no?
Posted by: fussypatronofthearts at December 14, 2004 09:10 PM
From examining your installation, it seems that Movable Type's design makes it more difficult for Google to do just that. If the permalinked page did not include the comments, it would be easy for every blogger using Movable Type to use the robots.txt file to keep the Googlebot out. Unfortunately, in this case you're asking the Googlebot to index the first portion of the permalinked page, and then you'd like it to be smart and say, "Oh, this is a blog, I shouldn't index the rest of that page." Smart as Brin and Page and their company are, I bet the robot is too stupid to do that, and asking the robot to do that would risk breaking non-blog pages.
A more search-engine-friendly, less spammer-friendly page would have the permalinked page NOT include the comments. Such a blog design would make it much easier to use robots.txt to keep the Googlebot away. Arguably, this might make it less friendly to commenters, requiring an extra click.
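If comments lived at their own URL, as suggested above, a single robots.txt rule would keep compliant crawlers away from them. A sketch assuming a hypothetical layout (entries under `/archives/`, comments under `/comments/`), checked with Python's standard `urllib.robotparser`, which answers the same question a well-behaved crawler asks:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: entries under /archives/, comments under /comments/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /comments/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawler_may_fetch(url: str, agent: str = "Googlebot") -> bool:
    """What a robots.txt-respecting crawler would decide for this URL."""
    return parser.can_fetch(agent, url)
```

robots.txt is advisory: it keeps honest crawlers from indexing the comment pages, which removes the search-ranking payoff, but it does nothing to stop a bot from posting.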
Another form of mitigation would be a smarter Movable Type, or a Movable Type/Apache filter, that recognizes the Googlebot user agent and serves up the page sans comments.
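The user-agent filter described above might look like this. The crawler substrings are assumed tokens, and user-agent strings are self-reported and trivially faked, so this is a sketch of the idea rather than a Movable Type or Apache feature.

```python
import html
from typing import List

# Assumed crawler tokens; user agents are self-reported and easy to fake.
CRAWLER_MARKERS = ("googlebot", "msnbot", "slurp")


def is_crawler(user_agent: str) -> bool:
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def render_entry(entry_html: str, comments: List[str], user_agent: str) -> str:
    """Build the permalink page, leaving comments out when a crawler asks."""
    parts = [entry_html]
    if not is_crawler(user_agent):
        parts.extend('<div class="comment">%s</div>' % html.escape(c)
                     for c in comments)
    return "\n".join(parts)
```

Search engines generally discourage showing crawlers different content from what visitors see, and may treat it as "cloaking," so hiding comments this way is a trade-off rather than a free fix.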
One other note: Haloscan, as incredibly lousy as it is, is sort of an example of how to be anti-spammer. Permalinking to Atrios doesn't bring up the comments.
Finally, you could also move to the Craigslist model: let n or more readers flag comments as spam, and then create a bot to waste those spams. That does risk abuse, which in the emotional world of the blogosphere is a very real risk.
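A sketch of the Craigslist-style rule: a comment is hidden once n distinct readers flag it. How readers are identified (`reader_id`, e.g. a login or cookie) and the default threshold of 3 are assumptions; `hidden_ids` is a hypothetical hook for the cleanup bot.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class FlagBoard:
    """Hide a comment once `threshold` distinct readers have flagged it."""

    def __init__(self, threshold: int = 3) -> None:  # the "n" above; 3 is assumed
        self.threshold = threshold
        self.flags: DefaultDict[int, Set[str]] = defaultdict(set)

    def flag(self, comment_id: int, reader_id: str) -> None:
        # A set counts each reader once, however many times they click.
        self.flags[comment_id].add(reader_id)

    def is_hidden(self, comment_id: int) -> bool:
        return len(self.flags.get(comment_id, ())) >= self.threshold

    def hidden_ids(self) -> list:
        """Comments a cleanup bot could delete."""
        return sorted(cid for cid in self.flags if self.is_hidden(cid))
```

Counting distinct readers rather than clicks means one person flagging repeatedly cannot hide a comment alone, though a handful of coordinated readers still could, which is the abuse risk noted above.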
So: Google evil, or Movable Type evil, or spammers evil? Goto considered harmful.
And yes, thank you for changing the type again, it's much easier to read in black on white.
Posted by: jerry at December 14, 2004 09:46 PM
[Nope: No eBay advertisements in a thread about comment spam, please]
Posted by: ChicoD at December 14, 2004 10:02 PM
Spamming had been a serious problem at The Left Coaster until we installed mt-blacklist AND added mt-close2 (which can close your older posts easily). We used mt-blacklist for several months, but would still get massive spam attacks on any open comments as long as there was no entry that blocked that spammer. A couple of weeks ago (via this Kevin Drum post: http://www.washingtonmonthly.com/archives/individual/2004_12/005258.php) I discovered mt-close2 (http://thedeadone.net/sw/000480.shtml), which provided a mechanism for closing posts that are more than X days old (both for comments and for pings).
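The rule mt-close2 applies (stop accepting comments and pings on entries more than X days old) fits in a few lines. This is not mt-close2's own code; the function names and the mapping from entry IDs to posting dates are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def accepts_feedback(posted_at: datetime, now: datetime, max_age_days: int) -> bool:
    """True while an entry is younger than the cutoff ("X days" above)."""
    return now - posted_at < timedelta(days=max_age_days)


def entries_to_close(entries: Dict[str, datetime], now: datetime,
                     max_age_days: int) -> List[str]:
    """IDs of entries whose comments and pings should now be closed."""
    return sorted(entry_id for entry_id, posted_at in entries.items()
                  if not accepts_feedback(posted_at, now, max_age_days))
```

Running something like `entries_to_close` on a schedule and closing whatever it returns keeps the archive from becoming an ever-growing target.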
Another tip that I got was blocking spam by stopping bob@yahoo.com from making comments.
Add this to your mt-blacklist:
\bbob@y[^\s.]+o\.com
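To see what the pattern above catches, here it is compiled with Python's `re` module. Movable Type and its plugins are written in Perl, but this pattern uses only syntax that Perl and Python regular expressions share; `is_blacklisted` is a hypothetical helper, not MT-Blacklist's API.

```python
import re

# The MT-Blacklist entry quoted above, compiled as a Python regex.
BOB_PATTERN = re.compile(r"\bbob@y[^\s.]+o\.com")


def is_blacklisted(comment_text: str) -> bool:
    """True if the comment contains an address the pattern matches."""
    return BOB_PATTERN.search(comment_text) is not None
```

`[^\s.]+` matches any run of characters other than whitespace and dots, so padded variants such as `bob@yaaahoo.com` still match, while the leading `\b` keeps addresses like `jimbob@yahoo.com` from being caught.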
I've found the pings are the worst - mt-blacklist lets me screen and delete these without having to load the really offensive pages that use that mechanism to get attention.
Posted by: Mary at December 14, 2004 10:56 PM
Hey Brad! Thanks for the major props. I'm glad that it's working as well for you as it is for me. For extra spam-blocking action, make sure to check out the recommendations in this post. It'll save you a lot of heartache if you haven't already been hit by some of them.
Also, in all fairness, I'm doing much better these days financially since I've taken the gig as Movable Type Product Manager at Six Apart. Of course, if you still feel the love enough to donate, I'm not going to let it go to waste. I've still got a backlog of debt built up that needs to be paid off. But the major alarm bells have quit sounding, thank God.
Anyway, go kick some spam butt!
Posted by: Jay Allen at December 14, 2004 11:05 PM
Google is not to be blamed for this.
If you don't want your comments to be indexed, tell the Googlebot.
However, I feel that indexing comments is a good thing, since they sometimes make up the majority of content of a blog.
Just make it hard for spammers to enter comments:
Close down comments on older threads. This is important.
Use a challenge-response scheme (CAPTCHA) for comments that kills automated spam.
Some other solutions:
http://www.blogspam.org/solutions.html
Posted by: Felix Deutsch at December 15, 2004 01:51 AM
I'll have to think some more about what Movable Type should have already done...
Posted by: Brad DeLong at December 15, 2004 03:25 AM
As others have said, Google really doesn't deserve the blame for this, and I'll go further and state that the real responsibility lies with Movable Type, which, for all its other fine qualities, has had very little thought put into comment management.
For example, as others have alluded to, it ought to be impossible to post a comment without first doing a preview of it - the importance of this is that it puts an end to blind IP spoofing, as a fake IP address will be unable to respond to the server's message.
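One way to make preview mandatory is to issue a one-time token when the preview page is rendered and refuse any post that does not echo it back. A minimal sketch, assuming an in-memory token store and a 30-minute lifetime (both hypothetical). It forces a bot to make two requests and read the server's reply, but a bot that fetches the preview page and copies the hidden field can still get through.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

PREVIEW_TTL_SECONDS = 30 * 60  # assumed lifetime of a preview token

# token -> (entry_id, issued_at); a real site would use shared storage
_issued: Dict[str, Tuple[int, float]] = {}


def issue_preview_token(entry_id: int, now: Optional[float] = None) -> str:
    """Called when rendering the preview page; embed the token in a hidden field."""
    token = secrets.token_urlsafe(16)
    _issued[token] = (entry_id, time.time() if now is None else now)
    return token


def accept_post(entry_id: int, token: str, now: Optional[float] = None) -> bool:
    """Accept a comment only if it echoes a fresh, unused preview token."""
    record = _issued.pop(token, None)  # pop makes every token single-use
    if record is None:
        return False
    issued_for, issued_at = record
    if now is None:
        now = time.time()
    return issued_for == entry_id and now - issued_at <= PREVIEW_TTL_SECONDS
```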
Another example of MT's failings is that it has neither inbuilt CAPTCHA facilities nor a default capacity to close older posts to comments, meaning that anyone who's been blogging long enough will be vulnerable to having his or her archives splattered with all sorts of filth.
The only spam control mechanism MT does provide by default is IP banning, but this is stupid in the extreme, not just because most people are behind dynamic IP addresses, but because IP addresses are so easy to fake: I've had spammers use the 127.0.0.1 address in comments on my blog, and I'd have been foolish indeed had I blindly banned away, as I would have locked myself out of my own comments section!
Posted by: Abiola Lapite at December 15, 2004 03:31 AM
Thanks, Jay.
Posted by: Brad DeLong at December 15, 2004 03:32 AM
[And the first true comment spam attracted to this thread...]
Posted by: Aha! at December 15, 2004 04:05 AM
"This post" being: http://www.jayallen.org/comment_spam/2004/12/master_blacklist_corrections_and_suggestions
Posted by: Jay Allen at December 15, 2004 05:26 AM
Getting comment spam is a sign of popularity. It's like being a Hollywood star. A lot of nuisance comes with it. Deal with it. If I got comment spam, it would be that my little site had arrived.
Posted by: John Emerson at December 15, 2004 07:05 AM
By the way: this new installation has never really rendered well on either my geriatric at-work computer (Mac 8.6/IE 5.1) or my more modern iBook at home (OS 10.2.8/Safari). This morning, f'rinstance, there's about a screen's worth of whitespace between each line.
Posted by: jlw at December 15, 2004 07:08 AM
"evidence that I had arrived".
Posted by: John Emerson at December 15, 2004 07:32 AM
By the way, I think CAPTCHAs and previews are probably the best way to go; I just wanted to respond to the "Google is evil" statement. Google will be evil one day, but near as I can tell, that day hasn't arrived.
Though it got Lycos in trouble, the way I would improve MT Blacklist would be to turn it into MT Blacklist/DOS. There is no reason that every hit to a blog shouldn't generate one or one hundred hits to a known spammer using the IP address of the person viewing the blog. A wonderful piece of intentional cross-site scripting JavaScript generated by MT Blacklist/DOS could provide just that. Talk about a real measure of the blogosphere! Okay, okay, I'm kidding.
Posted by: jerry at December 15, 2004 07:58 AM
"Close down comments on older threads. This is important."
Better yet, MT-Blacklist provides a mechanism for moderating old entries. This is even better as you don't close the lines of communication between you and your audience and freeze threads with potentially outdated information in time.
"I'll go further and state that the real responsibility lies with Movable Type, which, for all its other fine qualities, has had very little thought put into comment management."
I agree! And it's one thing I intend to fix as Product Manager. In all fairness, though, Six Apart has put a hell of a lot of work into a seriously powerful and extensible API framework with which a plugin author could do anything they want, including making a killer anti-spam plugin (*ahem* :-).
If you think about it for a second, making a program extensible by the world of developers is far more important than any handful of features a company could put in. If you were a small startup with limited resources, it's probably the smartest thing you could do.
"For example, as others have alluded to, it ought to be impossible to post a comment without first doing a preview of it - the importance of this is that it puts an end to blind IP spoofing, as a fake IP address will be unable to respond to the server's message."
Both comments and trackbacks should require two-way communication. Today they do not.
"Another example of MT's failings is that it has neither inbuilt CAPTCHA facilities nor a default capacity to close older posts to comments, meaning that anyone who's been blogging long enough will be vulnerable to having his or her archives splattered with all sorts of filth."
I consider both things harmful. The CAPTCHA is an accessibility nightmare which, sadly, I had a hand in creating[1]. Closing comments, as I alluded to before, creates stagnation in the web and prevents the correction of obsolete and potentially harmful information.
"The only spam control mechanism MT does provide by default is IP banning, but this is stupid in the extreme, not just because most people are behind dynamic IP addresses, but because IP addresses are so easy to fake"
I couldn't agree more[2], especially without requiring two-way communication for user submissions.
(Brad, why can't I link? Also, your comment throttle must be set at an hour, because I still got throttled despite typing this long post. :-)
[1] - http://www.jayallen.org/comment_spam/2004/06/a_small_sabbatical#comment-7845
[2] - http://www.jayallen.org/comment_spam/2004/05/mtb_20_and_ip_banning
Posted by: Jay Allen at December 15, 2004 08:11 AM
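The "comment throttle" Jay mentions is a per-IP rate limit. Movable Type's own implementation is not reproduced here; this is a generic sketch assuming a fixed interval per IP address (3600 seconds would match his guess of an hour). As the thread notes further down, spammers rotating through anonymous proxies sidestep any IP-keyed limit.

```python
from typing import Dict


class CommentThrottle:
    """Reject a comment if the same IP posted one less than `interval` seconds ago."""

    def __init__(self, interval_seconds: float) -> None:
        self.interval = interval_seconds
        self.last_accepted: Dict[str, float] = {}  # IP -> time of last accepted comment

    def allow(self, ip: str, now: float) -> bool:
        last = self.last_accepted.get(ip)
        if last is not None and now - last < self.interval:
            return False  # too soon; rejected attempts do not reset the clock
        self.last_accepted[ip] = now
        return True
```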
Abiola--
>For example, as others have alluded to, it ought to be impossible to post a comment without first doing a preview of it - the importance of this is that it puts an end to blind IP spoofing, as a fake IP address will be unable to respond to the server's message.
Even better: It is impossible to have a working TCP connection (needed to do even a Preview or any HTTP action in the first place) with a spoofed IP!
So IP-spoofing is clearly not a problem. Duh.
What you mean is probably just blindly posting comments in one HTTP POST; but this requires an established (3-way handshake) TCP connection with a real IP#.
Posted by: Felix Deutsch at December 15, 2004 09:27 AM
OT - Brad, thank you for restoring the larger type and black font. MUCH easier for these old eyes to read. Very responsive action after yesterday's comments from me and Anne.
Posted by: MaryLou Corrigan at December 15, 2004 11:02 AM
"Even better: It is impossible to have a working TCP connection (needed to do even a Preview or any HTTP action in the first place) with a spoofed IP!
So IP-spoofing is clearly not a problem. Duh.
What you mean is probably just blindly posting comments in one HTTP POST; but this requires an established (3-way handshake) TCP connection with a real IP#."
I actually have read my "Internetworking with TCP/IP," and I have to insist that you're wrong if you think IP source addresses can't be forged; how do you think the initiators of DDOS attacks cover up their tracks? If you think IP spoofing is a figment of my imagination, I suggest you read the following:
http://www.scs.carleton.ca/~dlwhyte/whytepapers/ipspoof.htm
Posted by: Abiola Lapite at December 15, 2004 11:38 AM
Just to second Jay: CAPTCHA is an accessibility nightmare. Perhaps Abiola might wish to switch on a text-to-speech program and try to comment on a CAPTCHA-enabled blog with his eyes closed?
Posted by: ahem at December 15, 2004 11:59 AM
While IP spoofing is a huge problem, we're not currently seeing a lot of it in the weblog world because it's far too easy for spammers to simply use a rotating list of anonymous proxies.
Believe me, we are well aware of both problems and the solutions posted here. Not to mention a few NOT posted here. ;-)
Here's an interesting fact: The TypePad system gets more spam than all of you combined. Anyone who thinks we're not motivated to fix this problem once and for all is smoking banana peels. :-)
Posted by: Jay Allen at December 15, 2004 12:28 PM
>I actually have read my "Internetworking with TCP/IP," and I have to insist that you're wrong if you think IP source addresses can't be forged; how do you think the initiators of DDOS attacks cover up their tracks?
I didn't write that IP spoofing didn't exist. I wrote that establishing a TCP connection (needed for actually interacting with a web server in order to Preview or Post a comment to a weblog) with spoofed packets is infeasible.
As for DDOS attacks: They don't need to complete a 3-way handshake. Saturating the link with traffic or overwhelming the attacked IP stack with half-open connections is sufficient to knock down a host.
In fact, most current DDOS attacks don't use forged source addresses anyway.
Posted by: Felix Deutsch at December 15, 2004 12:36 PM
"I didn't write that IP spoofing didn't exist. I wrote that establishing a TCP connection (needed for actually interacting with a web server in order to Preview or Post a comment to a weblog) with spoofed packets is infeasible."
It's only "infeasible" if sequence number prediction is "infeasible" as well. Unfortunately for your argument, in practice there are very many machines out there for which such a feat is well within the realm of possibility: again, NMAP is a tool which will easily tell one when such a thing is possible.
Posted by: Abiola Lapite at December 15, 2004 02:42 PM
"Smoking banana peels"?! Dammit, Jay, you must be OLD!!
Like me. I remember that hoax. Supposedly had something to do with the 'electrical banana' line in Donovan's "Mellow Yellow."
Posted by: RT at December 15, 2004 05:41 PM
I might be (virtually that is) "Smoking banana peels", but my guts tell me that IP spoofing is unlikely to be a problem in comment spamming.
My actual experience is a bit old, so I may be wrong.
The solutions are well known: filtering at border routers, and semi-random sequence numbers (RFC 1948, RFC 2827).
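RFC 1948's defense computes the initial sequence number as ISN = M + F(localhost, localport, remotehost, remoteport), where M is a timer ticking every 4 microseconds and F is a function an outsider cannot compute; the RFC suggests a cryptographic hash such as MD5 over the connection identifier and some secret data. A minimal Python sketch of that formula, not any operating system's actual code (the per-boot secret and the string encoding of the 4-tuple are assumptions):

```python
import hashlib
import os
import time
from typing import Optional

# Secret chosen at boot; F must not be computable by an outside attacker.
_SECRET = os.urandom(16)


def rfc1948_isn(local_ip: str, local_port: int,
                remote_ip: str, remote_port: int,
                now_us: Optional[int] = None) -> int:
    """Return ISN = (M + F(4-tuple, secret)) mod 2**32."""
    if now_us is None:
        now_us = time.time_ns() // 1000      # current time in microseconds
    m = now_us // 4                           # M: one tick per 4 microseconds
    conn_id = f"{local_ip}:{local_port}|{remote_ip}:{remote_port}".encode()
    digest = hashlib.md5(conn_id + _SECRET).digest()
    f = int.from_bytes(digest[:4], "big")     # F: first 32 bits of the hash
    return (m + f) % 2**32
```

An off-path spoofer can observe the ISN of its own connections but learns nothing about F for someone else's 4-tuple, so a blind guess of a 32-bit sequence number succeeds with probability 1 in 2^32, about 1 in 4.3 billion.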
And assuming that most blogs run on up to date servers, hosted commercially or at universities, those remedies should be present.
Posted by: Luc at December 15, 2004 09:36 PM
>It's only "infeasible" if sequence number prediction is "infeasible" as well. Unfortunately for your argument, in practice there are very many machines out there for which such a feat is well within the realm of possibility: again, NMAP is a tool which will easily tell one when such a thing is possible.
I am very well aware of that. Unfortunately "within the realm of possibility" doesn't mean feasible. As mentioned by Luc above, most weblogs are not hosted on machines with weak ISNs.
This is Brad's weblog host:
TCP Sequence Prediction: Class=truly random
Difficulty=9999999 (Good luck!)
And remember that the host the spoofed IP belongs to must be kept silent; otherwise it will answer the server's unexpected SYN-ACK with a reset and tear the connection down.
For a comment spammer who wants to spam a huge volume of blogs in a short time, and who will abandon an identified spammer host in an instant or even controls a zombie army of thousands of compromised machines, IP spoofing is simply nonsensical and, quite probably, technically infeasible.
Posted by: Felix Deutsch at December 15, 2004 10:52 PM
"And assuming that most blogs run on up to date servers, hosted commercially or at universities, those remedies should be present."
This is a ridiculous assumption! Are the only people you know at universities? Perhaps the likes of Hosting Matters and TypePad are "universities" in your lexicon ...
"As mentioned by Luc above, most weblogs are not hosted on machines with weak ISNs."
Oh really? And where's your evidence? You simply assert something, and - VOILA! - it MUST be true.
"IP spoofing is simply nonsensical, and quite probably, technically infeasible."
But weren't you admitting upstream that it WAS possible, right after I'd pointed you to a paper explaining how it was done? Too bad it actually happens, then, as I had to deal with it just this week; a spammer selling Belkin WiFi routers with an IP address of 127.0.0.1. What was that again about "impossible?"
All your bluster can't change the fact that you don't know what you're talking about. That Brad's particular weblog host (UC Berkeley I presume) happens to have a decent security setup does NOT imply that the same is true for most blogs. Anyone with half a clue would know better than to generalize from a sample of 1.
Posted by: Abiola Lapite at December 16, 2004 12:42 AM
Abiola: May I humbly suggest that the fervor of your contributions here has been getting a bit out of hand?
Posted by: cm at December 16, 2004 01:06 AM
"Abiola: May I humbly suggest that the fervor of your contributions here has been getting a bit out of hand?"
May I suggest that I'll take the likes of you seriously when you become willing to identify yourselves by your own names, rather than under anonymous monikers? Besides, your definition of "out of hand" seems to mean "able to cogently defend oneself against accusations of ignorance", which is something I have absolutely no shame about doing.
Posted by: Abiola Lapite at December 16, 2004 04:05 AM
"And assuming that most blogs run on up to date servers, hosted commercially or at universities, those remedies should be present."
"This is a ridiculous assumption!"
Except that it isn't. (Hosting Matters and Typepad fall under the commercial category)
Why would a comment spammer spoof IP addresses when using proxies or other methods is simpler and more reliable?
A 127.0.0.1 in a log can mean other things besides IP spoofing. And if it is spoofing, you do need to improve your security. See those RFCs.
Posted by: Luc at December 16, 2004 06:43 AM
>But weren't you admitting upstream that it WAS possible, right after I'd pointed you to a paper explaining how it was done?
I don't need some toy pointing me to a paper to learn about IP spoofing. OK?
Posted by: Felix Deutsch at December 16, 2004 08:01 AM
[Uh oh. The second piece of comment spam to get past MT-Blacklist...]
Posted by: Anonymous at December 16, 2004 05:56 PM