spam evolution

April 26, 2012

(This article was first published on Christophe Ladroue » R, and kindly contributed to R-bloggers)

Despite some rather modest protection (like a simple captcha), I still receive spammy comments on this blog every now and again. They’re easily spotted and actually never appear on the website.

There’s obviously an incentive for the spammer to post something as convincing as possible: either you’re taken in and think it’s a genuine comment, or it takes so long to decide whether it’s genuine that you just give up. To that end, I’ve noticed a new generation of comments that simply copy text from somewhere on the web. The text is more readable than a Markov-chain generated blurb and thus more taxing for the blogger to identify. There is a twist though: usually one word seems to be deliberately misspelt. Here is an example:

Hi Louis apparently my honstig company have had a few issues today. As far as I can see, the images are there now. Have they returned for you as well? If not, I can try tweaking a few things and seeing what happens

I wondered why the spelling mistake was introduced. My current, unsubstantiated guess is that it’s a way for the spammer to detect which comments have gone through and so identify blogs that are weak on security.

Today I’ve started receiving an even more pernicious kind of spammy comment on my blog: the comments are genuine ones copied from R-related blogs, and thus even more difficult to spot since they seem, at least superficially, somewhat related to the post they’re posted under. Here is an example:

Lattice and ggplot add a lot of value in that they pruocde objects with which you can do things. Also, the whole reason lattice (trellis) was created in the first place was to provide a powerful system that takes care of a lot of tedious things. For example, if you want a histogram conditional on some categorical variable, you’ve got it immediately. Just because it also works in the simple case presented above does not mean it is an equivalent alternative to hist(). I would say that having many options does not make R look like legacy at all. If you need something simple, use something simple (like hist()). If you need something more powerful and flexible, use that.

It threw me at first, because my original post was indeed about ggplot. But the comment was completely off-topic, so I got suspicious. I traced it back to a 2009 blog post. Notice that the spelling mistake does not appear in the original (?) comment.

I filed the comment as spam, slightly amused by the attempt, and what do you know? A few hours later, I received another spammy comment, which is exactly the reply to that comment in the original thread.

to whom it may concern I was never in doubt, that havnig graphic objects and conditioning is an advantage (sorry, when I was unclear at this point) but as you already pointed out, there are already two packages which are mostly equivalent from an ordinary user’s perspective.My concern regards havnig many packages in parallel with very much overlap and little structured and coordinated progress.

Again, with added misspelt words. This type of spam definitely requires more time to identify and I guess it’s achieving its purpose. I wonder how widespread this is. One unintended consequence of this might be fewer off-topic comments though!
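Once the source of a copied comment has been found, the misspelt words can be picked out mechanically. The examples above ("honstig", "pruocde", "havnig") all look like ordinary words with their letters shuffled, so a rough sketch is to compare each suspect word with the original comment's vocabulary by sorted letters. The function name `scrambled_words` and the `original` string below are my own assumptions for illustration (I am guessing that "honstig" stands for "hosting"); this is not something the spammers or any library are known to use.

```python
import re


def scrambled_words(original, suspect):
    """Return (original_word, suspect_word) pairs where the suspect word is
    absent from the original text but is a letter-shuffle of one of its words."""
    def tokenize(text):
        # Lower-case alphabetic tokens only; punctuation and digits are dropped.
        return re.findall(r"[a-z]+", text.lower())

    original_words = set(tokenize(original))

    # Index the original vocabulary by its sorted letters.
    by_letters = {}
    for word in original_words:
        by_letters.setdefault("".join(sorted(word)), word)

    pairs = []
    for word in tokenize(suspect):
        if word in original_words:
            continue  # unchanged word, nothing to report
        match = by_letters.get("".join(sorted(word)))
        if match is not None and (match, word) not in pairs:
            pairs.append((match, word))
    return pairs


# Hypothetical reconstruction of the original text (an assumption).
original = "apparently my hosting company have had a few issues today"
suspect = "apparently my honstig company have had a few issues today"
print(scrambled_words(original, suspect))  # [('hosting', 'honstig')]
```

This only helps after the original has been located, which is the time-consuming part. It also misses misspellings that add or drop a letter rather than shuffle them.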

