Web Security

Strange Referrers on WordPress.com Stats

If you are a blogger, I’m pretty sure that you check your visitor statistics at least once a day. As a new blogger myself, I am always eager to see how many visitors I have had today.

I set my WordPress blog to publish to my Facebook and Twitter accounts every time I publish a new post. These happen to be my main sources of traffic. To my surprise, I get about 2 – 5 visits for every new post I publish. That is quite an impressive statistic for a new blog like mine, with only a few friends or followers who really care about my existence. Probably about 30% of my Facebook friends are people I added because of certain games I played on Facebook. And my Twitter followers are mainly those who wished that I would follow them back so that they could push constant advertisements at me every day.

One day, a miracle happened. I had more than ten visitors that day, and I hadn’t even posted anything. I monitored this for a few days without publishing any new posts, and it somehow stayed at 10 – 15 visitors a day for 4 days. Isn’t that great news?

No, it isn’t really good news at all. WordPress.com has a feature that lets you see how your visitors got to your blog. It’s called Referrers. As I said before, my main sources of traffic are Facebook and Twitter, so I wouldn’t be surprised if the referrers were Facebook and Twitter. But they were not in the list. The referrers were some strange websites that I never knew existed. So, I visited the referrers’ sites to see how and why they had referred their visitors to my blog. I was expecting to see some nice article talking about my blog. But I didn’t see anything about my blog there. D@#*, it’s spam. I just got spammed!! After a few days, those spam links seemed to disappear from my blog stats.

These past few days, I started to get weird visitor numbers again, and this time they have doubled. I have had 20+ visitors for the past 3 days. So, I blamed WordPress for this. Apparently, they are working on a way to fix it. Then I found out that this is not only happening on WordPress, so I took the blame back.

I’m also glad that I’m not alone, as you can see other people are reporting this on the forum. In this forum post, it’s stated that clicking on the spammed links will not do you any harm. But it’s advised not to click on them, as that means their plan is working. They are just trying to get visitors to their sites, but in a “not so right” way. So, now that you are aware of it, don’t click them! Someday they will find out that their plan has failed and hopefully stop doing it.

Web Security


I recently read about reCaptcha and found it really interesting. Maybe it’s only new to me and you already know all about it. reCaptcha is a CAPTCHA (which stands for Completely Automated Public Turing test to tell Computers and Humans Apart) system owned by Google. You will find it almost everywhere on websites nowadays, where it is used to prevent spam, bot attacks etc. Remember when some websites require you to type in the words from some ugly-looking image in order to proceed or submit the form? That is CAPTCHA.

Why is it interesting? Because apart from preventing bots (human-created programs) from entering our websites, it turns us (the internet users) into voluntary human OCR machines. Yes, we are working for Google for free!!

Well, that’s not my point. I am willingly contributing to this project because reCaptcha itself is free for me to use. So, it’s fair enough.

How It Works

Google apparently scans a lot of old magazines, newspapers, textbooks etc. in order to digitize them. Those ancient papers are distorted and ugly, of course, so a normal OCR system will not be able to convert them into digital text accurately. Therefore, Google ends up with a collection of documents containing images of words that computers don’t understand.

reCaptcha presents 2 words to us. One of these words is taken from the documents above (one that Google can’t read yet). This will be the “fake” word. The other one is a computer-generated word (probably from those documents as well, but already converted to digital text) and will be the “real” word.

Humans are able to perceive text a lot more accurately than machines. So, when we see these images, we have a better chance of identifying what the words are. When we enter the 2 words and submit the form, reCaptcha checks whether the “real” word was answered correctly. If it was, the answer for the “fake” word is added to the database. In other words, we only need to answer the “real” word correctly in order to pass the test. As we don’t know which one is real and which is fake, and as we have already offered to volunteer in this project, we will normally answer both words.
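To make the check above more concrete, here is a rough sketch in Python. It is only my illustration of the idea, not how reCaptcha is really implemented: the function names, the in-memory `collected_answers` store and the case-insensitive comparison are all assumptions of mine.

```python
from collections import defaultdict

# Hypothetical in-memory store: id of a "fake" word image -> answers typed by humans.
collected_answers = defaultdict(list)


def normalize(text):
    """Assumption: answers are compared ignoring case and surrounding spaces."""
    return text.strip().lower()


def check_challenge(known_answer, fake_word_id, typed_real, typed_fake):
    """Pass the test only if the "real" word is typed correctly.

    When it is, the answer typed for the "fake" word is stored as one more
    human guess for that unreadable image.
    """
    if normalize(typed_real) != normalize(known_answer):
        return False  # "real" word is wrong: fail and store nothing
    collected_answers[fake_word_id].append(normalize(typed_fake))
    return True
```

Note that the “fake” answer is never checked at all, which is exactly why only the “real” word decides whether we pass.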

reCaptcha will normally reuse the same “fake” word many times in order to collect more answers. For sure, some of us will answer correctly and some will not. So, Google will have a set of different answers for every single word. The answer given most often will then be used as the transcription of that word. Two birds with one stone.
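Picking the winning answer could be sketched like this in Python. Again, this is just my guess at the simplest possible version (a plain “most votes wins” count); the real system may well require a minimum number of matching answers or weigh them differently. The function name and the sample votes are made up.

```python
from collections import Counter


def most_common_answer(answers):
    """Return the answer given most often, or None if nobody has answered yet."""
    if not answers:
        return None
    answer, _count = Counter(answers).most_common(1)[0]
    return answer


# Made-up answers collected for one "fake" word.
votes = ["upon", "upon", "up0n", "upon", "upan"]
print(most_common_answer(votes))  # prints "upon"
```

Ties are not handled specially here; a real system would probably wait for more answers before deciding.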

Using It in PHP

First of all, you need to register your website and get 2 keys. They are strings of random letters that you need to put in your PHP code. Then you need to download the library from the reCaptcha website and include it in your PHP files. Call the recaptcha_get_html() function to display the CAPTCHA input box and recaptcha_check_answer() to check whether the answer is correct. Here is the complete tutorial.

Security Issue

I am so proud that I can contribute to this project. But hackers are everywhere, and many technologies have been deprecated just because they were hacked once. According to Google, they have already applied some security measures to it. Read more here. So, no need to worry about that.

But I’m sure there must be some flaw in that system. So, I googled again. Something very interesting showed up here. The system has not been hacked so far (though there are some rumors, which I think are just rumors). But there is something called the “P**** Flood Attack”.

The attack is surprisingly easy to launch. In fact, anybody can do it just by following some simple guidelines provided here. The key to performing this attack is to identify which word is the “real” one. After that, you can answer the “fake” word with whatever you want, including but not limited to the “P****” word. So why is this called flooding? If millions of people do this, the answer set discussed above will be flooded with the P word. And don’t be surprised if, in the near future, you are reading some books or magazines online with random P words appearing in the text.

Wait there…

What benefit would you get? I thought we had agreed to volunteer for this project? Don’t worry, because the bad news is that the reCaptcha team already knows about this and has implemented numerous protections to prevent the flooding. I don’t know how the protections work, but I think I’m secure enough using reCaptcha on my websites. Happy CAPTCHA-ing!