Examples of simple CAPTCHA designs
What is a CAPTCHA?
If you have ever registered on a website (webmail, social networking, etc.), there’s a good chance you’ve seen a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). It is a simple test that proves to the site you are visiting that you are human. In this post I will discuss CAPTCHAs in general and experiment with some new designs that might further thwart spammers.
Why is it needed?
The problem stems from how HTML forms work – they’re very simple to fill out and submit automatically. Using specially coded software, a spammer could create thousands of email accounts in a matter of seconds and use them to send spam (the page for sending an email is essentially just another HTML form). Spamming guestbooks is also a popular SEO technique for driving up a particular site’s PageRank with Google, and with other search engines that use similar ranking mechanisms. (As a side note, if you ever code a guestbook, it’s good practice to add the rel="nofollow" attribute to all outbound links.)
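Here is a minimal sketch of that guestbook advice, assuming visitor links are rendered server-side from a URL and a link text. The function name `render_guestbook_link` and the allowed-scheme list are hypothetical choices for illustration. It escapes both values and only emits an anchor for plain http/https URLs, so a spammer gets neither ranking credit nor a way to inject markup.

```python
from html import escape
from urllib.parse import urlparse

# Only plain web links become anchors; anything else (javascript:, data:, ...)
# falls back to escaped text. This allowed set is an assumption, adjust to taste.
ALLOWED_SCHEMES = {"http", "https"}


def render_guestbook_link(url: str, text: str) -> str:
    """Render a visitor-supplied link with rel="nofollow" so it passes on no ranking credit."""
    if urlparse(url).scheme.lower() not in ALLOWED_SCHEMES:
        return escape(text)
    # Escape the URL for the attribute context and the text for the element body.
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(text)}</a>'
```

For example, `render_guestbook_link("https://example.com/?a=1&b=2", "My <site>")` produces an anchor whose `&` and `<`/`>` are entity-escaped and which carries `rel="nofollow"`.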
CAPTCHA systems were invented to stop spammers. A CAPTCHA is usually a generated image showing some distorted text, which a human can read and type in. A lot of work goes into crafting CAPTCHAs that are easy for people to read but difficult for OCR (optical character recognition) software to decipher, usually by adding some sort of distortion. But as OCR software improves, and as some people write custom code that targets specific CAPTCHA systems, more obfuscation is needed, so nearly unreadable examples like the one shown below are not uncommon.
An excellent example of a horrible CAPTCHA
That is why I applaud alternative designs. There have been a few, including ones that ask you to tell pictures of cats from pictures of dogs, although these are also flawed: their image databases are limited, which leaves them open to brute-force attacks.
Here’s another one I found while researching this post that I quite enjoyed:
An excellent example of an excellent CAPTCHA
This simple substitution CAPTCHA is good because standard OCR cannot solve it; breaking it requires a purpose-written algorithm. That means someone has to program it, which takes manpower, which costs money, which is a scarce commodity. So unless this kind of CAPTCHA runs on a high-profile site, breaking it is too much hassle, and alternative implementations like this work well for smaller sites.
My experiments
I know I’ll need a CAPTCHA for some of the projects I am currently working on, so a few days ago I sat down and started brainstorming. Some of these CAPTCHA variations probably already exist; others are a bit more exotic. Feel free to implement and use any of them. If you do, I’d love an email about how you are using them and what improvements you have made.
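Whichever puzzle you draw, the server side follows roughly the same pattern: generate an answer, remember it, and check the visitor’s input against it exactly once. The sketch below assumes an in-memory dictionary stands in for a real session store. The class name `CaptchaStore` and the reduced alphabet are hypothetical, and rendering the distorted image is left out entirely.

```python
import hmac
import secrets

# Alphabet without easily confused glyphs (0/O, 1/I/L); an assumption, tune it to your font.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


class CaptchaStore:
    """Hypothetical in-memory store; a real site would keep answers in the user's session."""

    def __init__(self) -> None:
        self._answers: dict[str, str] = {}

    def new_challenge(self, length: int = 6) -> tuple[str, str]:
        """Return (token, text); `text` is what you would draw, distorted, into the image."""
        text = "".join(secrets.choice(ALPHABET) for _ in range(length))
        token = secrets.token_urlsafe(16)
        self._answers[token] = text
        return token, text

    def verify(self, token: str, guess: str) -> bool:
        """Check a guess once; the answer is discarded either way so it cannot be retried."""
        answer = self._answers.pop(token, None)
        if answer is None:
            return False
        # Compare case-insensitively (answers are uppercase) in constant time.
        return hmac.compare_digest(answer.encode(), guess.strip().upper().encode())
```

Discarding the answer on every attempt matters: otherwise a bot could keep guessing against the same image.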
– Counting upwards
This is a pretty simple idea that differs from most CAPTCHAs because the letters are jumbled, and every letter has a corresponding number. You type the word by starting from one (1) and counting upward. In this case the word is SECRET, which we can see if we put the letter groups in numerical order:
SECRET
------
123456
This sort of CAPTCHA should work pretty well even on high-profile pages, especially if you apply some distortion to the background, because it gives an attacker two points of failure: not only do the letters have to be identified correctly, so do the numbers. It probably needs a short explanation of how to solve it on the page where it is used.
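The generation and checking logic for counting upwards is small enough to sketch. In this version, which assumes the drawing code simply renders each (number, letter) pair in the order given, every letter is paired with its 1-based position and the pairs are shuffled. The function names are hypothetical. The answer is recovered by sorting on the numbers.

```python
import random

# OS-backed randomness, so the shuffle cannot be reproduced from a guessed seed.
_rng = random.SystemRandom()


def make_counting_challenge(word: str) -> list[tuple[int, str]]:
    """Pair each letter with its 1-based position, then shuffle the pairs for display."""
    ordered = list(enumerate(word, start=1))
    pairs = ordered[:]
    # Reshuffle if the pairs come out in order, which would give the answer away.
    while len(pairs) > 1 and pairs == ordered:
        _rng.shuffle(pairs)
    return pairs


def solve_counting_challenge(pairs: list[tuple[int, str]]) -> str:
    """Read the letters back in numerical order (1, 2, 3, ...)."""
    return "".join(letter for _, letter in sorted(pairs))
```

`solve_counting_challenge` is what your own server would compute to know the expected answer; it is equally what an attacker’s script would do once it had read both the letters and the numbers.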
– Follow the path
This is one of the more exotic ones. The idea is to follow the letters around the shape in the direction of the arrow; in this case we start from H and wrap around the shape to spell out HELLO. There is some Lorem Ipsum text inside the shape to further confuse OCR. A full implementation should also randomize the arrow’s direction and which letter it starts from. Distortion could be added to the background to make letter identification harder. This almost certainly needs an explanation of how to solve it on the site where it is used.
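Randomizing the start letter and direction could look like the sketch below. It assumes the shape’s outline is modelled as a closed ring of letters listed clockwise, and that the ring holds exactly the letters of the word. `PathChallenge` and both functions are hypothetical names, and the drawing and Lorem Ipsum filler are not shown.

```python
import secrets
from dataclasses import dataclass


@dataclass
class PathChallenge:
    ring: list[str]   # letters in clockwise order around the shape
    start: int        # index of the letter the arrow points at
    clockwise: bool   # direction the arrow indicates


def make_path_challenge(word: str) -> PathChallenge:
    """Lay `word` around a ring from a random start, in a random direction."""
    n = len(word)
    start = secrets.randbelow(n)
    clockwise = secrets.randbelow(2) == 1
    step = 1 if clockwise else -1
    ring = [""] * n
    for i, letter in enumerate(word):
        ring[(start + step * i) % n] = letter  # Python's % wraps negatives into 0..n-1
    return PathChallenge(ring, start, clockwise)


def solve_path_challenge(ch: PathChallenge) -> str:
    """Walk the ring from the arrow's letter in the arrow's direction."""
    n = len(ch.ring)
    step = 1 if ch.clockwise else -1
    return "".join(ch.ring[(ch.start + step * i) % n] for i in range(n))
```

For example, a ring of `L O H E L` with the arrow on index 2 pointing clockwise reads back as HELLO.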
– The hierarchy
Last one wasn’t exotic enough for you? Meet the hierarchy. The basic idea is that letters are stacked on the edges of an arbitrarily large pyramid. The example shows one with four letters on each side, but it can be any size. Inside the pyramid is an arrow twisting upward; follow its path and pick up the letters along the way to spell out TEST. A full implementation should randomize the arrow’s path and perhaps even use other shapes (squares, circles), although just switching out the letters should work well for a low-profile site. If you use this to protect your comment form without some sort of explanation of how to solve it, you probably won’t get a lot of comments…
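One way to randomize the arrow’s path is sketched below. The geometry is an assumption, since the picture is the only specification. The pyramid is modelled as one level per letter of the word, with one letter on the left edge and one on the right edge of each level. The arrow touches exactly one edge per level on its way up, and the other edge gets a random decoy. With TEST that gives four letters on each side, matching the example. The function names are hypothetical.

```python
import secrets
import string


def make_hierarchy_challenge(word: str) -> tuple[list[str], list[str], str]:
    """Build a pyramid with one level per letter of `word` (level 0 = bottom).

    Returns (left, right, path): left[k] and right[k] are the letters on the two
    edges at level k, and path[k] is "L" or "R", the edge the arrow touches there.
    """
    left: list[str] = []
    right: list[str] = []
    path = ""
    for letter in word:
        side = secrets.choice("LR")
        decoy = secrets.choice(string.ascii_uppercase)  # fills the edge the arrow skips
        if side == "L":
            left.append(letter)
            right.append(decoy)
        else:
            left.append(decoy)
            right.append(letter)
        path += side
    return left, right, path


def solve_hierarchy_challenge(left: list[str], right: list[str], path: str) -> str:
    """Climb bottom-up, taking the letter on whichever edge the arrow touches."""
    return "".join(lft if side == "L" else rgt for lft, rgt, side in zip(left, right, path))
```

Swapping in other shapes would only change how `left`, `right` and `path` are laid out on screen; the answer is still just the sequence of letters the arrow visits.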
Questions?
If you have trouble understanding how any of the examples work (I know my explanations are usually clumsy at best), go ahead and leave me a comment or send me an email. You can find my address on the Fair Use page.
But it’s still possible to beat them!
How, you might ask? Cheap human labor in India. The linked article is a fascinating look into the world of professional CAPTCHA solvers, who earn as little as $0.001 (a tenth of a cent) per solved CAPTCHA, and into how cheap manpower can be used to circulate more spam around the planet.
See you next time!
This article has talked about CAPTCHAs in general and some experimental approaches to the problem. Next time we’ll discuss various ways of implementing a CAPTCHA in PHP. Stay tuned!