Experimenting with alternative CAPTCHA designs and a brief history of the modern day CAPTCHA

Examples of simple CAPTCHA designs

What is a CAPTCHA?
If you have ever registered yourself on any site (mail, social networking etc) there’s a good chance you’ve seen a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). It is a simple test to prove your humanity to the site you are visiting. In this post I will be discussing CAPTCHAs in general, and experimenting with some new designs that might further thwart spammers.

Why is it needed?
The problem stems from how HTML forms work: they’re very simple to fill out and submit. Using specially written software, a spammer could create thousands of email accounts in a matter of seconds, and these could then be used for sending email spam (the email sending page is essentially another HTML form). Spamming guestbooks is also a popular SEO technique to drive up a particular site’s PageRank with Google and with other search engines that employ similar mechanisms. (As a side note, it is good practice to use the nofollow attribute for all outbound links if you’re ever coding a guestbook.)
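
As a rough illustration of the nofollow side note, here is a minimal Python sketch of how a guestbook might render a visitor-supplied link. The function name render_guestbook_link is made up for this example, and the sketch only escapes the values; it does not check that the URL uses a safe scheme.

```python
from html import escape


def render_guestbook_link(url: str, label: str) -> str:
    """Return an anchor tag for a visitor-supplied link, marked rel="nofollow".

    Hypothetical helper for illustration only: it escapes both values so that
    user input cannot break out of the attribute or the tag, but it does not
    validate the URL scheme.
    """
    safe_url = escape(url, quote=True)
    safe_label = escape(label)
    return '<a href="{}" rel="nofollow">{}</a>'.format(safe_url, safe_label)


print(render_guestbook_link("http://example.com/?a=1&b=2", "My <site>"))
```

With rel="nofollow" on every outbound link, search engines are asked not to count the link toward the target’s ranking, which removes most of the incentive for guestbook spam.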

CAPTCHA systems were invented to stop spammers. A CAPTCHA is usually a generated image that shows some sort of distorted text, which the user has to read and type in correctly. A lot of work goes into crafting CAPTCHAs that are simple for the user to read and understand, but difficult for OCR software to recognize. This is usually done by adding distortion of some sort. But as OCR software improves, and as some people write custom code that targets specific CAPTCHA systems, more obfuscation is needed, and so unreadable examples like the one in the image below are not uncommon.

An excellent example of a horrible CAPTCHA

Therefore I applaud alternative designs. There have been a few, including identification of cats and dogs, although these are also flawed because of their limited image databases, which are subject to brute force attacks.
Here’s another one I found while researching this post that I quite enjoyed:

An excellent example of an excellent CAPTCHA

This simple substitution CAPTCHA is good because standard OCR cannot break it; it requires a special algorithm. That means someone has to program it, which takes manpower, which takes money, which is a rare commodity. Therefore, unless this kind of CAPTCHA is run on a high-profile site, breaking it would be too much hassle, and so this kind of alternative implementation works well for smaller sites.

My experiments
I know that I’ll be needing to use a CAPTCHA for some of the projects I am currently working on, and so I sat down a few days ago and started brainstorming. Some of these CAPTCHA variations probably already exist, others are a bit more exotic. Feel free to implement and use any of these. If you do, I’d love an email on how you are using them and what improvements you have made.

– Counting upwards


123SEC_Option1
This is a pretty simple idea that differs from the vast majority because the letters are jumbled: every letter has a corresponding number. You type the word by starting from one (1) and counting upward. In this case the word is SECRET, which we see if we arrange the groups in numerical order:

SECRET
———-
123456

This sort of CAPTCHA should work pretty well even on high-profile pages, especially if you apply some distortion to the background, because it gives a solver two places to go wrong: not only do the letters have to be identified correctly, so do the numbers. This one probably requires a short explanation of how to solve it on the page where it is used.
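
To make the idea concrete, here is a minimal Python sketch of the data behind a counting-upwards challenge. It only covers pairing and shuffling; rendering the pairs into a distorted image is left out. The function names are invented for this example and are not part of any existing library.

```python
import random


def make_counting_challenge(word, rng=random):
    """Pair each letter with its 1-based position, then shuffle the pairs.

    The shuffled list is what would be drawn in the image. `rng` is an
    assumption of this sketch: any object with a shuffle() method works,
    for example secrets.SystemRandom() for a stronger random source.
    """
    pairs = [(position, letter) for position, letter in enumerate(word, start=1)]
    rng.shuffle(pairs)
    return pairs


def solve_counting_challenge(pairs):
    """Read the letters back in numerical order, as a human solver would."""
    return "".join(letter for _, letter in sorted(pairs))


challenge = make_counting_challenge("SECRET")
print(challenge)                            # shuffled (number, letter) pairs
print(solve_counting_challenge(challenge))  # SECRET
```

The server would keep the original word and compare it with what the visitor types; the shuffled pairs themselves never need to be stored.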

– Follow the path

Pentagon_Option1
This is one of the more exotic ones. It builds upon the idea of following the letters around the shape in the direction of the arrow. In this case we start from H and wrap around the shape to make out HELLO. There is some Lorem Ipsum text inside the shape to further fool OCR. A full implementation should also randomize which letter the arrow starts at and which direction it points. Distortion could be put on the background to make letter identification harder. This almost certainly requires an explanation of how to solve it on the site where it is used.
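
The randomization of the arrow could look roughly like the Python sketch below. It models the shape as a ring of slots listed in clockwise order, which is an assumption made for this example; the function names are also invented here, and drawing the shape, the arrow and the filler text is not covered.

```python
import random


def make_path_challenge(word, rng=random):
    """Place the letters of `word` around a closed shape.

    Returns (slots, start, step): `slots` lists the letters in clockwise
    order, `start` is the slot the arrow sits at, and `step` is +1 for a
    clockwise arrow or -1 for a counter-clockwise one.
    """
    n = len(word)
    start = rng.randrange(n)
    step = rng.choice([1, -1])
    slots = [None] * n
    for k, letter in enumerate(word):
        slots[(start + step * k) % n] = letter
    return slots, start, step


def read_path(slots, start, step):
    """Follow the arrow once around the shape and collect the letters."""
    n = len(slots)
    return "".join(slots[(start + step * k) % n] for k in range(n))


slots, start, step = make_path_challenge("HELLO")
print(slots, start, step)
print(read_path(slots, start, step))  # HELLO
```

Because both the starting slot and the direction vary, the same word can appear in 2 × len(word) different arrangements around the shape.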

– The hierarchy
Triangle_Option1
Last one wasn’t exotic enough for you? Meet the hierarchy. The basic idea is that letters are stacked on the edges of an arbitrarily large pyramid. The example shows one with four letters on each side, but it can be of any size. Inside the pyramid is an arrow twisting upwards. Follow its path and pick up the letters on the way to spell out TEST. A full implementation should randomize the path of the arrow, and perhaps even use other shapes (squares, circles), although just switching out the letters should work well for low-profile sites. If you use this to authenticate your comment form and don’t include some sort of explanation of how to use it, you probably won’t get a lot of comments…
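
Here is a minimal Python sketch of one way the hierarchy could be modelled. It assumes the simplest possible path, where the arrow alternates strictly between the left and right edge on each level going upward; the function names, the decoy alphabet and the sample layout at the end are all made up for illustration.

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def make_hierarchy_challenge(word, rng=random):
    """Build the two edges of the pyramid, listed from bottom to top.

    The answer letters sit on a zigzag path that switches sides on every
    level; the other slot on each level is filled with a random decoy.
    Returns (left, right, start_side) where start_side is 0 for left, 1 for right.
    """
    start_side = rng.choice([0, 1])
    left, right = [], []
    for level, letter in enumerate(word):
        decoy = rng.choice(ALPHABET)
        on_left = (level + start_side) % 2 == 0
        left.append(letter if on_left else decoy)
        right.append(decoy if on_left else letter)
    return left, right, start_side


def read_hierarchy(left, right, start_side):
    """Follow the zigzag arrow upward and collect one letter per level."""
    sides = (left, right)
    return "".join(sides[(level + start_side) % 2][level] for level in range(len(left)))


left, right, start_side = make_hierarchy_challenge("TEST")
print(read_hierarchy(left, right, start_side))  # TEST

# A hypothetical fixed layout with four letters per side:
print(read_hierarchy(list("TKSF"), list("AEGT"), 0))  # TEST
```

With a strictly alternating path there are very few possible readings of the same picture, so a fuller implementation would want to vary the path more than this sketch does.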

Questions?
If you have trouble understanding how any of the examples work (I know my explanations are usually clumsy at best) go ahead and throw me a comment or an e-mail. You can find my address on the Fair Use page.

But it’s still possible to beat them!
How, you might ask? Indian slave labor. The linked article is a fascinating insight into the world of professional CAPTCHA solvers, who make as little as 0.001 dollars per solved CAPTCHA. It is a truly interesting read about how cheap manpower can be used to circulate more spam around the planet.

See you next time!
This article has talked about CAPTCHAs in general and some experimental approaches to the problem. Next time we’ll discuss various ways of implementing a CAPTCHA using PHP. Stay tuned!


5 Comments on “Experimenting with alternative CAPTCHA designs and a brief history of the modern day CAPTCHA”

  1. mollymolly Says:

    Sometimes they are too hard, so you have to try several times. Is there any limit on attempts before the site becomes “suspicious”, like on the banks’ sites?

  2. Lemming Says:

    The problem with your CAPTCHAs is that they’re too simple by design to provide any real additional security. The only security they provide comes from being different, and that will only work until someone tries to attack them. This is analogous to a bad cryptographic design.

    http://en.wikipedia.org/wiki/CAPTCHA#Computer_character_recognition

    The use of non-alphanumeric characters is interesting. However, the lack of distortion in the image makes the characters easily separable, which makes them easy to match. The replacement set is clearly defined in a section below, so matching for those characters and pairing them up with the adjacent letters should prove to be simple enough.

    SECRET

    For this to provide any additional security, the image would have to be distorted beyond pairability between the characters and the number order. The design would have to make it indistinguishable that the ‘1’ and the ‘S’ are a pair. But the design enforces that the sixth ‘order digit’ is below the sixth character, and that the number of order digits equals the number of characters. For this design to provide additional security, it would probably have to be distorted beyond any form of human readability.

    HELLO

    Because the uppercase characters stick out a lot more than they should, they’re easily separable from the unrelated text in the middle. All characters are still very distinctly placed in different spots, and still all easily recognized. The arrow’s easily identified too, which makes determining order easy too, as the character positioning is simply determined by its angle from the center, and has the same radius. Not that an elliptic curve would make it that much harder, of course.

    The pyramid

    The characters are still in very distinct places and easy to identify. If the CAPTCHA design strictly enforces that every next character is placed on the opposite side, then even if we allow going either up or down, we can guess among the permutations AKGF, TEST, FGKA and TSET. Just guessing here would give us a 25% success rate, which results in a whole lot of false positives. Also, as the arrow inside the pyramid sticks out, it’s not any harder to identify than the characters, including its head. Based on this there’d be a 100% success rate in breaking this CAPTCHA.

    Also, the usage of dictionary words almost completely eliminates security given by permuting these elements, making most of the design redundant.

    Whether this is a problem in practice depends on usage. If any of these CAPTCHAs became widespread enough for someone to want to attack them, they would fall like bricks, unfortunately.

    • khromov Says:

      Hi and thank you for the comment!

      You do have valid thoughts about the captcha designs, so let me make a few additions to my post:

      The CAPTCHA designs are provided at their most rudimentary. It would not be pedagogical to show a highly distorted example, such as would be used in real life scenarios, as the idea of the CAPTCHA functionality may be lost to the reader. I do in fact mention that distortion ought to be in place, especially with the SECRET example. I do maintain that it is better than a regular CAPTCHA, because it has two additional points of possible error:
      1.) incorrect identification of letter/number pairings
      2.) double the amount of letters/numbers to identify

      … of course, in the example it’s really simple, but with some background noise and shape distortion it should be much less so for a computer.

      I do use dictionary words in the examples, and give no real indication that THEY SHOULD NOT BE USED! There you have it! 🙂
      One should always use (at least) random alphanumerical characters in both upper- and lowercase.

      Also, I know that some of these would fall flat in their simplest form, which is why I suggest a full implementation with additional features (such as shape shifting for the path and hierarchy CAPTCHAs). Paired with a healthy amount of distortion, it still provides great security.

      I also emphasize that people have to develop their own CAPTCHA designs, because even simple non-standard ones give pretty much full protection until they are broken by hand by someone, through programming or CAPTCHA solvers, and for that I consider my examples to work great.

  3. Lemming Says:

    In case you missed it:


