Google's ReCaptcha Service


Just about everyone has filled out a form that contains an image of squiggly letters and numbers that must be typed into a text field. The image is called a Captcha ('Completely Automated Public Turing test to tell Computers and Humans Apart'). It helps validate that the submitter of the form is a human, which stops malicious computer scripts (aka bots) from automating web submissions. That in turn reduces spam, protects website registrations and online polls, and can even prevent website attacks.

As an example, in 1999 the website Slashdot (slashdot.org) released a poll asking which school has the best computer science program (a poll in and of itself perhaps fishing for trouble). Students at both Carnegie Mellon and MIT automated the poll submission, altering the results in their favor, with each school finishing with over 20,000 votes. A captcha would have prevented such ballot stuffing. Although many looked unfavorably upon the captcha as an added nuisance, once it came into prominent use it proved truly effective at reducing the internet noise contributed by 'bots'. In 2008 it was estimated that more than 100 million people type captcha text every day [1]. It is thus no surprise that this gave incentive to improve the captcha, and in fact the Captcha improved in a brilliant way in the form of the reCaptcha, which Google acquired in 2009.

The reCaptcha, a new version of the Captcha whose invention was spearheaded by Luis von Ahn, brings together not only web security but another incredibly difficult problem: optical character recognition (OCR). Optical character recognition is the process of taking an image of words and digitizing those words, a very difficult and error-prone computational task. Using OCR, Google has undertaken the enormous task of scanning every printed document (e.g. books) and trying to digitize every word they contain. Yet given how error-prone the process can be, the help of humans is invaluable. In steps the reCaptcha. reCaptcha presents an image containing two words to a user. The brilliance behind this is that one word is already known, while the other is a word from a scanned document that could not be deciphered by optical character recognition alone. If the user types the known word correctly, they are treated as human, and their reading of the second word is taken as a likely transcription. Thus, not only does the reCaptcha validate the user as a human, it also helps Google's enormous undertaking of digitizing written documents.
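
The two-word idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual reCaptcha implementation: the names `check_challenge`, `consensus`, and `guesses`, the in-memory store, and the three-vote threshold are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: unknown-word image id -> tally of human guesses.
guesses = defaultdict(Counter)


def normalize(text):
    """Compare answers case-insensitively and ignore surrounding whitespace."""
    return text.strip().lower()


def check_challenge(control_answer, unknown_id, typed_control, typed_unknown):
    """Return True if the user typed the known (control) word correctly.

    Only in that case is the user's reading of the unknown word recorded.
    """
    if normalize(typed_control) != normalize(control_answer):
        return False  # failed the known word: treat as a bot, record nothing
    guesses[unknown_id][normalize(typed_unknown)] += 1
    return True


def consensus(unknown_id, min_votes=3):
    """Return the most common guess once it has min_votes votes, else None.

    The threshold of 3 is an assumption made for this sketch.
    """
    if not guesses[unknown_id]:
        return None
    word, votes = guesses[unknown_id].most_common(1)[0]
    return word if votes >= min_votes else None
```

The design point is that the known word does the security work, while agreement among several independent humans does the transcription work, so a single careless or malicious answer does not decide how the unknown word is digitized.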

To give the reCaptcha the greatest possible audience, Google provides it as a free public web service. This allows just about anyone to place the reCaptcha form validator on their website, and it lets website owners contribute to the digitization of the world's archive of books, an ingenious way of enlisting help to accomplish this enormous feat.
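
For a sense of what using the web service looks like from a website's server, here is a minimal Python sketch. It assumes the `siteverify` endpoint and the `secret`, `response`, and `remoteip` fields of the present-day reCaptcha service, which may differ from the two-word version described above. The function names are hypothetical, and the secret key and user response are placeholders that a real site would obtain when registering with the service and from the submitted form.

```python
import json
import urllib.parse
import urllib.request

# Verification endpoint of the present-day service (an assumption for this
# sketch; the original two-word reCaptcha used a different API).
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret_key, user_response, remote_ip=None):
    """Build the POST request that asks the service to check a user's answer."""
    fields = {"secret": secret_key, "response": user_response}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional field
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def is_human(response_body):
    """Interpret the JSON reply: True only if 'success' is explicitly true."""
    result = json.loads(response_body)
    return result.get("success") is True


def verify(secret_key, user_response, remote_ip=None, timeout=5):
    """Send the request over the network and return the verdict."""
    request = build_verify_request(secret_key, user_response, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return is_human(reply.read().decode("utf-8"))
```

A form handler would call `verify(...)` with the value the reCaptcha widget added to the submitted form and reject the submission when it returns False.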

Resources

  1. Luis von Ahn et al. (2008) Science, 321 (12), 1465-1468.
  2. Telling Humans and Computers Apart Automatically.
  3. Wikipedia - reCaptcha


© 2008-2022 Greg Cope