Re-writing the Global Library

    Posted: December 27th, 2007 | Author: MO | Filed under: Uncategorized |

    In recent weeks there have been great examples of data collection for a greater good. I learned of the first one from the PBS show Wired Science, in a segment on the history and future of CAPTCHA. Luis von Ahn of Carnegie Mellon University explained that the confusing acronym stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”. Carnegie Mellon owns the trademark and the rights to most of the technology, which is in use on millions of forms on the web, so it was interesting to find out that this simple security technology is actually doing a lot of good.

    Captcha

    Thousands of book pages are being scanned each day by various projects to digitize the world’s literary works, and a problem was found: the technology known as OCR (optical character recognition) can make errors. When it misinterprets a character, the text in these newly digitized works can end up jumbled and incorrect. The solution was to use CAPTCHA to check one user’s answer against another’s and work out which character the OCR was having difficulty with. Carnegie Mellon took these questionable characters from the OCR and started to rotate them in with known characters. If enough users all read the questionable character in the CAPTCHA the same way, the error in the literary work would be corrected.
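    The consensus idea described above can be sketched in a few lines of Python. This is only an illustration, not Carnegie Mellon's actual system: the class name OcrCorrector, the thresholds MIN_VOTES and MIN_AGREEMENT, and the identifier "scan-17" are all made up for the example. A user's guess at the questionable character is counted only if they also typed the known character correctly, and the guess is accepted once enough users agree.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds, chosen only for this illustration.
MIN_VOTES = 3         # minimum number of trusted answers before deciding
MIN_AGREEMENT = 0.75  # share of trusted answers that must match


class OcrCorrector:
    """Collects user guesses for characters the OCR could not read."""

    def __init__(self):
        # Maps an unknown character's id to a tally of the guesses it received.
        self.votes = defaultdict(Counter)

    def submit(self, known_expected, known_answer, unknown_id, unknown_answer):
        """Record a guess, but only if the user got the known character right."""
        if known_answer != known_expected:
            return False  # failed the control, so the guess is ignored
        self.votes[unknown_id][unknown_answer] += 1
        return True

    def resolved(self, unknown_id):
        """Return the agreed character, or None if there is no consensus yet."""
        counts = self.votes.get(unknown_id, Counter())
        total = sum(counts.values())
        if total < MIN_VOTES:
            return None
        answer, top = counts.most_common(1)[0]
        return answer if top / total >= MIN_AGREEMENT else None


corrector = OcrCorrector()
for guess in ["e", "e", "c", "e", "e"]:
    corrector.submit("a", "a", "scan-17", guess)   # control typed correctly
corrector.submit("a", "x", "scan-17", "o")         # control failed, guess dropped
print(corrector.resolved("scan-17"))  # prints: e
```

    Four of the five trusted answers agree on "e", which clears the 75% threshold, so that reading would replace the OCR's questionable character.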

    This methodology is similar to several other projects already in existence, but this particular one has some real teeth. It’s estimated that, because CAPTCHA is used for OCR correction, 2.5 million web users are collectively working 16 hours per day to help digitize all of humanity’s written works. Not a bad day’s work for buying those Hannah Montana tickets.

    Here is Luis von Ahn’s full interview about CAPTCHA from Wired Science:


