I’m using reCAPTCHA to stop comment and email spam by “reading books”. (If you don’t know what a CAPTCHA is, the reCAPTCHA website explains it thoroughly.) Installing the plugin and getting the API keys was simple.
The book pages are photographically scanned and then, to make them searchable, converted into text using “Optical Character Recognition” (OCR). Converting the pages to text is useful because scanning a book produces images, which are difficult to store on small devices, expensive to download, and impossible to search. The problem is that OCR is not perfect.
reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.
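To make the idea concrete, here is a minimal sketch of how unreadable words might be picked out for use as CAPTCHAs. It is not reCAPTCHA’s actual code: the `OcrWord` type, the `pick_captcha_words` function, the confidence scores, and the 0.8 threshold are all hypothetical, and the sample words are made up for illustration. It only assumes that the OCR program reports some measure of how sure it is about each word.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str          # the OCR program's best guess for the word
    confidence: float  # hypothetical certainty score between 0 and 1


def pick_captcha_words(words, threshold=0.8):
    """Return the words the OCR program was unsure about.

    The threshold is an assumption chosen for this example only.
    """
    return [w for w in words if w.confidence < threshold]


# Made-up OCR output for one scanned line of text.
scanned_line = [
    OcrWord("the", 0.99),
    OcrWord("qnick", 0.41),
    OcrWord("brown", 0.97),
    OcrWord("f0x", 0.35),
]

captcha_candidates = pick_captcha_words(scanned_line)
print([w.text for w in captcha_candidates])  # ['qnick', 'f0x']
```

In this sketch, the two words flagged as doubtful would be the ones placed on an image and shown to a human, while the confidently read words need no help.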
This seems like a win-win situation, right? So why don’t more websites (e.g., blogs) use this service?