The problem of digitising old books

Old books in the public domain could easily be made available to a wider audience on the Internet. To make these books searchable, however, each page scan (which is really just an image, like a photocopy) has to be converted into digital text. Optical Character Recognition (OCR) software handles this conversion, but it often struggles with this kind of book. To improve its recognition rate, OCR software needs to "learn": its output must be compared with transcriptions produced by humans, so that the number of characters it can recognise gradually increases. Transcription by humans, however, is a long and repetitive task.
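One common way to compare OCR output with a human transcription is the character error rate: the edit distance between the two texts divided by the length of the human reference. The sketch below is only an illustration of that comparison, not the method of any particular OCR program; the function names `edit_distance` and `character_error_rate` and the sample strings are assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, human_text: str) -> float:
    """Share of the human reference that the OCR output got wrong."""
    if not human_text:
        raise ValueError("human_text must not be empty")
    return edit_distance(ocr_text, human_text) / len(human_text)


# Hypothetical sample: the OCR engine confused two characters.
ocr_output = "tbe old hook"
human_transcription = "the old book"
print(f"{character_error_rate(ocr_output, human_transcription):.3f}")
```

A lower rate means the OCR output is closer to the human transcription; the characters that differ are the ones the software still has to learn.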