Most historic European cities, museums and universities hold early Latin books, as do many private collectors and learned societies. However, existing OCR packages cannot digitise these texts effectively, and so cannot produce accurate, searchable documents. Early Latin texts use non-standard typefaces, abbreviations and page layouts. Built to handle standard print in modern languages, current OCR software cannot recognise historic characters, or indeed Latin morphology, syntax and vocabulary. Digitisation methods for early printed Latin books therefore produce very poor results.
http://www.cilip.org.uk/cilip/blog/reinventing-optical-character-recognition-early-printed-books