Posted by: bluesyemre | March 19, 2015

Reinventing Optical Character Recognition for early printed books


Most historic European cities, museums and universities hold early printed Latin books, as do many private collectors and learned societies. However, existing OCR packages cannot digitise these texts effectively, and so cannot produce an accurate, searchable document. Early Latin texts use non-standard typefaces, abbreviations and page layouts. Current OCR software was built to handle standard print in modern languages, so it cannot recognise historic characters, nor does it model Latin morphology, syntax and vocabulary. As a result, existing digitisation methods produce very poor results for early printed Latin books.
