Posted by: bluesyemre | March 19, 2015

Reinventing Optical Character Recognition for early printed books


Most historic European cities, museums and universities hold early Latin books, as do many private collectors and learned societies. However, existing OCR packages cannot digitise these texts effectively, and so cannot produce an accurate searchable document. Early Latin texts use non-standard typefaces, abbreviations and page layouts. Because it was built to handle standard print in modern languages, current OCR software cannot recognise historic characters, let alone Latin morphology, syntax and vocabulary. Digitisation methods for early printed Latin books therefore produce very poor results.

http://www.cilip.org.uk/cilip/blog/reinventing-optical-character-recognition-early-printed-books
