

OCRopodium

Optical Character Recognition (OCR) is a key component of large-scale digitisation projects that deal with text-based material. Typically, such projects rely on closed, proprietary software or on commercial companies.

This raises a number of issues, such as: the cost of proprietary software and/or external consultants; the lack of flexibility and adaptability of closed software; the deskilling of digitisation staff, as OCR expertise is concentrated in commercial companies; and the appropriateness of the software for historical material.

The OCRopodium project will address some of these issues by:

  • Trialling an open-source approach to Optical Character Recognition, using OCRopus software.
  • Embedding OCR activities within flexible, semi-automated digitisation workflows for text-based material.

A collaborative, distributed and semi-automated workflow, embedded in institutional practices, will help address the full digitisation process: from scanning, through OCR and mark-up, to ingest into a repository where the content is managed and preserved.

This project will help close a significant skills gap by reducing reliance on commercial OCR providers in favour of open-source OCR technology, which will allow adaptation and development through community involvement.



Download the project plan (PDF)

Project Staff

Mark Hedges, Deputy Director, Centre for e-Research, King's College, London

Summary

Start date: 1 September 2009
End date: 28 February 2011
Funding programme: Digitisation and e-Content
Strand: e-Content programme 2009-11
Lead institutions: King's College, London
Partner institutions: Queen's University, Belfast (Centre for Data Digitisation and Analysis)