Working with Archival Texts as Data: Post-OCR Error Correction with OpenRefine
Digitization can provide far greater access to archival documents, but their discoverability via full-text search may be hampered by error-laden optical character recognition (OCR) output. Correcting these errors manually, however, is unrealistically time-consuming. In this workshop, we will use OpenRefine, a spreadsheet-like application, to speed up finding and correcting the spelling errors introduced by the OCR process.
Time: 4:00 pm ET / 1:00 pm PT (2 hours)
Date: April 13, 2022
Cost: Members: $50.00, Non-Members: $75.00, Student Members / Precariously Employed Members: $25.00
Learning outcomes: By the end of the workshop, you will be able to 1) perform initial data analysis on OCR text output and 2) apply computational techniques to correct common OCR errors.
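As a rough illustration of what these two outcomes can look like in practice, here is a minimal Python sketch. It is not the workshop's method (the workshop uses OpenRefine) but a stand-in using only the Python standard library: it counts word frequencies in OCR output, then suggests that rare words closely resembling a frequent word may be OCR misreadings of it. The function names, the thresholds (`min_common`, `cutoff`), and the sample text are all hypothetical, chosen only for this example.

```python
from collections import Counter
import difflib
import re


def word_counts(text):
    """Initial analysis: count lowercase alphanumeric tokens in the OCR text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def suggest_corrections(counts, min_common=3, cutoff=0.8):
    """Map each rare token to the most similar frequent token, if one is close enough.

    min_common and cutoff are illustrative assumptions, not recommended values.
    """
    common = [word for word, n in counts.items() if n >= min_common]
    suggestions = {}
    for word, n in counts.items():
        if n >= min_common:
            continue  # frequent words are assumed to be spelled correctly
        match = difflib.get_close_matches(word, common, n=1, cutoff=cutoff)
        if match:
            suggestions[word] = match[0]
    return suggestions


# Hypothetical OCR output containing two misread words.
sample = (
    "The archive holds letters. The archive holds many letters. "
    "The archive holds some letters. The arcbive holds 1etters."
)

counts = word_counts(sample)
suggestions = suggest_corrections(counts)
print(suggestions)
```

On this sample, the sketch flags "arcbive" and "1etters" as likely misreadings of "archive" and "letters". Suggestions like these would still need human review, since a rare word may simply be a legitimate but uncommon term.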
Instructor Bio: Devon Mordell is an educational developer at McMaster University, living and working on the traditional territories shared between the Haudenosaunee confederacy and the Anishinaabe nations. She has a Master of Archival Studies from the University of British Columbia and worked as a digital archivist at the University of Windsor before returning to online learning amidst the COVID-19 pandemic. She teaches workshops on digital scholarship topics such as online exhibit creation and writes about issues that can be broadly characterized in the vein of archival futurism.
Suite 1912-130 Albert Street
Ottawa, Ontario K1P 5G4
The ACA office is located on the unceded, unsurrendered Territory of the Anishinaabe Algonquin Nation whose presence here reaches back to time immemorial.
Copyright © 2022 - The Association of Canadian Archivists