
dc.contributor.author: Pedersen, Bjørn-Richard
dc.contributor.author: Holsbø, Einar
dc.contributor.author: Andersen, Trygve
dc.contributor.author: Shvetsov, Nikita
dc.contributor.author: Ravn, Johan
dc.contributor.author: Sommerseth, Hilde Leikny
dc.contributor.author: Bongo, Lars Ailo
dc.date.accessioned: 2022-02-09T13:38:12Z
dc.date.available: 2022-02-09T13:38:12Z
dc.date.issued: 2022-01-06
dc.description.abstract: Machine learning approaches achieve high accuracy for text recognition and are therefore increasingly used for the transcription of handwritten historical sources. However, using machine learning in production requires a streamlined end-to-end pipeline that scales to the dataset size and a model that achieves high accuracy with few manual transcriptions. The correctness of the model results must also be verified. This paper describes our lessons learned developing, tuning and using the Occode end-to-end machine learning pipeline for transcribing 2.3 million handwritten occupation codes from the Norwegian 1950 population census. We achieve an accuracy of 97% for the automatically transcribed codes, and we send 3% of the codes for manual verification. We verify that the occupation code distribution found in our results matches the distribution found in our training data, which should be representative of the census as a whole. We believe our approach and lessons learned may be useful for other transcription projects that plan to use machine learning in production. [en_US]
dc.identifier.citation: Pedersen B, Holsbø EJ, Andersen T, Shvetsov N, Ravn J, Sommerseth HL, Bongo LA. Lessons Learned Developing and Using a Machine Learning Model to Automatically Transcribe 2.3 Million Handwritten Occupation Codes. Historical Life Course Studies. 2022;11:1-17 [en_US]
dc.identifier.cristinID: FRIDAID 1986163
dc.identifier.doi: https://doi.org/10.51964/hlcs11331
dc.identifier.issn: 2352-6343
dc.identifier.uri: https://hdl.handle.net/10037/24000
dc.language.iso: eng [en_US]
dc.relation.journal: Historical Life Course Studies
dc.relation.projectID: info:eu-repo/grantAgreement/RCN/FORINFRA/225950/Norway/National Historical Population Register for Norway 1800-2024 (HPR) / Historisk befolkningsregister (HBR) 1800-2024// [en_US]
dc.rights.accessRights: openAccess [en_US]
dc.rights.holder: Copyright 2022 The Author(s) [en_US]
dc.title: Lessons Learned Developing and Using a Machine Learning Model to Automatically Transcribe 2.3 Million Handwritten Occupation Codes [en_US]
dc.type.version: publishedVersion [en_US]
dc.type: Journal article [en_US]
dc.type: Tidsskriftartikkel [en_US]
dc.type: Peer reviewed [en_US]

