Deciphering the Past: A Journey of Transcribing Handwritten Journals
Holding my great-grandfather's journals, a torrent of possibilities rushed through my mind. I pictured a digital sanctuary where these pages could come alive: searchable entries that readers could revisit and bookmark, a space where every passage, every memory, every sentiment would be just a click away. I even dreamt of crafting a chatbot, a virtual echo of my great-grandfather, brought to life through his own words and stories. The excitement was palpable, yet all these grand aspirations shared one foundational need: an accurate digital transcription of the decades-old text.
Challenge: Accurate Transcription of Handwritten Journals
The elegance of my great-grandfather's cursive posed a surprising obstacle. While today it seems like AI can do anything, transcription AIs faltered before this challenge. My initial attempts with the Google Vision API yielded disappointing results. I was confident I was on the right track, but I felt like I had hit a wall. The AI's accuracy was just 64%, nowhere near precise enough for preserving family history.
Google Vision output of the above entry:
(confidence values are displayed next to each word)
66 (97) Juve (26) Amen (64) . (40) Sulfer (33) th (40) Saturday (94) , (89) Long (83) , (40) long (50) after (62) the (99) premt (85) generation (82) has (74) gone (80) . (51) the (96) great (97) was (84) In (80) which (97) the (99) vations (81) are (95) now (99) locked (84) , (77) has (62) posssed (60) , (76) June (80) 6 (87) , (80) 1944 (95) will (83) be (91) a (93) great (93) historic (83) day (97) . (79) The (98) day
GPT-4 reconstruction of Google Vision output:
Jun. 6: Long after the present generation has gone, the great war in which the nations are now locked will be remembered. June 6, 1944 will be a great historic day. The day
Accuracy
(measured with Copyleaks)
The Solution
I noticed that Google Vision would sometimes mix words from different lines. But what if I could make the AI focus on one line at a time? With this in mind, I turned to CRAFT AI, a text-detection model that draws a box around each region of text it finds. By identifying and isolating individual lines of text, I hoped to give Google Vision cleaner input to interpret.
First, I used CRAFT AI to identify each word:
Using a bit of Python, I could then use the CRAFT boxes to work out the boundaries of each line:
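Here is a minimal sketch of how that grouping step might look. It is not the exact code from this project: the function names, the `tolerance` parameter, and the assumption that each CRAFT box has already been reduced to an `(x_min, y_min, x_max, y_max)` rectangle are all mine, for illustration only.

```python
from statistics import mean

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = tuple[int, int, int, int]


def group_into_lines(boxes: list[Box], tolerance: float = 0.6) -> list[list[Box]]:
    """Cluster word boxes into text lines by comparing vertical centres."""
    lines: list[list[Box]] = []
    # Walk the boxes from the top of the page to the bottom.
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        centre = (box[1] + box[3]) / 2
        height = box[3] - box[1]
        if lines:
            current = lines[-1]
            line_centre = mean((b[1] + b[3]) / 2 for b in current)
            # Same line if the centre is close to the current line's centre.
            if abs(centre - line_centre) <= tolerance * height:
                current.append(box)
                continue
        lines.append([box])
    # Order the words in each line from left to right.
    return [sorted(line, key=lambda b: b[0]) for line in lines]


def line_bounds(line: list[Box]) -> Box:
    """Return the smallest rectangle that contains every box in the line."""
    return (
        min(b[0] for b in line),
        min(b[1] for b in line),
        max(b[2] for b in line),
        max(b[3] for b in line),
    )
```

The `tolerance` value is a guess that would need tuning per journal: slanted or tightly spaced cursive may need a smaller value so that neighbouring lines are not merged.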
Then I could make a new image from each line, starting with the first line:
Then the second line:
And the third (and so on...):
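Cutting out each line comes down to computing a crop rectangle per line. The sketch below is one way it could be done, assuming each line's bounds are an `(x_min, y_min, x_max, y_max)` tuple; the `padding` parameter and the function names are hypothetical. A little padding keeps the tall loops of cursive letters from being clipped, and the result is clamped so it never leaves the page.

```python
# Assumed format for bounds: (x_min, y_min, x_max, y_max) in pixels.
Bounds = tuple[int, int, int, int]


def padded_crop_box(bounds: Bounds, image_size: tuple[int, int], padding: int = 8) -> Bounds:
    """Grow a line's bounds by `padding` pixels, clamped to the image edges."""
    left, top, right, bottom = bounds
    width, height = image_size
    return (
        max(0, left - padding),
        max(0, top - padding),
        min(width, right + padding),
        min(height, bottom + padding),
    )


def crop_boxes_for_lines(
    all_bounds: list[Bounds], image_size: tuple[int, int], padding: int = 8
) -> list[Bounds]:
    """Return one padded crop box per line, in the same order as the input."""
    return [padded_crop_box(b, image_size, padding) for b in all_bounds]
```

The returned tuples use the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, so with Pillow installed each line image could be produced with something like `page.crop(box)` and then sent to Google Vision on its own.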
Results
The proof was in the transcription. Comparing the AI's results on the entire page against its results on individual lines showed a remarkable improvement: from a mediocre 64%, accuracy soared to a commendable 86%!
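The accuracy figures in this post came from Copyleaks. For a quick local sanity check, a rough stand-in is Python's built-in `difflib`, which scores how closely a transcription matches a hand-checked reference. This is a different measure from the one Copyleaks uses, so the numbers will not line up exactly; the `word_similarity` helper is my own illustration.

```python
from difflib import SequenceMatcher


def word_similarity(reference: str, transcription: str) -> float:
    """Return a score between 0 and 1 comparing two texts word by word."""
    ref_words = reference.lower().split()
    out_words = transcription.lower().split()
    # ratio() is 2 * matches / total words across both sequences.
    return SequenceMatcher(None, ref_words, out_words).ratio()


if __name__ == "__main__":
    # A fragment of the entry above: corrected text vs. raw Google Vision output.
    reference = "Long after the present generation has gone"
    whole_page = "Long , long after the premt generation has gone ."
    print(f"Whole-page similarity: {word_similarity(reference, whole_page):.2f}")
```

Running the same comparison on the whole-page output and on the stitched-together line-by-line output gives a simple before-and-after number to track.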
Conclusion
The journey from an age-old journal to a digitized archive was challenging but incredibly rewarding. With every stumbling block, I found ways to innovate and move forward. The end product not only preserves my family's history but makes it more accessible than ever.
As for the tools and techniques I discovered and employed, they're not just about this project. They're about the resilience and resourcefulness of self-taught developers. Whether it's a familial treasure or a personal project, with determination and the right resources, anything is possible.
View the full source code.
Check out the website in action.