Working with ChatGPT o1 preview: Error recognition and classification in raw machine transcriptions of C17th legal depositions
Overview
MarineLives is running a project to machine transcribe and machine/human correct 50,000 pages of legal depositions from the C17th. That's roughly 30 million words. Our goal is to complete a high quality machine transcription by the end of 2025, which is pretty ambitious!
We are using a bespoke machine transcription model run on the Transkribus platform, on which we also have a demo of model output. The model is trained on roughly 400,000 tokens.
We are exploring the use of LLMs to automate the cleanup of the raw machine transcriptions. Over the last two days we have had a chance to road test ChatGPT o1 preview and to compare it with Aurelius-HTR, a tailored version of ChatGPT which we have developed. o1 preview wins hands down.
You can read about our work with small scale and large scale LLMs on our Hugging Face site.
Exploring ChatGPT o1 preview
Prompt one
I set the scene with two lines at the start of my first prompt and then stated the task:
You are an excellent pattern recogniser, systems analyst, and coder. You are helping me design a process for ChatGPTo1 to improve the quality of raw-HTR output. Please analyse these data and identify different types of HTR error which these python snippets correct. Here are the snippets:
The Python snippets I provided contain a range of machine transcription errors, in no particular order. The first part of each snippet contains an error; the second part contains the corrected text. I gave no guidance on error recognition or classification beyond the prompt and the data.
r'\bamonighte them the goodes articulate\b': 'amongste them the goodes articulate',
r'\bAnd th hee saith is true\b': 'And this hee saith is true',
r'\bby tresse of weather\b': 'by stresse of weather',
r'\bof the befence of yt\b': 'of the defence of yt',
r'\brighte worshipfull hard Zouch\b': 'righte worshipfull Richhard Zouch',
r'\bsaieth and depose as afolloweth\b': 'saieth and deposeth as followeth',
r'\bsawe the said shippe comaged and searched\b': 'sawe the said shippe romaged and searched', #'romaged' or 'rumaged'
r'\bsawe the comageinge of the said shippe after all her goodes were take out of her\b': 'sawe the romageinge of the said shippe after all her goodes were take out of her', #'romageing' or 'rumageinge'
r'\bthe articulate shippe the Mary ffse\b': 'the articulate shippe the Mary Rose',
r'\bthe barrells of Anthones and Capers\b': 'the barrells of Anchovies and Capers',
r'\bthe rese of the sd goodes\b': 'the reste of the said goodes',
r'\bthe take of the Blessinge\b': 'the takeinge of the Blessinge',
r'\bthe usuall and ordinant fraighte prices \b': 'the usuall and ordinary fraighte prices ',
r'\bthe yeare of on bord 1636 or 1637\b': 'the yeare of our Lord 1636 or 1637',
r'\bthis examats beefe remembrance \b': 'this examinats beste remembrance ',
r'\bto Mashforde\b': 'to Washforde',
r'\bto Mashforde in Ireland\b': 'to Washforde in Ireland',
r'\bto Pishforde\b': 'to Washforde',
r'\bviselicet\b': 'videlicet',
r'\bto Wishford in Ireland\b': 'to Washford in Ireland',
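For readers unfamiliar with the format, each snippet above is a key/value pair for a Python dictionary: a regular expression matching the faulty HTR text, mapped to its corrected form. A minimal sketch of how such a dictionary might be applied to a line of raw transcription follows. The function name `apply_corrections` and the sample sentence are assumptions made for illustration; only the two dictionary entries come from the snippets.

```python
import re

# Two entries copied from the snippets above; a real dictionary would hold many more.
CORRECTIONS = {
    r'\bby tresse of weather\b': 'by stresse of weather',
    r'\bviselicet\b': 'videlicet',
}

def apply_corrections(text, corrections):
    """Apply each regex -> replacement pair to the text, one after another."""
    for pattern, replacement in corrections.items():
        text = re.sub(pattern, replacement, text)
    return text

# Hypothetical line of raw HTR output, invented for this example.
raw = "driven by tresse of weather, viselicet a storme"
print(apply_corrections(raw, CORRECTIONS))
```

Because the patterns are anchored with `\b` word boundaries and include surrounding context words, each rule only fires on the exact phrase it was written for, which keeps false corrections down at the cost of needing many rules.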
o1 preview took thirty-seven seconds of "think time" in which it went through the following process:
Prompt two
The results o1 preview then shared were pretty good, but I wanted to see if we could do better. So I constructed a second prompt, this time containing an approach to error classification which I had developed with o1 preview the previous day.
Here is the start of my second prompt in my ongoing conversation with o1 preview:
Good but not perfect. You can do better. I am going to give you a framework to categorize errors. Please incorporate this framework into your thinking Modify it if appropriate Then reanalyze the 20 Python snippets I gave you earlier. Here is the framework to categorize errors
The framework I shared within the second prompt contained five sections, each section containing sub-types of errors together with examples. o1 preview very helpfully summarised the framework in its response to this second prompt, allocating each section a letter, then used those letters to classify errors.
This time o1 preview took fifty-nine seconds of "think time", integrating the error recognition framework and reanalyzing the Python snippets within that framework.
The output from this second prompt was highly structured and analytical. o1 preview walked through each of the snippets, listing the original text and the corrected text, isolating and analyzing the error, and assigning an error type. The assignment of the error type was the smart bit, and is an approach which is highly generalisable as we build our cumulative HTR error/correction dataset. We plan to upload a dataset of 10,000 HTR error/correction pairs to our Hugging Face site when complete.
Here is an extract of this part of o1 preview's response for the first two of the twenty Python snippets:
o1 preview then provided a summary of error patterns, which was thought provoking and provides the basis for scaling up and expanding our analysis of HTR errors. The analysis is not perfect. For example, the single letter insertion examples are actually double letter insertions. But when the model is provided with much larger datasets, or fine-tuned on them, I would expect a high degree of accuracy.
Finally, in its response to prompt two, o1 preview provided some general observations on error patterns, which were insightful, and made some process improvement suggestions, which were plausible and merit working on.
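The "isolate the error, then assign a type" step can be approximated mechanically for simple cases, which is also a cheap way to check claims such as whether an error involves one letter or two. The sketch below is only an illustration under stated assumptions: it is not o1 preview's method and does not reproduce the five-section framework. It uses Python's standard `difflib` module to find where the raw and corrected strings differ, and the three labels (omission, insertion, substitution) and the function names are hypothetical.

```python
import difflib

# Labels describe what the HTR model did wrong, seen from the corrected text.
# These three categories are illustrative, not the framework used in the prompt.
LABELS = {
    'insert': 'omission',       # corrected text has characters the raw text lacks
    'delete': 'insertion',      # raw text has characters that should not be there
    'replace': 'substitution',  # raw characters were swapped for others
}

def classify_edits(raw, corrected):
    """Return (label, raw_fragment, corrected_fragment) for each differing span."""
    matcher = difflib.SequenceMatcher(None, raw, corrected, autojunk=False)
    return [
        (LABELS[tag], raw[i1:i2], corrected[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != 'equal'
    ]

pairs = [
    ('by tresse of weather', 'by stresse of weather'),
    ('viselicet', 'videlicet'),
    ('saieth and depose as afolloweth', 'saieth and deposeth as followeth'),
]
for raw, corrected in pairs:
    for label, raw_part, corrected_part in classify_edits(raw, corrected):
        # The fragment lengths show how many letters each error involves.
        print(f"{label}: {raw_part!r} -> {corrected_part!r}")
```

A character-level diff like this only captures surface edits. Errors that need context to recognise, such as a misread proper name, would still need a model or a human to classify.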
For those of you unfamiliar with C17th legal depositions, this is what they look like:
Example of raw transcription
Here is an example of one page of raw transcription from a volume of depositions from the 1642 to 1644 period. The page is HCA 13/58 f.525v. The text is a so-called full diplomatic transcription, and preserves abbreviations and contractions. It is human readable but not perfect: it needs cleanup, and the short forms need to be expanded to improve readability.