Can your OCR software do handwriting?
What a question. Sometimes we need to correct expectations around handwriting. I have spent a lot of time analysing handwriting and what we can and can't do with it. In this article I am going to do some myth-busting and look at where the misconceptions about OCR/ICR and handwriting come from.
My phone can do it… actually, so can my tablet and Surface Pro. Why can't you do it? It must be easy, surely.
Well. Let's stop for a moment and think about handwriting and how it is made up. We all have unique features in our handwriting that are either easy or hard to read to the naked eye. A doctor's handwriting, for example, is definitely hard to read. My handwriting, I am told, is easy to read. I am going to take a moment and go around the office to get some handwriting for us to look at. Hang on…
OK, I asked three of my devs to write this sentence:
The rain in Spain stays mainly on the plain.
This is Cat's handwriting. I ran it through a bog-standard Tesseract read and got:
1thiin saju inhh978
I78FYni90lu8hj 9!o
Not good. I asked her to write the same sentence on her Surface Pro, once with the stylus and once with the mouse. Both produced the same result:
The rain in Spain stays mainly on the plain.
Here are two more examples.
Sophie’s handwriting
Ryan’s handwriting.
I asked both of my other developers to do the same using different OCR/ICR tools and different tablets. Same result: rubbish from the OCR read, perfection from the tablet. So when a client asks me if our Smart Reader technology can read handwriting, what should I say? Other readers will let you ‘train’ them for handwriting, but that isn't much use unless you can get samples, or wait until enough samples come in. That really isn't good enough.
Without setting expectations too high, I had an idea.
Why does it read correctly from the tablet?
The reason is that the tablet is not only capturing the finished sentence, it is also capturing the strokes and gestures made while the sentence was being written. That is a hard one for non-techies to follow. In fact, try explaining it to people at an executive level.
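To make that concrete for the developers reading along, here is a minimal sketch of what "capturing strokes and gestures" means as data. It is purely illustrative: the class names and fields are my own invention, not any tablet SDK's.

```python
# Illustrative only: a possible shape for "online" handwriting data.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Stroke:
    # One pen-down to pen-up movement: sampled (x, y, time_ms) points.
    points: List[Tuple[float, float, int]] = field(default_factory=list)

@dataclass
class InkSample:
    # What a tablet recogniser gets to work with: the strokes in the
    # order they were drawn, plus the rendered image of the finished word.
    strokes: List[Stroke] = field(default_factory=list)
    rendered_word_png: bytes = b""

# A tablet therefore sees both "how it was drawn" (order, direction,
# timing of strokes) and "what it looks like" (the finished pixels).
# A scanner only ever gives you the second part.
```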
Below is a diagram that explains strokes; I typically use it during presentations.
The above is the breakdown of single straight strokes only. Given that we now have the straight-line strokes AND the finished word, we can probably get the word itself.
However, if we add in the curved strokes as well, and combine all of that information, the array of strokes plus the finished word, we can identify each letter, then pass the letters that make up a word through a machine learning model to correct the word if it is wrong. If we also know the context, then we are away.
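As a rough sketch of that pipeline (a toy example, with a made-up lookup table standing in for the real letter model), the flow looks something like this:

```python
import difflib

# Toy illustration of "strokes -> letters -> corrected word".
# The "classifier" is a fake lookup on stroke descriptions; in reality
# this step would be a trained machine learning model.
FAKE_LETTER_MODEL = {
    ("down", "hook"): "i",
    ("curve", "down"): "a",
    ("down", "arch"): "n",
}

def classify_letter(stroke_shapes):
    return FAKE_LETTER_MODEL.get(tuple(stroke_shapes), "?")

def correct_word(raw_word, vocabulary):
    # Context step: snap a dubious word onto the closest known word.
    match = difflib.get_close_matches(raw_word, vocabulary, n=1, cutoff=0.5)
    return match[0] if match else raw_word

strokes_per_letter = [
    ("down", "hook"),    # recognised as "i" (really an "r": a misread)
    ("curve", "down"),   # "a"
    ("down", "hook"),    # "i"
    ("down", "arch"),    # "n"
]
raw = "".join(classify_letter(s) for s in strokes_per_letter)            # "iain"
print(correct_word(raw, ["rain", "Spain", "stays", "mainly", "plain"]))  # "rain"
```

In reality the classifier is a trained model and the correction step uses far richer context than an edit-distance lookup, but the shape of the pipeline is the same.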
So, to answer the question in a nutshell, that is how your phone and tablet convert writing into digital text. Before the developers amongst you start arguing with me - yes, it does way more than that - but for now let's think of it as a collection of pixels, strokes and gestures.
OCR/ICR
Right now the term A.I. is massive in most industries. I think most of you know my feelings about it: I think Alan Turing would be turning in his grave at what some companies are calling A.I.
However, there is an element of intelligence in the above. What about when we only have a scan of handwriting? How will we read that, given we do not have the stroke and gesture information?
The answer is that it is very tricky, and impossible to get right 100% of the time. Let's think about what we are getting: an array of pixels, little on/off switches that make up a picture. These can come in a variety of formats we are all familiar with: JPG, BMP, PNG and so on.
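For those who like to see it in code, here is a minimal sketch of turning one of those scans into the on/off array, using Pillow and NumPy. The file name and the fixed threshold of 128 are assumptions; real pipelines binarise adaptively.

```python
import numpy as np
from PIL import Image

# Load the scan (JPG, BMP, PNG, ...) as greyscale, 0-255 per pixel.
img = Image.open("scan.png").convert("L")
pixels = np.array(img)

# Crude fixed threshold: 1 = ink, 0 = background.
binary = (pixels < 128).astype(np.uint8)

print(binary.shape)       # (rows, columns)
print(binary[0][:20])     # e.g. [0 0 0 1 1 0 ...] - the on/off switches
```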
Firstly, how do we know it is handwriting at all? I am working on a project right now that mixes handwriting and printed text, so a scan can come in with handwritten notes on it.
The idea is to OCR the printed text and then ICR the handwriting. That is a huge problem:
1. How do we know where the handwriting might be?
2. How do we know whether it is handwriting?
3. How do we know whether the handwriting is in a straight line?
Those are SOME of the questions; as you can imagine, there are a lot more! The answer is very simple: you cannot blanket-read a document for handwriting.
As I always tell our developers - divide and conquer.
Let me repeat: you cannot blanket-read a document for handwriting. However, like all programming problems, you can work from what you know and find a solution.
Context is so important; in fact, it is half the battle. If you know the industry you are reading documents for, you can eliminate a lot of problems.
If you know where to look for the handwriting, then you are away. There are, however, other tricks you can employ to find handwriting, but that is for another discussion.
Let's say you know where it is and you have snipped it out of your document programmatically. We are still missing that lovely gesture and stroke information, so it isn't easy to solve just yet.
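As a quick illustration of that snipping step (the file name and box coordinates here are made up; in practice they come from your form template or whatever layout knowledge the context gives you):

```python
from PIL import Image

page = Image.open("claim_form.png")

# (left, upper, right, lower) in pixels for, say, a known "notes" box.
HANDWRITING_BOX = (120, 840, 980, 1010)

snippet = page.crop(HANDWRITING_BOX)
snippet.save("notes_region.png")   # this is what goes on to the ICR step
```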
We must think about the handwriting as that array of pixels. I described it recently to one of our junior devs as a microscope zooming in and out.
Zoomed in, what you have is the raw array:
[0,1,0,0,0,1,1,1,…N]
Zoomed out, what you have is the picture itself.
Without giving away all my secrets, we have devised something I believe is unique: an intelligent way of tracing the 1s and 0s in the same way a person would, so that we can capture what we think of as the gestures and strokes people would have used.
Once this is applied, we can begin to understand a bit better what the handwriting says.
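To give a flavour of the general idea without revealing the real method, here is a deliberately naive sketch: walk along connected ink pixels in the binary array and record each path as a pseudo-stroke. This is a generic illustration of tracing, not our actual algorithm.

```python
import numpy as np

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def trace_strokes(binary):
    """Greedy 8-connected walk over ink pixels; returns lists of (row, col)."""
    unvisited = {(r, c) for r, c in zip(*np.nonzero(binary))}
    strokes = []
    while unvisited:
        current = min(unvisited)          # start top-left, like a reader would
        stroke = []
        while current in unvisited:
            unvisited.remove(current)
            stroke.append(current)
            r, c = current
            nxt = [(r + dr, c + dc) for dr, dc in NEIGHBOURS
                   if (r + dr, c + dc) in unvisited]
            if not nxt:
                break
            current = nxt[0]              # follow the first unvisited neighbour
        strokes.append(stroke)
    return strokes

# "binary" is the on/off array from the snipped region. Each recovered
# pseudo-stroke can then be described (direction changes, curvature) and
# fed into the same letter/word logic a tablet recogniser would use.
```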
We at Sonix have begun to build a gesture and stroke library that mimics handwriting capture in a similar way to a tablet. We will keep you posted!