Botminds AI Solution - Handwritten Forms Extraction - Raveendran Jepegnanam, RRD Go Creative
04 December 2022
Automation of handwritten text extraction using the Botminds AI platform – a detailed solution showcase by Raveendran Jepegnanam, Vice President, Business Performance Improvement, RRD Go Creative, as part of his presentation on ‘Expanding Opportunities using AI’ and the RRD-Botminds partnership. The presentation was given at Fusion 2022 – A Botminds Partnership Conference on all things IDP and Intelligent Automation, featuring our automation partners.
Key takeaways from the presentation -
- Automated extraction of data from handwritten documents, a task that is usually done manually and is hence error-prone
- High accuracy achieved by using training samples of only ~50 documents
- This is an area of high importance and impact: extraction from a variety of handwritten 401(k) forms with a mix of formats, handwriting styles, and elements such as checkboxes and combined handwritten and printed text
Transcript:
Note: This transcript has been edited for clarity.
This is the use case for extracting handwritten text from a 401(k) document. This is a large retirement service provider in the US who has been our client for many years, and we have been supporting them in various back-office processes. What came to our notice is that this was one of the major blockers, because there were a lot of documents with handwritten text that needed to be extracted. So, till now, we were relying only on the human capital that we had to extract that information. And even then, there was a lot of variation and the work was prone to errors because of the handwriting and similar issues.
So that is when we came across this form. We have been working closely with Botminds, so we looked at this and said that this is one of the areas which will clearly create an impact for our client. We identified a couple of forms which accounted for almost 55% of the volume, which is about 18,000 pages that have to be processed in a month. We started by tagging a few samples. The central theme that we always see in this conference is speed: for handwritten documents, the earlier solutions that we were trying needed multiple thousands of documents to be tagged. But we started off by tagging just about 50 documents and created models. And then, based on that, we were able to achieve a very high accuracy with just 40-50 documents for these cases.
So, we have done a couple of iterations, and we were able to reach almost 85% accuracy, with a few documents at almost 100% accuracy. I'll show you how the actual platform works. (Opens the Botminds AI portal on his desktop.) So, this is the 401(k) document. We go to the Studio (a section of the Botminds AI Platform that provides capabilities for document extraction and classification), where we have the document that we want to extract from. So, you get confirmation that the document has been uploaded successfully. Once that is done, we go to the automation and the source information, and rerun the automation.
Now, what happens is that once the crawling has been initiated, we will see the documents come up here as they are crawled. What we can also do is go to an earlier sample document that has already been ingested and look at it here. This is a sample that has been correctly predicted. We were able to extract all the information, with the ability to click on each value and find out what it is. So, we were able to accurately extract the information from all these documents. What this really meant for us as a team is that we were able to train with just about 70 samples. And for about 40 to 45 of those, we were able to reach field-level accuracy of almost 88%, out of which 19 documents were above 90% in terms of accuracy. As next steps after the implementation, we are further training the model based on the feedback that is coming through the human-in-the-loop (HITL) review, and we are going to scale this to the other documents as well.
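Note: The transcript does not say how field-level accuracy was measured. The Python sketch below shows one common way to compute it, for illustration only: it assumes each document is a mapping of field names to values, and that a field counts as correct when the extracted value matches the human-verified value after whitespace and case normalisation. The function names, the matching rule, and the 90% threshold parameter are assumptions, not part of the Botminds AI platform.

```python
from typing import Dict, List, Tuple

Fields = Dict[str, str]


def _normalise(value: str) -> str:
    """Collapse whitespace and ignore case (an assumed matching rule)."""
    return " ".join(value.split()).casefold()


def field_accuracy(predicted: Fields, verified: Fields) -> float:
    """Share of verified fields whose extracted value matches."""
    if not verified:
        raise ValueError("verified document has no fields")
    correct = sum(
        1
        for name, value in verified.items()
        if _normalise(predicted.get(name, "")) == _normalise(value)
    )
    return correct / len(verified)


def summarise(pairs: List[Tuple[Fields, Fields]], threshold: float = 0.9) -> dict:
    """Mean per-document accuracy and count of documents above a threshold."""
    if not pairs:
        raise ValueError("no documents to score")
    scores = [field_accuracy(pred, truth) for pred, truth in pairs]
    return {
        "documents": len(scores),
        "mean_accuracy": sum(scores) / len(scores),
        "docs_above_threshold": sum(score > threshold for score in scores),
    }


# Hypothetical sample data, not taken from the presentation.
sample = [
    (
        {"name": "  John  SMITH ", "ssn_last4": "1234", "rollover": "yes", "amount": "500"},
        {"name": "john smith", "ssn_last4": "1234", "rollover": "yes", "amount": "5000"},
    ),
    (
        {"name": "Ann Lee", "rollover": "no"},
        {"name": "Ann Lee", "rollover": "no"},
    ),
]
report = summarise(sample)
```

Averaging per-document scores, as done here, weights every document equally. Pooling all fields across documents before dividing would weight longer forms more heavily, so the two approaches can report different figures for the same batch.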