–Parascript announces the latest solution from EDAC Systems, Inc.: ReadSCRIPT® Maestro, driven by NITe and powered by Parascript technology–
Parascript announces the latest development in EDAC Systems, Inc.’s solutions: ReadSCRIPT Maestro, a new product that combines NITe technology, which locates and separates handwritten words in unstructured fields, with Parascript technology. Maestro detects handwritten email addresses and forwards them for interpretation to ReadSCRIPT, EDAC’s advanced intelligent word recognition (IWR) product, which is powered by Parascript technology.
We sat down with Randy Blevins, the founder of EDAC, to find out how companies and government agencies begin implementing these technologies.
“When approaching a data capture and index extraction challenge, the best place to start is at the very end of the process—the final destination of the images and the index values,” said Randy Blevins. “Understanding the final data requirements is fundamental to determining the most appropriate techniques and solutions to implement.”
APPROACHING THE DATA EXTRACTION CHALLENGE
According to Blevins, every company that wants to fully harness the power of document capture benefits from following three basic steps:
1. Determine what documents need to be captured. This requires identifying the document types to be processed and building pattern recognition rule sets to separate and classify them.
2. Define what data needs to be extracted. This means understanding how the data will be used and defining the business rules that extract the necessary data from each of those document types or forms.
3. Establish where the data will go. Once the data is extracted, the metadata and index values may be assigned for review, validation, and approval, or routed directly into an ERP, CRM, content management, or other backend system.
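The three steps can be pictured as a small capture plan. The sketch below is a hypothetical illustration, not EDAC’s configuration format: `DocumentType`, `PLAN`, `classify`, `extract`, and `route` are invented names, simple keyword matching stands in for the pattern recognition rule sets, and regular expressions stand in for the business rules.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class DocumentType:
    """Hypothetical capture plan for one document type."""
    name: str
    keywords: list[str]          # step 1: rule set used to classify the document
    field_rules: dict[str, str]  # step 2: field name -> regex that extracts its value
    destination: str             # step 3: backend system that receives the data
    require_review: bool = False


PLAN = [
    DocumentType(
        name="invoice",
        keywords=["invoice", "amount due"],
        field_rules={"invoice_no": r"Invoice\s*#\s*(\w+)",
                     "total": r"Total:\s*\$?([\d,.]+)"},
        destination="ERP",
    ),
    DocumentType(
        name="purchase_order",
        keywords=["purchase order"],
        field_rules={"po_no": r"PO\s*#\s*(\w+)"},
        destination="CRM",
        require_review=True,
    ),
]


def classify(text: str) -> DocumentType | None:
    """Step 1: return the first document type whose keywords all appear in the text."""
    lowered = text.lower()
    for doc_type in PLAN:
        if all(keyword in lowered for keyword in doc_type.keywords):
            return doc_type
    return None


def extract(doc_type: DocumentType, text: str) -> dict[str, str | None]:
    """Step 2: apply each field rule; None marks a field that was not found."""
    values: dict[str, str | None] = {}
    for field_name, pattern in doc_type.field_rules.items():
        match = re.search(pattern, text, re.IGNORECASE)
        values[field_name] = match.group(1) if match else None
    return values


def route(doc_type: DocumentType, values: dict[str, str | None]) -> str:
    """Step 3: send to review if required or a field is missing, else to the backend."""
    if doc_type.require_review or any(v is None for v in values.values()):
        return "review_queue"
    return doc_type.destination
```

For example, a page containing “Invoice # A123”, “Amount due”, and “Total: $1,250.00” is classified as an invoice, both fields are extracted, and the result is routed to the ERP; a purchase order always goes to the review queue because its plan requires review.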
EXTENUATING CIRCUMSTANCES: FOLLOWING A DIFFERENT PATH
Different document types may take different paths through the EDAC capture solution. If a document is computer generated and “clean,” it may flow through a standard document processing workflow without interruption. However, extenuating circumstances, such as poor image quality, low resolution, or handwritten text, may require detours in the workflow.
“If the image is noisy, skewed, or unevenly shaded, the workflow automatically directs the image to EDAC’s PurePAGE for image processing and increased image resolution. For handwriting recognition and extraction, the image is routed to EDAC’s ReadSCRIPT, which uses Parascript technology,” Blevins explained.
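The detours Blevins describes can be sketched as a routing function. The measurements in `PageInfo`, the thresholds, and the stage names are all assumptions for illustration; the article does not say how PurePAGE or ReadSCRIPT decide when to step in.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical per-page measurements taken when an image enters the system."""
    dpi: int
    skew_degrees: float
    noise_ratio: float      # assumed metric: fraction of pixels flagged as speckle
    has_handwriting: bool


# Illustrative thresholds only; not documented PurePAGE settings.
TARGET_DPI = 300
MAX_SKEW_DEGREES = 0.5
MAX_NOISE_RATIO = 0.01


def plan_route(page: PageInfo) -> list[str]:
    """Return the ordered processing stages for one page."""
    stages = []
    # Detour 1: noisy, skewed, or low-resolution images are cleaned up first.
    if (page.dpi < TARGET_DPI
            or abs(page.skew_degrees) > MAX_SKEW_DEGREES
            or page.noise_ratio > MAX_NOISE_RATIO):
        stages.append("image_cleanup")
    # Detour 2: handwriting goes to a word-recognition engine instead of plain OCR.
    stages.append("handwriting_recognition" if page.has_handwriting else "machine_print_ocr")
    stages.append("export")
    return stages
```

A clean, computer-generated 300 DPI page takes the direct path (`machine_print_ocr`, then `export`), while a skewed, speckled 200 DPI page with handwriting picks up both detours.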
WHAT HAPPENS TO CUSTOMERS’ DOCUMENTS DURING AUTOMATION?
Documents arrive in a myriad of input formats, such as paper, faxes, email, and other electronic documents. These images are scanned or imported into the system, which then functions as the “traffic cop,” the triage unit, and the SWAT team, all in one.
Regardless of the input format, the system separates, recognizes, and classifies each document by type. The next phase of the workflow after input is image optimization. In some cases, the image needs to be cleaned up and resized by EDAC’s PurePAGE, which automatically detects the image’s quality and resolution.
If the image quality is poor, PurePAGE enhances the page using numerous methods, such as deskewing, despeckling, de-shading, auto-inversion, black border removal, and other enhancement techniques. For low-resolution images, such as incoming faxes, PurePAGE automatically determines the resolution of the document and adjusts it to 300 DPI. In addition, character filling and smoothing are applied to improve the quality of the characters, which is important when dealing with handwriting.
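Adjusting an image to 300 DPI is largely arithmetic: each axis is rescaled by the ratio of the target resolution to its current resolution, so the page keeps its physical size. Fax images are often non-square (standard-resolution fax is commonly quoted as roughly 204 × 98 DPI), so the axes are handled separately. The helper below is a hypothetical sketch that only computes the target pixel dimensions; the resampling itself, and PurePAGE’s actual method, are not shown.

```python
def target_size(width_px: int, height_px: int,
                dpi_x: float, dpi_y: float,
                target_dpi: float = 300) -> tuple[int, int]:
    """Pixel dimensions after rescaling both axes to target_dpi.

    The physical size (pixels / DPI, in inches) stays the same; only the
    number of pixels per inch changes on each axis.
    """
    new_width = round(width_px * target_dpi / dpi_x)
    new_height = round(height_px * target_dpi / dpi_y)
    return new_width, new_height
```

For example, an 8.5 × 11 inch page scanned at 204 × 98 DPI is 1734 × 1078 pixels; `target_size(1734, 1078, 204, 98)` gives 2550 × 3300 pixels, which is the same page at 300 DPI on both axes.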
The document then continues through the workflow and is classified by document type. During the requirements gathering phase, the customer identifies which fields on each document type should be extracted, and business rules within the system are configured to locate and extract those fields or individual entries. For example, suppose a document is recognized as an invoice and the customer has decided to extract all of its line item entries. Specific rules are built to capture the data from those fields, and the system executes the rules to extract it. Once this is complete, the recognition rates are assessed. If recognition is poor or the data is handwritten, the forms are passed to ReadSCRIPT.
IMPROVING ACCURACY DURING CAPTURE
Multiple recognition engines are used for optical character, intelligent character, optical mark, and barcode recognition. One engine or a combination of engines may be used to increase the accuracy of document recognition and index extraction. In addition, ReadSCRIPT performs intelligent word and phrase recognition for improved accuracy. It automatically validates fields against entries in more than 25 dictionaries, and customers can add their own dictionaries to the validation step to further improve the reliability of the results. If the system is not confident about the document type or the extracted data, the document goes into a verification queue, where it can be manually reviewed and corrected.
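Combining engines, validating against dictionaries, and falling back to a verification queue might look like the sketch below. It assumes each engine returns a `(text, confidence)` pair, keeps the most confident reading, and accepts it only if it is a dictionary entry above an assumed 0.85 threshold; otherwise the field goes to verification, with the closest dictionary entry (found with Python’s standard `difflib.get_close_matches`) offered as a suggestion. The threshold, function names, and combination strategy are illustrative, not ReadSCRIPT’s actual logic.

```python
import difflib

CONFIDENCE_THRESHOLD = 0.85  # assumed cutoff, for illustration only


def best_reading(candidates: list[tuple[str, float]]) -> tuple[str, float]:
    """Combine engine outputs by keeping the most confident (text, confidence) pair."""
    return max(candidates, key=lambda c: c[1])


def validate(candidates: list[tuple[str, float]], dictionary: list[str],
             threshold: float = CONFIDENCE_THRESHOLD) -> tuple[str, str]:
    """Return (value, status), where status is 'accepted' or 'verify'."""
    text, confidence = best_reading(candidates)
    if text in dictionary and confidence >= threshold:
        return text, "accepted"
    # Not trusted: suggest the nearest dictionary entry, but leave it for a person to confirm.
    suggestions = difflib.get_close_matches(text, dictionary, n=1, cutoff=0.8)
    return (suggestions[0] if suggestions else text), "verify"
```

With the dictionary `["Denver", "Dallas", "Detroit"]`, a confident reading of “Denver” is accepted, while a misread “Dallaz” is queued for verification with “Dallas” suggested. A customer’s own terms can simply be appended to the built-in list before calling `validate`.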
“ReadSCRIPT offers learning technology,” added Blevins. “If the customer has documents coming in from the same individual on a regular basis, ReadSCRIPT learns the way the words are handwritten. For example, if Sally is sending in purchase orders from a specific company, then ReadSCRIPT learns her handwriting patterns and the products that she orders regularly. This learning significantly improves accuracy.”
When a document or specific fields are written in cursive and are part of unstructured documents or fields, ReadSCRIPT Maestro becomes part of the solution. Maestro locates and segments the individual words and lines, then passes the word snippets to Parascript technology for recognition.
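The article does not describe how Maestro segments words. As a stand-in, the sketch below uses projection profiles, a classic and much simpler technique: rows containing ink are grouped into lines, and within each line, column gaps at least `word_gap` pixels wide separate words. It works on a binary image given as lists of 0/1 pixels; the names and the gap threshold are hypothetical, and real cursive often defeats such a simple rule because strokes connect and overlap.

```python
def runs(mask: list[bool], min_gap: int = 1) -> list[tuple[int, int]]:
    """Group True positions into (start, end) spans; gaps shorter than min_gap are bridged."""
    spans = []
    start = end = None
    gap = 0
    for i, on in enumerate(mask):
        if on:
            if start is None:
                start = i
            end = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:          # gap wide enough: close the current span
                spans.append((start, end))
                start = None
    if start is not None:               # close a span that reaches the edge
        spans.append((start, end))
    return spans


def segment(image: list[list[int]], word_gap: int = 2) -> list[list[tuple[int, int, int, int]]]:
    """Split a binary image (1 = ink) into lines, each a list of word boxes (top, bottom, left, right)."""
    width = len(image[0])
    row_has_ink = [any(row) for row in image]
    lines = []
    for top, bottom in runs(row_has_ink):
        band = image[top:bottom + 1]
        col_has_ink = [any(row[c] for row in band) for c in range(width)]
        lines.append([(top, bottom, left, right)
                      for left, right in runs(col_has_ink, word_gap)])
    return lines
```

In this scheme a one-pixel gap is treated as the space between letters of the same word, while a gap of `word_gap` pixels or more starts a new word.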
The final step in the workflow is export. With the verification and validation steps completed, the document and its data are exported to the customer’s preferred backend system, such as IBM FileNet, an Oracle database, an ERP, or a CRM.
“This sounds like a complex process,” said Blevins, “but since it’s automated, it’s extremely rapid, and large volumes of documents are processed almost instantly.”
RETURN ON INVESTMENT THROUGH AUTOMATION
According to Blevins, return on investment (ROI) can only be determined accurately after a project has been completed, or at least has been fully functional for a significant period of time. However, based on estimates calculated before some projects began, the longest time to ROI for these specific technologies has been 9 to 11 months. For other projects, ROI has been reached in as little as two months.
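A simple payback calculation shows how such ROI estimates are typically reached before a project starts. This is a generic sketch, not EDAC’s method: it ignores discounting and ongoing costs, and the function names and the figures in the usage note are invented for illustration rather than taken from customer data.

```python
import math


def monthly_keying_savings(docs_per_month: int, minutes_saved_per_doc: float,
                           hourly_cost: float) -> float:
    """Labor cost avoided each month when documents no longer need manual keying."""
    hours_saved = docs_per_month * minutes_saved_per_doc / 60
    return hours_saved * hourly_cost


def payback_months(upfront_cost: float, monthly_savings: float) -> int:
    """Whole months until cumulative savings cover the upfront cost (no discounting)."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return math.ceil(upfront_cost / monthly_savings)
```

With 20,000 documents a month, 1.5 minutes of keying saved per document, and labor at $30 per hour, savings come to $15,000 a month, so a hypothetical $150,000 project pays back in 10 months and a $30,000 project in 2.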
“We have found that introducing ReadSCRIPT and Parascript to a manual keying project reduces error rates, processes documents faster and can literally save our customers millions of dollars by reducing manual data entry,” Blevins concluded.