I thought this was pretty cool. Remember, in 2007 Google announced it was launching an open source OCR project based on the Tesseract code, which was developed by HP in the late 1980s and early 1990s. At AIIM that year, we interviewed document capture/OCR expert Chris Riley about what effects he thought this initiative would have on the OCR industry.
In our April 20, 2007 issue, Riley commented, “The real threat to the commercial OCR market could come from independent developers who decide to take the engine and run with it. The technology’s true power could be unleashed when it is set into motion for a niche type of processing, and fine-tuned to do it well.”
For more than three years, we didn’t hear a whole lot about people leveraging open source OCR. However, we are currently working on a story about a company called Copanion that has leveraged the Tesseract OCR technology to create a niche SaaS application for capturing data from tax forms. Based on the number of forms they processed, we estimate their run rate for the 2010 tax season was around $3 million, and they expect to surpass $10 million for the 2011 tax season.
Granted, they use a lot of their own proprietary algorithms on top of the Tesseract OCR, but it’s kind of cool what they are accomplishing. For more, check out this week’s premium issue of DIR.