In late November, at its AWS re:Invent conference, Amazon Web Services announced a number of new services based on machine learning. One of those was Amazon Textract, which some pundits are saying is going to kill the OCR industry. According to Amazon, Textract is a service designed to “extract text and data from virtually any document.” This includes table and forms extraction and the ability to capture text in context. Textract also offers features like confidence-level feedback. And the pricing is great. The service starts at $0.0015 (0.15 cents) per page for basic OCR, or $1.50 per 1,000 pages.
The technology behind Textract seems to be a combination of an internally developed OCR engine and machine learning. Here’s what Amazon has to say about it: “Amazon Textract is based on the same proven, highly scalable, deep learning technology developed by Amazon’s computer vision scientists to analyze billions of images and videos daily…. includes simple, easy-to-use APIs that can analyze image files and PDF formatted files. Amazon Textract is always learning from new data, and we’re continually adding new features to the service.”
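For readers who want a feel for what calling such an API and using its confidence feedback might look like, here is a minimal sketch in Python. It assumes the AWS SDK for Python (boto3) and its Textract `detect_document_text` operation, whose response is a list of blocks carrying a type, text, and a confidence score. The helper names and the 90 percent threshold are our own illustrative choices, not anything Amazon prescribes, and the preview API may differ in detail.

```python
def detect_text(image_bytes):
    """Send one image to Textract's text-detection operation.

    Assumes boto3 is installed and AWS credentials with Textract access
    are configured. The import is local so the rest of this sketch can
    be used without the SDK.
    """
    import boto3

    client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": image_bytes})


def confident_lines(response, min_confidence=90.0):
    """Return the text of LINE blocks at or above a confidence threshold.

    The threshold is a hypothetical cutoff; lines below it would
    typically be routed to a person for review.
    """
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]
```

In a real capture workflow, the lines that fall below the threshold are the ones that would go to validation or exceptions processing.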
Basically, it sounds like something similar to what we’ve discussed with Captricity, an ISV that has used crowdsourced data entry and machine learning to build an automated data capture engine [see DIR 9/7/18]. Some other vendors in our market are taking a similar approach to capture, although many are building their machine learning algorithms on top of existing OCR engines and keeping traditional character recognition in the equation. Regardless, I think it’s safe to say that, after 30-plus years of working with traditional OCR, we have now entered the uncharted waters of OCR 2.0, and Amazon is attempting to establish itself as one of the pioneers in this space. Google also seems to be wading in with its Cloud Vision technology, although I haven’t had a chance to fully explore that yet.
Various elements of Textract are being made available through a series of APIs. Initially, there is a Detect Document Text API to access OCR functionality and two different Analyze Document APIs, one for capturing tables and one for forms processing. Licensing the table and forms processing APIs costs extra. The table mode is essentially $15 per 1,000 pages, with the forms mode starting at $65 per 1,000 pages. Discounts apply once volume exceeds one million pages per month. Textract is currently available in a “limited preview,” hosted in a handful of sites designed to serve various U.S. regions.
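To put those list prices in perspective, here is a rough back-of-the-envelope calculator in Python. It uses only the per-1,000-page figures quoted in this article. The mode names and function names are our own, and the volume discounts above one million pages per month are deliberately not modeled, so treat the output as a ceiling rather than a quote.

```python
# Preview list prices per 1,000 pages, as quoted in this article.
# These are assumptions for illustration and may not match current pricing.
PRICE_PER_1000_PAGES = {
    "ocr": 1.50,      # Detect Document Text
    "tables": 15.00,  # Analyze Document, table mode
    "forms": 65.00,   # Analyze Document, forms mode (starting price)
}


def per_page_price(mode):
    """Return the price in dollars for a single page in the given mode."""
    return PRICE_PER_1000_PAGES[mode] / 1000


def estimate_cost(pages, mode="ocr"):
    """Estimate the monthly cost in dollars, ignoring volume discounts."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return round(pages / 1000 * PRICE_PER_1000_PAGES[mode], 2)
```

For example, `estimate_cost(250000, "tables")` works out to $3,750 for a quarter-million pages of table extraction, while the same volume of basic OCR comes to $375.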
The use cases that Amazon has presented are pretty standard document capture industry applications like tax forms and mortgage applications. On a more horizontal level, Amazon presented search, compliance, and process automation as areas of application. Amazon’s characterization of the capture technology currently available in the market was interesting. For example, OCR was described as not being able to handle columns or tables, while data extraction and forms capture were described as utilizing “complex rules and template-based extraction” and primarily able to handle “structured information at scale.” While there is some truth to this, Amazon’s market analysis seemed a bit dated based on what we’ve seen in recent years.
Industry impact
After seeing headlines like “Amazon Textract has just killed the OCR industry. Who’s next and who’s safe?” we decided to do a sanity check and talk to some experts who might have a deeper understanding of Textract’s effect on the market than some crackpot headline writer. We tried to connect with the OCR SDK vendors, which are widely seen as the most exposed to Textract, as well as capture vendors that work on the AWS platform.
Of the SDK vendors, only ABBYY has responded so far. Bruce Orcutt, ABBYY’s VP of product marketing, offered a forceful case for why Textract is not really a threat to traditional OCR SDKs. Following is the text of a detailed e-mail ABBYY sent to DIR (with some editing for flow):
“Having Amazon enter the space is encouraging and quite exciting as it gives more visibility and credibility to problems that our customers are trying to address with their digital transformation strategies. [But] it appears that Amazon did not really evaluate the capabilities of modern commercial OCR technologies—as many of the challenges or problems they associate with OCR have been solved for many years quite successfully by commercially available products. I would speculate much of their assumptions about OCR were drawn from evaluating open source products and not commercially available engines.
“[For example,] they make some interesting assertions or claims about OCR systems being template-based, which is not true or accurate. While many structured forms historically have been processed by templates, over the past 10 years, companies have leveraged machine learning and trainable systems to learn and understand structure and content and how to accurately extract information without the need for templates. In fact, Amazon fell short of a viable answer for the most common document type processed by OCR technologies today, which is invoices. When pressed in their webinar about processing invoices, the response was not very confident or clear to the audience that invoices were a viable use case to be addressed at this time.
“[In addition], during the webinar where Amazon presented Textract, they were unclear about processing tables that span multiple pages of a document. If this is true, Amazon is significantly behind in their detection, understanding and processing of table data. I cannot imagine a production customer accepting tables to be read page by page. We have to process tables that span multiple, if not hundreds, of pages.”
Orcutt goes on to say that Amazon is basically presenting use cases that have already been solved. I agree, but it’s probably worth pointing out that Amazon is offering its technology at a fraction of the cost of traditional OCR and capture technology.
Orcutt said he also got the impression that Amazon was at least somewhat positioning Textract against not just SDKs, but full-blown capture solutions. “This is nonsense,” he said. “These applications were developed at the cost of hundreds of man years of development. This is what creates real value for customers who are looking for capture solutions.”
Orcutt then cuts to what may be the heart of the matter, saying, “OCR is not a core business for Amazon. Customers will have little influence on the roadmap, and at any time, other priorities will prevail. [In contrast], ABBYY is committed to developing OCR, it has been our business for more than 20 years. That means we address the needs of our customers as timely and best we can, and it is the reason we support many languages and platforms (cloud, Windows, Linux).”
To us, this point is reminiscent of Microsoft’s approach to SharePoint as an ECM platform. Multiple times over the years, there have been alarms sent out that SharePoint was going to kill the legacy ECM market. And, while SharePoint has had some success, quite frankly, it often has trouble competing against platforms from more focused ECM vendors who are constantly updating and improving their software. By contrast, Microsoft might come up with a big ECM refresh every five years [I hope Mike Alsup is not reading this], which is when we send those flares out.
Orcutt concluded that Amazon is not the first new entrant into the capture market in recent years throwing around fancy terminology. “Over the past two years, we have seen a number of start-ups emerge that claim to leverage ‘AI’ and ‘machine learning’ to solve all problems: invoices, mortgages, claims, EOBs, etc. These startups, while showing great marketing promise, have struggled with the real world challenges that exist outside their labs and test beds. Many times, these vendors have failed to understand the breadth of the problem that is being presented by users and customers. Since Amazon is offering Textract as a service, I guess customers can try to resolve its shortcomings with additional plug-ins or technologies, but this adds a lot of cost related to development, support, and customizations when production capture companies have been solving this for years.
“As innovators in OCR, ABBYY has been adding deep learning technologies to its technology portfolio. Our products combine the best of two worlds: machine and deep learning. In addition, our on-premises SDKs and platforms guarantee customers that their documents will stay on-premises.”
Capture ISVs intrigued
Most capture solution vendors do not see Textract as a threat, but rather something they could potentially leverage in their offerings. “I think Textract is going to compete primarily with companies offering technology rather than solutions,” said Dan Dubiner, CTO of ScaleHub. “I think it could conceivably commoditize the OCR SDK business.”
ScaleHub is in an interesting position because it leverages Amazon Mechanical Turk in its crowdsourcing data entry/completion/verification service. In fact, later this month, at the IAOP OWS19 event in Orlando (Feb. 17-20), ScaleHub and Amazon will be presenting together on “Growing Your Business and Meeting Dynamic SLAs by Leveraging Crowdsourcing Technology.”
ScaleHub also leverages third-party OCR and capture products for automated data entry and licenses its crowdsourcing services to capture solutions providers. Dubiner pointed out that the entry of Amazon and Google into the capture market should, at the very least, spice things up a bit. “Basically, you have two giants entering into this competitive landscape and they have a lot of funding and computing power to invest,” he said. “If you are using a legacy OCR SDK and paying a few cents per page, and now you have a worldwide player that is offering the same services with sufficient quality for a fraction of the cost, why wouldn’t you consider it?
“At ScaleHub, we embrace new technology. From our standpoint, we believe these offerings from Amazon and Google will encourage more entities and enterprises to move to cloud capture processes and potentially bring more business our way, as a complementary managed services provider. These new products have the potential to drive the costs of capture down and the efficiency up. I think we will start to see a transition in the market, especially among start-ups entering into capture services.”
Innovation is differentiator
Ephesoft, which develops capture solutions and leverages AWS in its cloud offerings, also sees the introduction of Textract as a potential benefit rather than a threat. “Amazon’s Textract service is good news for some and bad news for others,” noted Kevin Goulet, VP of product management for Ephesoft. “It’s good news for companies like us that are innovating and trying to solve problems in different ways than what the ‘dinosaurs’ of our industry have been doing for years—same old same old.
“Ephesoft, a patented and secure and full stack on-prem and cloud capture platform designed for enterprise customers, is fundamentally different than Textract, which simply offers a consumer-grade OCR use case. Textract is a web service. A customer would have to write and interpret code to use it—specifically, the customer would have to cobble together all the other elements of a capture platform: import sources, image processing, classification/separation, validation/exceptions processing and export to even begin to achieve the value Ephesoft offers in the market. In addition, Amazon has stated they will utilize customer data to improve their product which will be a non-starter for most or potentially all enterprise customers.”
Goulet added that he views Amazon’s introduction of Textract as another indicator of an industry trend toward cloud services. “The rate at which enterprises are moving resources and data into the cloud continues to accelerate,” he said. “If an enterprise isn’t moving their data and systems to the cloud, they will be at a competitive disadvantage. This is a trend that favors innovators.”
Captricity now Vidado
The final capture solution provider we checked in with on Textract was Captricity, which actually just changed its name to Vidado. According to Nowell Outlaw, who took over from founder Kuang Chen as CEO of the ISV in July, Vidado means ‘vision’ in Esperanto. “The rebranding is to help us transition how people think about what they know about us,” Outlaw told DIR. “We really are an AI company—a lot of people had associated us with a pure capture style of vendor, e.g., we got calls about needing scanners, etc.”
Outlaw looks at the launch of Textract as validation that Vidado, whose technology is hosted on AWS, has been going down the right path. “I know the team at Amazon that built Textract; it’s great,” he said. “It helps re-emphasize how digitization from paper can be done with machine learning vs. traditional OCR methods and how things can be improved across the board.”
He did not indicate that he feels that Textract is competitive with Vidado, which may have something to do with the company’s recent success. According to a press release, the name change “builds on the company’s strongest quarter in history (Q4 2018) and strongest year overall in terms of financial performance and customer acquisition.” Outlaw added, “We are doing a project right now requiring us to process 100 million pages, going back almost 100 years—it’s amazing to see what an AI/machine learning approach can really accomplish.”
The pressure is on
Before we get too far ahead of ourselves ruminating on the potential of Textract, let’s remember that Vidado has been building its technology since it was founded in 2011. At this point, Textract is basically like a well-funded start-up just entering the capture industry. Does it have disruptive potential? Certainly. Is it hard for us to believe that Amazon will maintain focus on a market worth less than $1 billion worldwide? Affirmative, as well. But, even if they do, we do not necessarily see it as the end of the world for legacy OCR engines.
This reminds me somewhat of when Google took over sponsorship of the Tesseract open source OCR code in the mid-2000s and everyone thought that was a death knell for legacy OCR engines. I remember Chris Riley speculating that you could write some validation code to supplement the Tesseract recognition and achieve results that would be good enough for most applications. But, while Tesseract has some adoption in business use cases, it hasn’t been enough to make a noticeable dent in the traditional OCR market.
The cool thing about Textract is that it is not traditional OCR. Rather, it embraces machine learning and moves the ball forward towards this OCR 2.0 concept that seems to be emerging in the market. So, it has a chance to be something new, better, and revolutionary compared to what’s currently on the market. But, the bigger question may be, is OCR 2.0 really going to be better than its predecessor technology? I mean, there have been millions of man years invested, by a lot of very smart people, in developing our legacy OCR engines. Shouldn’t we just be building on top of these?
Of course, maybe they just aren’t going to be efficient in new computing paradigms, such as cloud and mobile, going forward. If anyone is in a position to make a bet on the future of technology related to computing platforms, it may be Amazon. I think it’s going to be important to keep an open mind on all this and continue to study this new recognition technology as it comes down the pike, while at the same time not discounting the legacy stuff, especially as the legacy vendors continue to improve on it.
Ephesoft’s Goulet may have put it best when he stated, “Textract is bad news if you are not innovating. If you have not innovated and that’s not in your culture, Amazon and Ephesoft [et al.] will eat your lunch sooner than later.”
For more information: https://aws.amazon.com/textract/; http://bit.ly/VidadoCap