Coming out of AIIM last year, I came up with a vision for the future of the document imaging industry. I've repeated the mantra several times since, and it goes like this: "Capture it all and let the technology sort it out."
In fact, I recently completed a piece for Quality Associates' upcoming Insights newsletter detailing what I see as some of the driving forces behind this vision. These include increased multi-channel capture and growing intelligence in capture, driven by emerging technologies such as natural language processing.
This year I attended the Ephesoft Innovate conference before heading down to San Diego for AIIM 2015. At Innovate, Ephesoft founder and CTO Ike Kavas presented his vision for the future, which I thought dovetailed nicely with mine. Kavas and his team at Ephesoft have even gone so far as to develop a brand-new product, Ephesoft Universe, designed to enable organizations to mine their documents.
Due for release later this year, Universe runs on Big Data tools such as Hadoop. According to Kavas, Universe can draw on 16 different characteristics to classify a document and recognize a field, and Ephesoft is developing machine learning algorithms to weigh those characteristics. The bottom line is that a lot of data is being put through a process that requires a lot of computing power. Hence the need for Big Data tools, especially if a user is throwing a high volume of documents at it.
The end game for Universe is to reduce the time it takes to implement a classification and extraction application from months to minutes. Another goal is to enable individual users (not system administrators) to set up their own personalized auto-classification and extraction applications.
Kavas was brave enough to show a demo of Universe, which he expects to release as Version 1.0 later this spring. Basically, a user creates their own document classes, feeds Universe examples, and then selects and labels the fields they want to extract from among those Universe has highlighted as recognized. Once the data is extracted, it is fed into an analytics application that is also built into Universe. In one example, Kavas used hot/cold zone graphing to show the average price of houses in different states across the country.
Other potential applications tossed about included mining medical records for purposes such as enforcing records retention policies, mining expense reports to enable better-informed negotiations with vendors, and examining financial documents to flag at-risk loans or security risks.
There is a lot here, and I'll have more detail in my next premium issue of DIR. Ephesoft's current goal is to find customers and partners to help it determine the next steps on the road to productizing Universe. Still, there is clearly a lot of potential, mainly because Universe promises to make accessible what has historically been very high-end technology, whose adoption, I feel, has been slowed somewhat by paralysis by analysis. If Ephesoft can really make Universe a universal tool, I think we'll start to see a slew of new IDR applications developed on top of it.