Around 80% of all incoming business information is said to be unstructured – much of it is human readable, but not machine understandable. Images and other ‘analog’ information such as voice fall into this category. If this information is not translated into machine-readable (e.g. ASCII) data, it is invisible to AI and analytics algorithms. The result is that the decisions a machine learning system reaches – often in ways that we cannot easily decipher – may be very wrong.
Adding to the problem is incorrect information. I wrote an earlier blog post on OCR (Optical Character Recognition) accuracy. The system only knows what it knows: if a character, number, or word is incorrect and fed into the system without verification, the resulting conclusions and trends may be invalid. As we add voice data from CEM systems into the mix, the odds of incorrect information rise.
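One simple safeguard is a verification gate: anything the OCR engine is not confident about gets held back for human review instead of flowing straight into analytics. The sketch below is illustrative only – the function name, data shape, and 0.95 threshold are my own assumptions, not any vendor's API.

```python
# Illustrative sketch: route low-confidence OCR output to human review
# rather than feeding it to downstream analytics unverified.
# Names, data shapes, and the threshold are hypothetical.

def route_for_review(ocr_results, min_confidence=0.95):
    """Split OCR results into auto-accepted and human-review queues.

    ocr_results: list of (text, confidence) tuples, confidence in [0, 1].
    """
    accepted, needs_review = [], []
    for text, confidence in ocr_results:
        if confidence >= min_confidence:
            accepted.append(text)
        else:
            needs_review.append(text)  # hold back for human verification
    return accepted, needs_review

# Example: the middle result has suspicious characters and low confidence.
results = [("Invoice 10234", 0.99), ("Tota1 $1,5OO", 0.72), ("Net 30", 0.97)]
ok, review = route_for_review(results)
```

The point is not the ten lines of code but the policy: unverified low-confidence data never reaches the decision system.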
Verification and purification of data are often overlooked in the interests of speed and cost reduction. Vendors add to the problem by making extravagant claims about conversion accuracy. Humans are only about 98% accurate. Should we really believe an OCR vendor who claims over 99% accuracy on handwriting? Claims like these, made without double entry and validation, should be treated with appropriate skepticism.
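It helps to do the arithmetic on what a "99% accurate" claim actually means, because per-character errors compound. The figures below (page length, field length) are illustrative assumptions, not vendor data:

```python
# Back-of-envelope arithmetic: how per-character accuracy compounds.
# The 2,000-character page and 6-character field are illustrative figures.

char_accuracy = 0.99     # the vendor's claimed per-character accuracy
chars_per_page = 2000    # a reasonably dense page of text

# Expected number of character errors on one page:
expected_errors = (1 - char_accuracy) * chars_per_page

# Probability that a 6-character field (say, a dollar amount)
# comes through with every character correct:
field_ok = char_accuracy ** 6
```

At 99% per character, that is roughly 20 errors per page, and about 6% of six-character fields contain at least one error – which is why double entry and validation matter.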
I am concerned that many AI vendors and users seem unworried. Maybe we need a disaster first? In the meantime, the companies that will win are those with the most accurate information – preferably available more quickly than the competition's. Those who prioritize speed over accuracy may be setting themselves up for disappointment, or worse.