Enterprise AI does not require big data, but smart data
A few samples of smart data build all the context, correlations and patterns you need
An article in the Economic Times today boldly contradicts the popular view, quoting Vinod Khosla, the Silicon Valley veteran, and announces that next-gen AI systems won't need huge amounts of data. Khosla argued in his conversation that data is being overvalued and that, in the long term, far less data will be needed for AI. He adds: the smarter the data, the less data AI needs.
I share this view, though of course my word carries far less weight than that of a veteran entrepreneur. Every book you read on machine learning, every interview you come across on AI, every lecture that describes AI algorithms speaks of an underlying foundation that requires big data. Over time, data has grown from being just the new oil of the digital revolution into something almost revered, its aura growing into a halo…
I am intrigued, and have always believed that data is being overvalued. It's almost as if this hype has been deliberately built by the big four (Google, Facebook, Amazon, Apple) and amplified by their followers (Uber, Airbnb etc.), who have taken an early lead in building data sets and are trying to monetise them by building up their value… and we are falling for the deceptively high valuations, distracted by tangential issues around monopoly, privacy etc., which simply fuel the illusion further…
While we cannot deny that their domination within their own industries is largely due to the competitive edge they have established through sophisticated data collection and analysis, it is worth keeping in mind that most of their AI algorithms are driven by probabilities, which inherently require large data sets and even then deliver at best a statistical projection. Further, these are all consumer companies; they cater to a diverse population with varied needs and requirements. Patterns can shift, sometimes on just one variable and at times on a complex combination of interrelated variables. Such companies are dealing with millions of customers, unknown patterns and changing trends, and have no option but to work through large volumes of data and then try to make sense of it all. No wonder they need more data.
But an enterprise starts with a distinct advantage. It has focus. It has specific business drivers and business processes. And above all, it has relevant data. It does not need to guess what is needed; in most scenarios it knows. If a fault occurs in a system, the enterprise understands the exact cause, its effects and the solutions, and can even define preventive measures. If a client reports a problem, there is usually enough technical and historical experience to map it to a set of 3-5 probable causes: the enterprise is not searching for a needle in a haystack but choosing a probable cause from a small set of shortlisted options, and can immediately provide a prescriptive solution. If machine failure is to be predicted, the enterprise already knows which variables need to be measured and tracked, and understands the constraints, dynamics and conditions fairly well. A limited set of such data is sufficient to predict performance or failure probability, or to recommend configuration optimisation. The scope of the problem is bounded, and decision making in an enterprise is informed rather than merely probabilistic.
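To make the bounded-problem point concrete, here is a minimal sketch. The fault codes, symptoms and scoring are entirely invented for illustration; the point is only that when the enterprise has already shortlisted the probable causes, diagnosis becomes a small scoring exercise rather than a big-data search:

```python
# A toy diagnostic model over a bounded cause space (all names hypothetical).
# Enterprise knowledge: each probable cause maps to the symptoms it produces.
PROBABLE_CAUSES = {
    "worn_bearing":   {"vibration", "noise", "heat"},
    "loose_coupling": {"vibration", "noise"},
    "blocked_filter": {"heat", "pressure_drop"},
    "sensor_drift":   {"erratic_reading"},
}

def rank_causes(reported: set[str]) -> list[tuple[str, float]]:
    """Rank the shortlisted causes by overlap with the reported symptoms."""
    scores = {
        cause: len(symptoms & reported) / len(symptoms)
        for cause, symptoms in PROBABLE_CAUSES.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_causes({"vibration", "heat"}))
# worn_bearing scores highest; no training data was needed at all,
# because the cause space was defined up front by domain knowledge.
```

No statistics, no millions of samples: the domain knowledge does the heavy lifting, and data is only needed to refine the scores.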
This view is not based on conjecture or hypothesis; it has evolved through hands-on practical application. I have been working with enterprise data for the last five years and, in many cases, have manually labelled, categorised, interpreted and analysed data for many different workflows. My efforts, of course, have primarily been to clean the data being captured, build visualisations to understand it, construct categorisations, design statistical processes, and train and validate the machine learning algorithms we are embedding into enterprise workflows. The experience has been enlightening. Again and again, I have seen that when data is collected within the context of a workflow (and the context is recorded and saved along with the data), it can be directly mapped to concepts and combined with logic and reasoning to generate fairly accurate correlations. It moves beyond probabilities to directly applicable process abstractions, and this can be achieved with contextual AI models trained on a smaller set of relevant data.
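Here is a small sketch of what "context recorded along with the data" buys you. It assumes scikit-learn is available, and every field name, value and record below is invented for illustration. Note that the two 0.9-vibration records have opposite labels; raw readings alone cannot separate them, but the workflow context can:

```python
# Context-tagged records: each carries its workflow context, not just readings.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

records = [
    ({"workflow": "preventive_maintenance", "asset": "pump",  "vibration": 0.9}, "inspect"),
    ({"workflow": "preventive_maintenance", "asset": "pump",  "vibration": 0.2}, "ok"),
    ({"workflow": "field_service",          "asset": "valve", "vibration": 0.9}, "ok"),
    ({"workflow": "field_service",          "asset": "valve", "vibration": 0.2}, "ok"),
]  # in practice a few hundred such samples, not millions

vec = DictVectorizer(sparse=False)          # one-hot encodes the context fields
X = vec.fit_transform([r for r, _ in records])
y = [label for _, label in records]
model = LogisticRegression().fit(X, y)

new = {"workflow": "preventive_maintenance", "asset": "pump", "vibration": 0.8}
print(model.predict(vec.transform([new])))
# The context features (workflow, asset) let a simple model trained on a
# handful of samples separate cases the vibration reading alone could not.
```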
I have seen successful interpretation (with as few as a few hundred samples, sometimes just double-digit numbers) across many enterprise workflows such as corrective maintenance, preventive maintenance, field service, health & safety, asset surveys and meeting updates.
It is my belief that enterprise AI will be far less about bottom-up big data and will rely more on top-down reasoning, logic and the application of concepts to problem solving. It is not just about requiring less data, but also about being more flexible and predictable. It is true that the industry (including us) is focusing on data-hungry neural networks and deep learning algorithms trained on mountains of data, but these are time-consuming to design and validate, often end up with many limitations, and are unable to handle edge cases, i.e. situations for which data does not exist. What scares me most, though, is that these algorithms are a black box: when something goes wrong, it is not always possible to trace the steps that led to a certain action or decision. This adds a big unknown to the otherwise deterministic enterprise environment.
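The traceability contrast is easy to show in code. This sketch (rules and thresholds invented for illustration) records every step that led to its conclusion, which is exactly what a black-box network cannot give you:

```python
# A top-down, rule-driven decision that logs its own reasoning trace.
def decide(reading: dict) -> tuple[str, list[str]]:
    trace = []
    if reading["temperature"] > 90:
        trace.append(f"temperature {reading['temperature']} > 90: overheating")
        return "shutdown", trace
    trace.append(f"temperature {reading['temperature']} <= 90: within limits")
    if reading["vibration"] > 0.7:
        trace.append(f"vibration {reading['vibration']} > 0.7: abnormal wear")
        return "inspect", trace
    trace.append("all readings nominal")
    return "ok", trace

action, trace = decide({"temperature": 85, "vibration": 0.8})
print(action)        # inspect
for step in trace:   # every step of the decision is recoverable after the fact
    print(" -", step)
```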
That said, it is also a given reality that a true top-down general intelligence system is probably more a vision today, still many decades away… and that is where smart data fills the gap: cutting down the volume of data needed, mapping directly to the area of application, and providing correlation with logic and reasoning before it is fed into contextual neural networks and deep learning algorithms.
As algorithms and data get smarter and more contextual, AI works with less data, especially in the enterprise.