Enterprise AI does not require big data, but smart data
A few samples of smart data build all the context, correlations and patterns you need
An article in the Economic Times today boldly contradicts the popular view, quoting Vinod Khosla, the Silicon Valley veteran, and announces that next-gen AI systems won't need huge amounts of data. Khosla argued in his conversation that data is being overvalued and that, in the long term, far less data will be needed for AI. He adds: the smarter the data, the less data AI needs.
I share this view, though of course my word carries far less weight than that of a veteran entrepreneur. Every book you read on machine learning, every interview you come across on AI, every lecture that describes AI algorithms speaks of an underlying foundation that requires big data. Over time, data has grown from being just the new oil of the digital revolution into something almost revered, its aura growing into a halo…
I am intrigued, and have always believed that data is being overvalued. It's almost as if this hype has been deliberately built by the big four (Google, Facebook, Amazon, Apple) and amplified by their followers (Uber, Airbnb etc.), who have taken an early lead in building data sets and are trying to monetise them by building up their value… and we are falling for the deceptively high valuations, distracted by tangential issues around monopoly, privacy etc., which simply fuel the illusion further…
While we cannot deny that their domination within their own industries is largely due to the competitive edge they have established through sophisticated data collection and analysis, it is worth keeping in mind that most of their AI algorithms are driven by probabilities, which inherently require large data sets and even then deliver at best a statistical projection. Further, these are all consumer companies; they cater to a diverse population with varied needs and requirements. Patterns can shift, sometimes on just one variable and at times on a complex combination of interrelated variables. Such companies are dealing with millions of customers, unknown patterns and changing trends, and have no option but to work through large volumes of data and then try to make sense of it all. No wonder they need more data.
But an enterprise starts with a distinct advantage. It has focus. It has specific business drivers and business processes. And above all, it has relevant data. It does not need to guess what is needed; in most scenarios it knows. If a fault occurs in a system, the enterprise understands the exact cause, its effects and the solutions, and can even define preventive measures. If a client reports a problem, there is usually enough technical and historical experience to map it to a set of 3-5 probable causes: the enterprise is not searching for a needle in a haystack but choosing a probable cause from a small set of shortlisted options, and can immediately provide a prescriptive solution. If machine failure is to be predicted, the enterprise already knows which variables need to be measured and tracked, and understands the constraints, dynamics and conditions fairly well. A limited set of such data is sufficient to predict performance or failure probability, or to recommend configuration optimisation. The scope of the problem is bounded, and decision making in an enterprise is informed rather than merely probabilistic.
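To make the bounded-problem point concrete, here is a minimal sketch. The fault codes, symptoms and scoring are entirely invented for illustration; the point is only that when the enterprise has already shortlisted the probable causes, diagnosis becomes a small scoring exercise rather than a big-data search:

```python
# A toy diagnostic model over a bounded cause space (all names hypothetical).
# Enterprise knowledge: each probable cause maps to the symptoms it produces.
PROBABLE_CAUSES = {
    "worn_bearing":   {"vibration", "noise", "heat"},
    "loose_coupling": {"vibration", "noise"},
    "blocked_filter": {"heat", "pressure_drop"},
    "sensor_drift":   {"erratic_reading"},
}

def rank_causes(reported: set[str]) -> list[tuple[str, float]]:
    """Rank the shortlisted causes by overlap with the reported symptoms."""
    scores = {
        cause: len(symptoms & reported) / len(symptoms)
        for cause, symptoms in PROBABLE_CAUSES.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_causes({"vibration", "heat"}))
# worn_bearing scores highest; no training data was needed at all,
# because the cause space was defined up front by domain knowledge.
```

No statistics, no millions of samples: the domain knowledge does the heavy lifting, and data is only needed to refine the scores.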
This view is not based on conjecture or hypothesis; it has evolved through hands-on practical application. I have been working with enterprise data for the last five years and, in many cases, have manually labelled, categorised, interpreted and analysed data for many different workflows. My efforts, of course, have primarily been to clean the data being captured, build visualisations to understand it, construct categorisations, design statistical processes, and train and validate the machine learning algorithms we are embedding into enterprise workflows. The experience has been enlightening. Again and again, I have seen that when data is collected within the context of a workflow (and the context is recorded and saved along with the data), it can be directly mapped to concepts and combined with logic and reasoning to generate fairly accurate correlations. It moves beyond probabilities to directly applicable process abstractions, and this can be achieved with contextual AI models trained on a smaller set of relevant data.
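Here is a small sketch of what "context recorded along with the data" buys you. It assumes scikit-learn is available, and every field name, value and record below is invented for illustration. Note that the two 0.9-vibration records have opposite labels; raw readings alone cannot separate them, but the workflow context can:

```python
# Context-tagged records: each carries its workflow context, not just readings.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

records = [
    ({"workflow": "preventive_maintenance", "asset": "pump",  "vibration": 0.9}, "inspect"),
    ({"workflow": "preventive_maintenance", "asset": "pump",  "vibration": 0.2}, "ok"),
    ({"workflow": "field_service",          "asset": "valve", "vibration": 0.9}, "ok"),
    ({"workflow": "field_service",          "asset": "valve", "vibration": 0.2}, "ok"),
]  # in practice a few hundred such samples, not millions

vec = DictVectorizer(sparse=False)          # one-hot encodes the context fields
X = vec.fit_transform([r for r, _ in records])
y = [label for _, label in records]
model = LogisticRegression().fit(X, y)

new = {"workflow": "preventive_maintenance", "asset": "pump", "vibration": 0.8}
print(model.predict(vec.transform([new])))
# The context features (workflow, asset) let a simple model trained on a
# handful of samples separate cases the vibration reading alone could not.
```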
I have seen successful interpretation (with as few as a few hundred samples, sometimes just double-digit numbers) across many enterprise workflows such as corrective maintenance, preventive maintenance, field service, health & safety, asset surveys and meeting updates.
It is my belief that enterprise AI will be far less about bottom-up big data and will rely more on top-down reasoning, logic and the application of concepts to problem solving. It is not just about requiring less data, but also about being more flexible and predictable. It is true that the industry (including us) is focusing on data-hungry neural networks and deep learning algorithms trained on mountains of data, but these are time-consuming to design and validate, often end up with many limitations, and are unable to handle edge cases, i.e. situations for which data does not exist. What scares me most, though, is that these algorithms are a black box: when something goes wrong, it is not always possible to trace the steps that led to a certain action or decision. This adds a big unknown to the otherwise deterministic enterprise environment.
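The traceability contrast is easy to show in code. This sketch (rules and thresholds invented for illustration) records every step that led to its conclusion, which is exactly what a black-box network cannot give you:

```python
# A top-down, rule-driven decision that logs its own reasoning trace.
def decide(reading: dict) -> tuple[str, list[str]]:
    trace = []
    if reading["temperature"] > 90:
        trace.append(f"temperature {reading['temperature']} > 90: overheating")
        return "shutdown", trace
    trace.append(f"temperature {reading['temperature']} <= 90: within limits")
    if reading["vibration"] > 0.7:
        trace.append(f"vibration {reading['vibration']} > 0.7: abnormal wear")
        return "inspect", trace
    trace.append("all readings nominal")
    return "ok", trace

action, trace = decide({"temperature": 85, "vibration": 0.8})
print(action)        # inspect
for step in trace:   # every step of the decision is recoverable after the fact
    print(" -", step)
```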
That said, it is also a given reality that a true top-down general intelligence system is probably more a vision today, still many decades away… and that is where smart data fills the gap: cutting down the volume of data needed, mapping directly to the area of application, and providing correlation with logic and reasoning before it is fed into contextual neural networks and deep learning algorithms.
As algorithms and data get smarter and more contextual, AI works with less data, especially in the enterprise.