"AI" based entreprise software & real business needs - few tips for zero waste in your budget
Walid DABOUBI
Head of Data Analytics @ Richemont Group Security | Machine Learning, AI
AI or not AI, that's the question.
It all started with a well-crafted email
It's Tuesday morning, your weekly peak motivation moment. As usual, you start your work day by grabbing a coffee and going through your long queue of unread emails. Suddenly and without warning, you are dazzled by another fancy title: "Try out Minta for free". You are hooked, you click on it, and you find out that it is about yet another "AI" based enterprise cybersecurity solution. According to its brief but well-written introduction, the tool detects all kinds of cyber threats and does everything for you to keep your company resilient against cyber attacks. Minta looks really promising for your exponentially growing cyber needs. The startup that developed Minta is offering a free presentation and a live demo at your office.
This article is a guide to checking whether a commercial solution that claims to use "AI" is really based on something intelligent. We will use Minta, an "AI" based cybersecurity solution, as a use case. For the most curious ones, Minta means "defender" in ancient Greek. It is fictional software that doesn't exist in real life; it's used in an attempt to make this article a bit more realistic. This case study could be applied to any other topic or domain, as "AI" is being used everywhere and for everything nowadays.
AI or not AI, that's the question
You may be wondering why the famous "AI", aka Artificial Intelligence, is put between quotes in the title. Well, that's a very legitimate question. Personally, I have always been cautious about using this term. It sounds, and it actually is, full of fiction. I think that it is not really "real" for what it claims to be in real life.
Despite the extensive commercial and marketing usage of this shiny term, artificial intelligence as we intuitively imagine it (yes, think about Terminator or Cyborg...) doesn't currently exist. And despite the big worries of the great Stephen Hawking, I think that it has little chance of existing, at least in the near future. "AI" as a concept is really overused, and unfortunately most of the products that claim to be "AI" based are in reality using machine learning or, even worse, traditional rule-based algorithms.
Machine learning vs Artificial intelligence
As already mentioned, most of the commercial products that claim to be based on artificial intelligence are in reality based on machine learning (ML). And if a product that claims to use "AI" is not even using ML, then there is a big issue. When talking about "AI" as a concept, we can say that ML could be considered its only widely available and efficient implementation.
Machine learning could be considered a bit more "intelligent" than traditional programming, but it's not really intelligent if we take as our benchmark the most common and best-known definition of intelligence: human intelligence.
The most interesting (intelligent) part of machine learning is its considerable ability to automatically figure out complex patterns that are impossible to find or implement using traditional programming. Let's suppose that you are modelling a specific phenomenon with the simple equation y = a·x + b. Based on a set of examples (a large amount of data) of y (the result/output) and x (the variable/feature), machine learning is able to help you find a and b in order to generalise, that is, to find y for any given x. The same concept could be applied to more complex problems (face recognition, text understanding, malware detection, etc.), which is what makes it look like magic compared with what we are used to seeing in traditional programming, but it is still not intelligent.
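To make the y = a·x + b example a bit more concrete, here is a minimal sketch in Python of how a and b can be recovered from examples. It uses ordinary least squares, which is only one of many possible fitting methods. The function name `fit_line` and the sample data are invented for this illustration.

```python
def fit_line(xs, ys):
    """Estimate a and b in y = a*x + b with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalised,
    # because the 1/n factors cancel out in the ratio below).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


# Hypothetical examples generated from y = 2*x + 1 (illustrative data only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)


def predict(x):
    """Generalise: compute y for any x, including values never seen before."""
    return a * x + b


print(f"a = {a}, b = {b}, prediction for x = 10: {predict(10.0)}")
```

Nothing here "understands" the phenomenon: the code only finds the two numbers that best match the examples, which is exactly the point made above.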
A few questions to ask before spending your bucks
Let's go back to Minta. You clicked on that link in the well-crafted email, and it led you to a page where you booked a presentation/demo session. A presenter will soon come to your office and present the tool to you and your colleagues. Yay! All your cyber problems will soon be solved.
Before making any commitment and buying Minta, and to avoid getting "fooled", you need to check some points by asking the questions below. Based on the answers you get, how confident the presenter is about what he is saying, and your intuition, which usually doesn't let you down, you will finally be able to decide whether (or not) to spend some k-bucks on the magic solution that is based on "AI".
As said, Minta is a cybersecurity solution, so you need to draw an analogy with your own business domain when reading the next parts.
1. How does your AI system work?
Once, I was peacefully presenting a machine learning project I had worked on at a conference when someone asked: "Is it real machine learning, or the machine learning that is based on if/else?" He was serious. So, the first question that comes to mind when talking about a system that learns from data is: how does it work?
Most of the salespeople/engineers who are sent by their companies to present such an "AI" system will probably tell you that they are privileged to have a group of geniuses who are fed a rare variety of mushroom to keep their IQ high. Those geniuses are working on building the complex maths behind Minta, this highly innovative product. Well, it's a really bad sign if you get such an answer.
It's a bit better if the presenter humbly tells you that the engineers who are working on Minta are training prediction models using machine learning. If that is the case, you can go further with more questions. Ask him about the details of the machine learning used, along the following lines.
2. What data?
Q1: From what data is it learning?
In the context of cybersecurity, it's usually all kinds of log data, from raw network traffic to application logs; it could also be historical data like old incidents or malicious emails/URLs. Let's suppose that the gentleman presenting the solution says "Well, it's learning from network traffic". That's a sign that he knows what he is talking about and that the tool is really based on something "intelligent". So, the second question you can ask to make him sweat a bit more is:
Q2: What preprocessing steps are you applying to make the raw data consumable by the learning algorithms?
In most cases, and if it is a real ML based solution (not the if/else one), he will tell you that they are encoding the non-numeric features. If the solution is dealing with raw text (e.g. emails) as input data, he will maybe say that they are cleaning the text by removing stop words and punctuation, or maybe detecting the language.
If you get something similar to the above answers, keep going in the same direction by asking the next, more technically in-depth questions. If that is not the case, that's a bad sign and there is a high probability that you are getting fooled, and of course you know what to do.
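To give an idea of what the preprocessing answers in this section look like in practice, here is a small Python sketch of two of them: one-hot encoding a non-numeric feature, and cleaning raw text by removing punctuation and stop words. The helper names, the tiny stop word list and the sample values are all invented for illustration; a real product would most likely rely on a dedicated library and a much richer pipeline.

```python
import string

# Tiny illustrative stop word list (an assumption, not a standard one).
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "your", "for"}


def one_hot_encode(values):
    """Turn a list of category names into 0/1 vectors, one column per category."""
    categories = sorted(set(values))
    vectors = [[1 if value == category else 0 for category in categories]
               for value in values]
    return categories, vectors


def clean_text(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    without_punctuation = text.lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return [word for word in without_punctuation.split()
            if word not in STOP_WORDS]


# Hypothetical non-numeric feature taken from network logs.
protocols = ["tcp", "udp", "tcp", "icmp"]
categories, encoded = one_hot_encode(protocols)

# Hypothetical raw email text.
tokens = clean_text("Click here to verify your account, NOW!")

print(categories, encoded)
print(tokens)
```

A presenter who can describe steps of this kind, even roughly, is probably talking about a system that really learns from data.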
3. What type of machine learning?
If the presenter is not sure about the details of the machine learning techniques used and he honestly tells you so, this could be considered a fair answer. If he goes further and tells you something like "yes, we are using supervised deep learning...", ask him what data is used as labels and how they are labelling the raw data. As in the previous question, you can go further by discussing the techniques/approach used, based on the following diagram. Starting from the center, "Machine Learning", this diagram guides you through most of the techniques in use.
The "Building the machine learning model" diagram could also be useful if you want to challenge him about the ML process used and the data flow. You can, for example, ask how the dataset is divided into training and validation data. And keep challenging him: your imagination is the only limit.
4. How to plug your solution into our ecosystem?
Let's suppose that you verified all the points above and you are totally convinced that Minta will bring real added value to your and your team's daily activities. The next and final technical, and very legitimate, question is how this solution could be plugged into your complex infrastructure. This very specific point is neglected most of the time, especially when we are excited to start using the presented tool, but it's the most important one when talking about efficiency and return on investment.
To make sure Minta can easily be plugged into your infrastructure, you need to ask a few more questions:
- How to feed your solution with data? Any specific hardware to install? Something like in the image?
- How to interact with existing tools? Do you provide an API, or should we reverse engineer it?
- How to visualise/exploit the prediction results? An oscilloscope, maybe?
- How to measure the accuracy of the model predictions and keep enhancing it?
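On the last question of this list, a reasonable answer would mention comparing the model's predictions with what really happened. As a minimal sketch in Python, assuming binary labels (1 = malicious, 0 = benign), here is how accuracy, precision and recall could be computed. The function name and the sample labels are made up for illustration.

```python
def evaluate(y_true, y_pred):
    """Compare predictions with the real labels (1 = malicious, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # threats caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign left alone
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # threats missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical ground truth and model output for eight events.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

In cybersecurity, where real attacks are rare, accuracy alone can look flattering, so it is worth asking about false alarms and missed threats as well.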
5. Ok, ok, you are now sure it's real AI, do you really need it?
Now that you are a hundred percent sure that it's real "AI" and that it could easily be plugged into your ecosystem, the last question to ask is whether you really need it. Maybe you are already solving, with traditional tools, the business problem that Minta is supposed to solve, or you are able to solve it with cheaper software. I think that it is worth double-checking this specific possibility.
And it won't be the last well-crafted email
The thing that I can be sure about is that you will keep receiving emails about those "amazing" "AI" based enterprise software products, the real and the fake ones. The thing that I'm less sure about is whether those solutions are really based on machine learning and whether they could bring real added value to your business. Asking all the above questions will help you to be more sure (or not).