Don't get Fooled! How to apply AI for detecting anomalies

The world rotates around its own axis once a day, and every day, whether normal or not, different things happen.

Abnormal events are not always good, and in your company they can quickly lead to unpleasant consequences for your customers and, ultimately, for you.

Therefore, the people in charge want their processes or products to work smoothly without any issues.

Monitoring these issues is usually done manually, which can be extremely costly.

What if there were a tool that could automatically detect recurring errors, intelligently group them, and, as a result, give you a deeper understanding of the underlying problem?

With such a tool you could significantly improve the operations in your company.

If you would like to understand how such a technology could be applied within your company, then this article is for you.

If not, you can also consider watching this funny cat video: Super funny cat video - Click NOW!

Note: Those who read to the end will be rewarded with a second, even funnier cat video.

AI-based anomaly detection: what is that?

When thinking about the hype around machine learning and artificial intelligence, most people first think of autonomous cars, image recognition, or speech recognition.

In fact, these represent just a small fraction of the enormous spectrum of applications.

AI methods open up completely new ways to tackle problems, often providing better solutions or better support in solving them.

One of the most interesting applications is so-called anomaly detection.

As the name suggests, it is about detecting events that are somehow not normal.

In this context, an anomaly is characterized by the following two criteria:

  1. It is a rare event
  2. Those rare events mostly result in undesired consequences

Practical examples for those two criteria are abundant:

For example, think about an extremely reliable machine that is responsible for the operation of a very important function.

Since no system is 100% reliable, this machine will rarely encounter a malfunction.

If you would like to apply a Predictive Maintenance method to anticipate possible failures, this will pose a big challenge.

This is because a classical machine learning approach needs positive as well as negative examples, i.e. (sensor) data for correct machine operation as well as data for failures.

From these data, a prediction model can be inferred.

Given a new measurement in the future, the algorithm can then make a prediction such as: based on historical experience, this measurement indicates no malfunction with a probability of 85.3%.
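
To make this concrete, here is a minimal sketch of what such a classical supervised setup might look like. All feature names, values, and data are invented for illustration, and logistic regression is just one possible model choice; the point is simply that a classifier like this needs labeled examples of both normal operation and failures.

```python
# Minimal sketch of the classical supervised approach (illustrative data only):
# it requires labeled examples of BOTH normal operation and malfunctions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sensor readings: columns could be e.g. temperature and vibration.
X_train = np.array([[60.1, 0.02], [61.3, 0.03], [95.7, 0.40], [59.8, 0.02], [97.2, 0.55]])
y_train = np.array([0, 0, 1, 0, 1])  # 0 = normal operation, 1 = malfunction

model = LogisticRegression().fit(X_train, y_train)

# For a new measurement the model reports a probability of malfunction.
new_measurement = np.array([[62.0, 0.04]])
p_failure = model.predict_proba(new_measurement)[0, 1]
print(f"Estimated probability of malfunction: {p_failure:.1%}")
```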

However, a classical AI approach needs a sufficient amount of example data from both classes, positive and negative.

For a very reliable machine, collecting failure data can become quite difficult.

Here, a failure constitutes an anomaly because it occurs rarely and leads to undesirable effects.

So, for a classical AI method, this setting poses a very special and sometimes unsolvable challenge.

This kind of problem is not exclusive to Predictive Maintenance though.

In fact, you can find many application scenarios.

For example, the operation of a web service or a web API.

What does the operational data look like? Does it deviate from normal behavior? Could this be due to a failure in the system?

Or suppose you are responsible for a production process: are the steps of each run executed in the correct order? Are some steps perhaps not reliable enough?

In these example scenarios, the people in charge spend a lot of time minimizing the occurrence of such events.

Thus, a classical anomaly setting is characterized by:

  1. Correct behavior can be abundantly observed
  2. Failure is a rare and expensive event

Anomaly detection algorithms have been developed to solve these exact problems.

These detectors make explicit use of the uneven distribution of positive and negative examples.

The basic principle behind anomaly detection is simple: Sound the alarm as soon as something does not look normal.

Although you lack example data for malfunctioning behavior, you do, on the other hand, have many examples of correct operation.

Now, the following clever hypothesis is put forward: you have seen so many correct events that anything which differs from these data must be some kind of malfunction.

Or, to put it differently: it is assumed that one can say with high confidence what is normal.

So, the principal functionality of an anomaly detection approach looks like this: 

  1. Learn what the structure of normal data looks like
  2. Compare new data with this normal structure
  3. If it deviates too much: Sound the alarm!

The basic principle is intuitive and easy to understand. But there is a new problem: at which point should we consider something not normal? How much must it deviate before we can reliably flag a malfunction?

You have to find some kind of threshold that must be exceeded in order to trigger an alert.

This is the crux of the matter: we would still need at least some negative examples in order to find a reasonable threshold.

Since such data is not available, we have to apply an algorithm that estimates a reliable threshold under these circumstances.
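
Here is a minimal sketch of the three-step principle above. It assumes, purely for illustration, that "normal" can be summarized by the mean and spread of a single measurement, and it derives the threshold from a high percentile of the deviation scores seen on normal data; the percentile and the data are arbitrary example choices.

```python
# Minimal sketch: learn the structure of normal data, score new points by how far
# they deviate from it, and alarm above a threshold. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
normal_data = rng.normal(loc=50.0, scale=2.0, size=1000)  # stand-in for historical "correct" measurements

# 1. Learn what normal data looks like (here simply its mean and spread).
mu, sigma = normal_data.mean(), normal_data.std()

# 2. Define a deviation score and derive a threshold from the normal data itself,
#    e.g. the 99.5th percentile of the scores observed during correct operation.
scores = np.abs(normal_data - mu) / sigma
threshold = np.percentile(scores, 99.5)

# 3. Compare new measurements against this normal structure and sound the alarm.
def is_anomaly(x: float) -> bool:
    return abs(x - mu) / sigma > threshold

print(is_anomaly(51.2))  # close to normal -> probably False
print(is_anomaly(72.0))  # far off         -> probably True
```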

Criteria to set the alarm threshold

The primary objective when setting the right alarm threshold is accuracy.

This means that while you try to minimize false alarms (false positives), you also want to catch every true anomaly.

In the beginning, a conservative approach is advisable, i.e. you deliberately accept frequent false positives, or false alarms.

The flagged incidents are then investigated and labeled by human experts; the goal is to mark each incident as a real anomaly or a false alarm.

You might ask yourself: How does this make sense?

  1. In the beginning you are usually not sure what a real malfunction looks like. By taking the conservative approach you reduce the probability of missing an anomaly.
  2. As time goes by, you will develop a better understanding of the sensitivity of your AI method and can adjust the tolerance step by step.
  3. Thanks to the subsequent labeling, you will obtain high-quality data, as it lies in the border zone between normal and abnormal operation. You will gain a deeper understanding of precisely what you are most unsure about.

Therefore, the conservative approach assures reliability, understanding and quality.
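
One simple, purely hypothetical way to implement this gradual tightening in code: start with a deliberately low threshold, have experts label the triggered alerts, and only raise the threshold as long as no confirmed anomaly would fall below it. The update rule below is just one possible heuristic, not a standard recipe.

```python
# Hypothetical sketch: relax a deliberately conservative alarm threshold step by step,
# guided by expert-labeled incidents. Each incident is (anomaly_score, is_real_anomaly).
def updated_threshold(current: float,
                      labeled_incidents: list[tuple[float, bool]],
                      step: float = 0.1) -> float:
    real = [score for score, is_real in labeled_incidents if is_real]
    false_alarms = [score for score, is_real in labeled_incidents if not is_real]

    # Only consider raising the threshold if false alarms dominate the review queue.
    if len(false_alarms) <= len(real):
        return current

    candidate = current + step
    # Never raise it above the weakest confirmed anomaly, otherwise known
    # real incidents would start slipping through.
    if real:
        candidate = min(candidate, min(real) * 0.9)
    return max(current, candidate)

# Example: many false alarms, one clear real anomaly -> the threshold creeps upward.
print(updated_threshold(2.0, [(2.1, False), (2.3, False), (4.5, True)]))  # 2.1
```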

In the beginning you should expect a higher workload. However, this overhead will shrink over time as your employees gain experience and understanding.

If the number of false positives does not decrease, or worse, if you miss real anomalies, you should re-examine your data with respect to significance and informative value.

If the latter is the case, you should also recheck your data sources and look for new ones to make the dataset more diverse and thus more informative.

Finding a good threshold is ultimately a trade-off between cost and accuracy, so the decision has to be made case by case.

Additional manual inspections will probably increase quality but will also lead to higher costs.

Types of Anomalies

There are essentially three types of anomalies.

The easiest one is the so-called point anomaly. This kind of anomaly can be simply explained by the following picture:

[Image: illustration of a point anomaly]

Point Anomalies are made up of individual outlier points. These are data points whose properties or features significantly deviate from the bulk of normal data. 

The other two types can be regarded as special cases of the point anomaly.

The first one is the Contextual Anomaly.

As an example, consider the number of visitors in a shopping mall. 

During daytime it will be very busy and a large number of visitors can be considered as normal.

During night though, the only person in the building should be the security guard.

Let’s assume there is a night shopping event. In this case a high number of visitors during the night would be a Contextual Anomaly because this number of visitors would be normal during the day but not at night.

If this event is not taken into account, it may trigger a false intruder alert and lead to an unwanted large-scale police deployment.
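
Here is a minimal sketch of how such context could be handled, using the hour of the day as the context. The visitor counts and per-hour baselines are invented for illustration, and the z-score cut-off is an arbitrary choice.

```python
# Sketch of a contextual check: the same visitor count can be perfectly normal
# at 2 pm and highly suspicious at 2 am. Baselines are learned per hour of day.
import numpy as np

# hourly_history[h] = visitor counts observed at hour h on previous days (synthetic data)
hourly_history = {
    h: np.random.default_rng(h).poisson(lam=500 if 9 <= h <= 21 else 3, size=60)
    for h in range(24)
}

def is_contextual_anomaly(visitors: int, hour: int, z_max: float = 4.0) -> bool:
    history = hourly_history[hour]
    mu, sigma = history.mean(), history.std() + 1e-9  # small constant avoids division by zero
    return abs(visitors - mu) / sigma > z_max

print(is_contextual_anomaly(480, hour=14))  # busy afternoon -> False
print(is_contextual_anomaly(480, hour=2))   # same count at night -> True
```

A scheduled night shopping event could then simply be registered as an exception before the alarm is triggered.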

The third and last type is the Collective Anomaly.

In this case a complete group of points constitutes an anomaly.

For example: A process always involves a certain number of steps in a particular order.

So, an individual process sequence can be described by a chronologically ordered sequence (ordered group) of measurements.

If this sequence order is somehow corrupted, this complete group of falsely ordered data points – i.e. a process sequence – can be considered an anomaly. 
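
A minimal sketch of how such a check could look, assuming each process run is logged as an ordered list of step names (the step names and runs below are made up): learn which step-to-step transitions occur in normal runs and flag any run containing a transition that was never observed.

```python
# Sketch of a collective anomaly check on process sequences: each step on its own
# may look normal, but the order of the whole group of steps is not. Data is made up.
normal_runs = [
    ["load", "heat", "press", "cool", "inspect"],
    ["load", "heat", "press", "cool", "inspect"],
    ["load", "heat", "press", "inspect"],  # a known, valid shorter variant
]

# Learn which step-to-step transitions occur in normal runs.
normal_transitions = {(a, b) for run in normal_runs for a, b in zip(run, run[1:])}

def is_collective_anomaly(run: list[str]) -> bool:
    """Flag a run if it contains a transition never seen in normal data."""
    return any((a, b) not in normal_transitions for a, b in zip(run, run[1:]))

print(is_collective_anomaly(["load", "heat", "press", "cool", "inspect"]))  # False
print(is_collective_anomaly(["load", "press", "heat", "cool", "inspect"]))  # True: "press" before "heat"
```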

As you can see, these two types are a bit special, since they require additional context or expert knowledge in order to distinguish such incidents properly.

But rule-based systems already solve this problem!

That is correct. Rule-based systems are already widely and successfully applied in many companies.

Companies use software that regularly checks rule sets which have been manually created and entered by experienced employees.

This approach is intuitive and, in principle, not bad.

So why should you consider an AI system anyway?

Primarily because the manual, rule-based approach has several disadvantages that Machine Learning methods do not.

Firstly, manual rule maintenance is fairly inflexible and time-consuming.

Rule-based systems have to be constantly managed and adjusted to new circumstances and requirements.

Instead of automatically discovering new rule sets, rules have to be manually (subjectively) recognized and subsequently written down by hand.

As a consequence, some unexpected scenarios are not covered, because the expert may simply not be aware of them.

Thus, in comparison to the AI approach, the manual process is much slower, less precise, and more expensive.

Machine Learning based anomaly detectors are quickly executed, able to assess new findings more objectively and are usually cheaper to adjust.

This directly leads to the next advantage: AI based anomaly detectors make processes and products easier to upscale. 

Machine Learning reduces the overhead you encounter when having to do everything manually.

It requires fewer resources and relieves employees of the tedious development of simple rule sets.

It also enables you to analyze thousands of parameters at the same time.

When applying only a manual approach, you will need to compromise on this number, which implies a reduced monitoring scope and forced prioritization in the rule development process.

Additionally, reaction times are longer: critical incidents might be recognized too late, which leads to further delays in the QA process.

To sum up, the application of Machine Learning based anomaly detectors will not only increase efficiency, but also largely extend the scenarios you can cover, significantly improving your product or process. 

Types of Anomaly Detection Algorithms 

First of all, understand that THE algorithm does not exist.

Every situation requires a different approach. 

A complete coverage of all anomaly detection algorithms would be way beyond the scope of this article. 

Therefore, for illustrative purposes I will limit myself to two methods.

However, the basic principle of every approach is the same: detect deviations from the normal state.

The first approach is the so-called Dimension Reduction Technique.

The idea behind this method is to let the algorithm find the most significant characteristics of the data, i.e. those that (optimally) represent each individual data point.

This is achieved by reducing the raw data to its most basic characteristics (dimensions) and subsequently trying to reconstruct the original data from this reduced representation.

These algorithms are designed to purposefully discard some of the information in the data, namely the part that carries no relevant information about the underlying structure.

Thus, the algorithm tries to identify the most significant features to reconstruct the original state.

For example: I will give you 1000 pictures of yellow flowers.

Your task now is: Find the five most characteristic features that describe those flowers and that can be applied to as many of the given pictures as possible. 

For example, this could be the color or the shape.

However, since the pictures might have been shot at different locations the background might not be the best characteristic feature.

So it probably makes sense to ignore (or discard) this information.

Now, how can this be applied to detect anomalies?

After having identified the most significant features or dimensions that the data shares, you can now simply give the algorithm a new data point and let it extract those features from the given data.

Depending on how similar the extracted features of this new data point are to the characteristic features from the training dataset, you may or may not be looking at an anomaly.

An abnormal data point simply might not share those features.

So, if the extracted features deviate too much you should consider having another look at the incident.
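
The article does not prescribe a particular algorithm, but here is a minimal sketch of the idea using PCA (principal component analysis) as one possible dimension reduction technique. The data is synthetic, and the 99.5th-percentile threshold is an arbitrary example choice; the reconstruction error serves as the anomaly score, since points that do not share the learned structure reconstruct poorly.

```python
# Sketch of anomaly detection via dimension reduction (PCA as one possible choice):
# compress normal data to its main components, reconstruct it, and use the
# reconstruction error as the anomaly score. Data is synthetic for illustration.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Normal data lives mostly in a low-dimensional structure of a 10-dimensional space.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
X_normal = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

pca = PCA(n_components=3).fit(X_normal)

def reconstruction_error(x: np.ndarray) -> float:
    x = x.reshape(1, -1)
    x_hat = pca.inverse_transform(pca.transform(x))
    return float(np.linalg.norm(x - x_hat))

# Threshold taken from the normal data itself (an arbitrary but common choice).
errors = [reconstruction_error(row) for row in X_normal]
threshold = np.percentile(errors, 99.5)

x_new = rng.normal(size=10) * 5.0   # a point that does not share the normal structure
print(reconstruction_error(x_new) > threshold)  # likely True -> worth a closer look
```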

The second example is related to Time Series Analysis.

Time-dependent data is commonplace in everyday data business.

A classical example would be the number of unique website visitors or app users.

You count your visiting users on an hourly basis and visualize this data in a time-dependent diagram:

[Image: hourly user counts plotted over time]

These measurements are often periodic, i.e. each day the user count peaks in the evening, say around 8 pm, and reaches its lowest point around 4 am.

Usually you will also observe a general trend; for example, on weekends you will see more users than on Tuesdays or Wednesdays.

Finally, the observed data will also be influenced by random fluctuations which vary around the periodicity and the trend.

[Image: decomposition of the measurements into periodicity, trend, and residual]

A decomposition of the given measurements into periodicity, trend and randomness could look like the illustration given above. 

The final row visualizes the fluctuations, which are often referred to as the residuals. The residual is what remains when you subtract the periodicity and the trend from the raw data.

It is treated as random because, with limited means, one cannot fully explain these remaining fluctuations.

The main assumption is that the periodicity and trend will explain the most significant properties of the data.

Once you have derived these factors from historical data, you can use them to identify potential anomalies.

Given some new measurements you simply subtract the periodicity and trend and take a look at the remaining residual.

If (as in the illustration below) the residual deviates too much from its normal range, no matter whether up or down, you might have found an anomaly.

[Image: residual series with a spike that clearly exceeds the normal range]
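
A minimal sketch of this residual-based approach, using statsmodels' seasonal_decompose as one possible decomposition tool. The hourly user counts are simulated, and both the injected incident and the 4-sigma threshold are arbitrary choices for illustration.

```python
# Sketch of residual-based anomaly detection on a time series: remove periodicity
# and trend, then flag hours whose residual is unusually large. Simulated data.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(2)
hours = pd.date_range("2024-01-01", periods=24 * 28, freq="h")        # four weeks of hourly counts
hour_of_day = hours.hour.to_numpy()
daily_cycle = 100 + 80 * np.sin(2 * np.pi * (hour_of_day - 14) / 24)  # rough peak around 8 pm
trend = np.linspace(0, 30, len(hours))                                # slowly growing user base
noise = rng.normal(scale=5, size=len(hours))
users = pd.Series(daily_cycle + trend + noise, index=hours)

users.iloc[300] += 200  # inject an artificial incident

decomposition = seasonal_decompose(users, period=24)  # periodicity of one day
residual = decomposition.resid.dropna()

# Sound the alarm when the residual deviates too much from its usual spread.
threshold = 4 * residual.std()
anomalies = residual[residual.abs() > threshold]
print(anomalies)  # should contain the injected incident
```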

As you can see, although these are two completely distinct approaches, they both follow the same basic principle.

In fact, there are countless other methods, and which one fits depends on the problem at hand; none of them is inherently better or worse at solving the issue.

So, by applying AI we have finally found the perfect solution! … Or maybe not?

Machine Learning techniques offer a whole bunch of completely new ways for detecting and analyzing anomalies.

At the same time, they also bring a completely new set of problems that we would like to avoid.

As with any other data analysis problem, data quality is one crucial factor.

The first step an anomaly detector takes is learning the structure of normal data.

This means that your training data should not contain any kind of anomalies because otherwise your detector might learn to accept corrupt data as normal.

Another issue is of course the rate of false alarms.

If they occur too often, you should consider adding more data to increase the significance. As with humans: the more diversity, the better.

Data diversity increases the chance of capturing more significant or meaningful information.

And the higher the significance, the higher your confidence in judging your data.

As with other machine learning methods, anomaly detection is also prone to data that drifts over time.

Consider, for instance, the user count example from above: a change in the website's content might lead to a different underlying trend.

If you do not account for this change you will most likely produce a much higher number of false alarms and real incidents might not get reported.

This can be compensated by regularly refreshing the algorithm with recent historical data.
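
One simple way to do this, sketched below under the assumption that "normal" is again summarized by a mean, a spread, and a percentile threshold: keep only a rolling window of recent measurements and refit the model on it at regular intervals. The window length and the refit rule are arbitrary choices for illustration.

```python
# Sketch of compensating for drift: re-learn "normal" from a rolling window of
# recent history instead of one fixed historical dataset. Parameters are illustrative.
import numpy as np
from collections import deque

WINDOW = 24 * 30            # keep roughly the last 30 days of hourly measurements
recent = deque(maxlen=WINDOW)

def refit(percentile: float = 99.5) -> tuple[float, float, float]:
    data = np.asarray(recent)
    mu, sigma = data.mean(), data.std() + 1e-9
    scores = np.abs(data - mu) / sigma
    return mu, sigma, float(np.percentile(scores, percentile))

# As new measurements arrive, append them (once confirmed as normal) and refit regularly.
for value in np.random.default_rng(3).normal(loc=100, scale=10, size=1000):
    recent.append(value)

mu, sigma, threshold = refit()
print(mu, sigma, threshold)
```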

And last but not least: Anomaly detection is not a generic problem. 

Detection is mostly an individual project, since your application is most likely unique.

Finally, you should remember the following important fact: anomaly detectors will only ever give you suggestions for anomalies, since these methods consider only correlations.

This means these algorithms do not identify causes; they only see the effects of a possible cause.

Thus, you should check each incident manually and then try to explain what has caused it.

Conclusion

Machine Learning based Anomaly Detection offers you a completely new set of possibilities and is relatively easy to implement.

It helps you summarize recurring incidents, gives you new insights and explanations, and thereby supports you in improving your product or process.

This article should have given you enough basic knowledge to recognize potential applications for anomaly detection within your company.

Now that the serious part is over, we can have some fun. As promised in the beginning, here is the link to the second funny cat video: The funniest cat video ever - Click NOW or regret forever!

About the author

Dr. Thomas Vanck is an expert in Machine Learning and Data Analysis. For years, he has been supporting companies in using their data for greater success. He looks forward to hearing your questions about your planned or ongoing data projects. Feel free to write him a message.
