Anomaly Detection with Computer Vision

Introduction

An Anomaly is an event or item that deviates from what is expected, and it occurs far less frequently than standard events. The anomalies that can occur in products are usually random; examples include changes in color or texture, scratches, misalignment, missing pieces, and errors in proportions.

Anomaly Detection allows us to repair or remove parts or elements that are in bad condition from the production chain. As a result, manufacturing costs are reduced because defective products are not produced or marketed. In factories, anomaly detection is a useful tool for Quality Control Systems, and it is a big challenge for Machine Learning Engineers.

Using Supervised Learning is not a recommended practice for two reasons: anomaly detection depends on features intrinsic to each anomaly, and anomalies make up only a small part of a full dataset (training/validation). Image comparison could be a feasible alternative, but images vary in several respects, such as lighting, object position, and distance to the object, which prevents a pixel-to-pixel comparison with a standard image. Pixel-to-pixel comparison is integral in the detection of anomalies.

In addition to the conditions above, our proposal uses Synthetic Data as the way to enlarge the training dataset. We chose two kinds of Synthetic Data: random Synthetic Data and Synthetic Data similar to anomalies (see the Data section for more details).

The goal of this project is to classify images as Anomaly or Not Anomaly using Unsupervised Learning, with Synthetic Data as the data augmentation methodology.

This project is a proposal from the startup LinkedAI, a Colombian Enterprise expert in data labeling for Artificial Intelligence projects.

Background Research

Anomaly detection is associated with finance and with detecting “bank fraud, medical problems, structural defects, malfunctioning equipment” (Flovik et al, 2018). The focus of this project was anomaly detection on image datasets, with application on a production line. At the beginning of the project, we familiarized ourselves with the functionality and architecture of Autoencoders and their use in anomaly detection. As part of the data plan, we researched the importance of including synthetic noisy images and real noisy images (Dwibedi et al, 2017).

Having a Data Plan was an important part of this project. It meant choosing a dataset with enough original images and enough real noisy images, and using both synthetic and real images. When working with real images, the images (data) needed for full coverage of an object and its environment may not be available in terms of views and scale; “…distinguishing between such instances requires the dataset to have good coverage of viewpoints and scales of the object” (Dwibedi et al, 2017). The use of synthetic data allows for “good coverage of both instances and viewpoints” (Dwibedi et al, 2017). The synthetic image datasets, which included synthetically rendered scenes and objects, were created with the Flip Library, an open-source Python library created by LinkedAI. Through training and evaluation, Dwibedi et al (2017), in “Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection”, showed that training on synthetic datasets gave results comparable to training on real image datasets.

An Autoencoder “typically” learns a representation of a dataset by reducing the dimensionality of the original input (encoding), which creates the bottleneck. From this reduced encoding, a representation (reconstruction) is generated that is as close to the original as possible. The input layer and the output layer of the autoencoder have the same number of nodes. Variational Autoencoders add a layer to the autoencoder just before the bottleneck: “the bottleneck value is created by picking it from a random normal distribution” (Patuzzo, 2020). The reconstructed output image carries some reconstruction loss, and the distribution of that loss can be used to define a threshold value (Flovik, 2018). The threshold is the value beyond which an input is classified as an anomaly.

Denoising autoencoders allow the hidden layer to learn “more robust filters” and reduce overfitting. The autoencoder is trained to reconstruct the input “from a corrupted version of it” (Denoising Autoencoders (dA)). Training includes the original image as well as the noisy or “corrupted” image. With the introduction of the stochastic corruption process, the denoising autoencoder is expected to encode the input and then reconstruct the original input by removing the noise (corruption) from the image. According to Vincent et al (2008), “Extracting and Composing Robust Features with Denoising Autoencoders”, the Denoising Autoencoder should be able to find structures and regularities that are characteristic of the input. For images, these structures and regularities would have to be captured “from a combination of many input dimensions” (Vincent et al., 2008). The hypothesis of Vincent et al (2008) is that “robustness to partial destruction of the input” should be a criterion for a “good intermediate representation”.
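The stochastic corruption process can be sketched in a few lines. The example below uses masking noise (each pixel is set to zero with a fixed probability), which is one common choice and an assumption here; this project instead pasted synthetic objects onto images as noise. The function name and the drop probability are hypothetical.

```python
import random


def corrupt(image, drop_prob=0.3, rng=None):
    """Return a copy of `image` with each pixel zeroed with probability drop_prob.

    `image` is a list of rows of float pixel values; the input is not modified.
    """
    rng = rng or random.Random()
    return [
        [0.0 if rng.random() < drop_prob else pixel for pixel in row]
        for row in image
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [[1.0] * 4 for _ in range(4)]
noisy = corrupt(clean, drop_prob=0.3, rng=rng)

# A denoising autoencoder is trained on (input, target) = (noisy, clean).
training_pair = (noisy, clean)
for row in noisy:
    print(row)
```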

The emphasis, in this case, would be on the ability to obtain and create a large number of images both original and with noise. We used both real and synthetic data to create a significant number of images with which to train our model.

According to Huszar (2016), Dilated Convolutional Autoencoders “support exponential expansion of the receptive field without loss of resolution or coverage.” Maintaining the resolution and coverage of an image is integral both to reconstructing that image with a Dilated Convolutional Autoencoder and to anomaly detection using images. In the decoder stage, this moves the autoencoder from creating a reconstruction of the original image to producing a much closer approximation than may result from the “typical” autoencoder structure. Yu et al. (2017), in “Network Intrusion Detection through Stacking Dilated Convolutional Autoencoders”, used Dilated Convolutional Autoencoders with the goal of combining unsupervised learning and CNNs to learn features from large amounts of unlabelled raw traffic data, with an interest in identifying and detecting complex attacks. By allowing “very large receptive fields while only growing the number of parameters logarithmically” (Huszar, 2016), incorporating the feature learning of the unsupervised CNN, and stacking these layers, Yu et al. (2017) were able to achieve a “remarkable performance” from their model.
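The “exponential expansion of the receptive field” can be checked with simple arithmetic. For a stack of stride-1 convolutions, each layer with kernel size k and dilation d widens the receptive field by (k - 1) * d. The sketch below compares four ordinary 3-wide layers with four layers whose dilation doubles each time; the layer counts and dilation rates are illustrative assumptions, not the configuration used in this project.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (per axis, in pixels) of stacked stride-1 convolutions."""
    field = 1
    for dilation in dilations:
        field += (kernel_size - 1) * dilation
    return field


# Both stacks have four layers and therefore the same number of weights.
plain = receptive_field(3, [1, 1, 1, 1])
dilated = receptive_field(3, [1, 2, 4, 8])

print(plain)    # 9
print(dilated)  # 31
```

With doubling dilation rates the receptive field roughly doubles per layer, while the ordinary stack grows by only two pixels per layer.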

Technology

Flip Library (LinkedAI)

Flip is a Python library that allows you to generate synthetic images in a few steps from a small set of images made up of backgrounds and objects (images that are placed on the backgrounds). It also allows you to save the results in jpg, json, csv, and pascal voc files.

Python Libraries

Several Python Libraries were used in this project for different purposes:

Visualization (images, metrics):

  • OpenCV
  • Seaborn
  • Matplotlib

Arrays handling:

  • Numpy

Models:

  • TensorFlow
  • Keras
  • Random

Image Similarity Comparison:

  • Imagehash
  • PIL
  • Seaborn (Histogram)


Weights & Biases

Weights & Biases is a developer tool that tracks machine learning models and creates visualizations of the model and its training. It is used as a Python library and is imported with import wandb. It works with TensorFlow, Keras, PyTorch, scikit-learn, Hugging Face, and XGBoost. wandb.config is used to record the inputs and hyperparameters; the tool then tracks the metrics and creates visualizations of the inputs, hyperparameters, model, and training, making it easier to see where changes can and need to be made to improve the model.


Method & Structure

We started our project with current Autoencoder architectures that specialize in images and use convolutional networks (see the graphs below). After some preliminary tests, and based on the research (see References) and advice from our mentors, we changed to the final architecture.

[Image: no alternative text available]

Use of Dilation Feature

Dilation is a variant of the convolution operation in which holes are inserted into the traditional convolutional kernels. In our project, we applied the dilation feature specifically to channel dimensions, without impacting the image resolution.

[Image: no alternative text available]
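To make “holes inserted in the kernel” concrete, here is a minimal one-dimensional sketch in plain Python. It is an illustration under simplifying assumptions (1D signal, zero padding, stride 1) and not the layer used in the project, where a framework convolution with a dilation rate would normally be used.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Stride-1 "same" convolution (cross-correlation) with zero padding.

    Kernel taps are spaced `dilation` samples apart, so dilation > 1
    leaves holes between the taps. The output keeps the input length.
    """
    span = (len(kernel) - 1) * dilation  # distance from first to last tap
    pad_left = span // 2
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i - pad_left + j * dilation
            if 0 <= idx < len(signal):  # positions outside count as zero
                acc += weight * signal[idx]
        output.append(acc)
    return output


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 0.0, -1.0]

dense = dilated_conv1d(signal, kernel, dilation=1)
holes = dilated_conv1d(signal, kernel, dilation=2)

print(dense)  # [-2.0, -2.0, -2.0, -2.0, -2.0, -2.0, 6.0]
print(holes)  # [-3.0, -4.0, -4.0, -4.0, -4.0, 4.0, 5.0]
```

Both outputs have the same length as the input, which shows that the wider kernel footprint does not reduce the resolution.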

Image Similarity

One of the critical points for this project was to find an Image Comparison metric. The Image Comparison metric was used to train the model, build the histogram, and to calculate the threshold on which to classify images as Anomaly or Not Anomaly.

We started with the L2 (Euclidean) distance computed pixel by pixel, but the results did not capture some of the differences. We then tried the Python Imagehash library with its different hashes (perceptual, average, and difference), but it gave different results for similar images. Finally, we found that the SSIM (Structural Similarity Index Measure) metric gave us a measure of the similarity between a pair of images; in addition, it is available as a built-in function in TensorFlow (tf.image.ssim), so it can be used as a loss in Keras.
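The difference between the two metrics can be illustrated with a simplified sketch. The SSIM below is computed once over the whole image (a single global window) on flattened grayscale values in [0, 1]; library implementations typically compute it over local sliding windows and average the result, so the numbers will differ. The constants 0.01 and 0.03 are the values commonly used in the SSIM definition, and the tiny sample images are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two flattened images of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def global_ssim(a, b, max_val=1.0):
    """Single-window SSIM of two flattened grayscale images of equal length."""
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1 = (0.01 * max_val) ** 2  # stabilizes the luminance term
    c2 = (0.03 * max_val) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


original = [0.2, 0.4, 0.6, 0.8]
brighter = [x + 0.1 for x in original]  # same structure, shifted brightness
inverted = list(reversed(original))     # structure flipped

for name, other in [("brighter", brighter), ("inverted", inverted)]:
    print(name, l2_distance(original, other), global_ssim(original, other))
```

In this toy case SSIM stays close to 1 for the brightness shift and turns negative for the flipped structure, which is the kind of structural sensitivity that a plain pixel distance does not express directly.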

Histograms

After training and evaluating the model with its respective datasets, it was necessary to identify the similarity between the reconstructed and the original images. Of course, due to the diversity of the original images (e.g. size, position, color, brightness, and other variables), there was a range for this similarity. We used the Histogram as a graphical representation to visualize the range and also to observe at which point we would have Non-Similar images.

[Image: no alternative text available]
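A minimal sketch of how such a histogram can be built from similarity scores, using only the standard library. The scores, the number of bins, and the text rendering are hypothetical and chosen for illustration; the project used plotting libraries for the actual figures.

```python
def histogram(scores, bins=5, lo=0.0, hi=1.0):
    """Count how many scores fall into each of `bins` equal-width intervals."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for score in scores:
        idx = int((score - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clamp values at or beyond the edges
        counts[idx] += 1
    return counts


# Hypothetical similarity scores between original and reconstructed images.
scores = [0.95, 0.97, 0.92, 0.88, 0.99, 0.35, 0.91, 0.42]
counts = histogram(scores, bins=5)

for i, count in enumerate(counts):
    low, high = i * 0.2, (i + 1) * 0.2
    print(f"{low:.1f}-{high:.1f} | {'#' * count}")
```

The empty bins between the low-scoring group and the high-scoring group are where a threshold separating Similar from Non-Similar images would be placed.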

Data

The data used was downloaded from Kaggle: Surface Crack Detection Dataset (crack dataset) and casting product image data for quality inspection (casting dataset).


The first, the crack dataset, has 20,000 negative wall images (no cracks) and 20,000 positive images (with cracks). In this case, the cracks were considered anomalies. All images are 227x227 pixels with RGB channels. Examples of each group are shown below.

[Image: no alternative text available]

We used 10,000 images from the group without anomalies to generate different synthetic datasets. The synthetic datasets were divided into two types: one with noise similar to anomalies (51 images created with Photoshop) and another with noise from random objects such as fruits, plants, and animals (80 free images downloaded from the pixabay page). All the images used as noise are in png format with a transparent background. Below are some examples of the two types of datasets used for model training.

[Image: no alternative text available]

The second, the casting dataset, is composed of two groups: one with images of 512x512 pixels (781 images with anomalies and 519 without anomalies) and another with images of 300x300 pixels (3137 without anomalies and 4211 with anomalies). All images had RGB channels. The 300x300 pixel images were used. The latter, as provided on Kaggle, were divided into training (91.65% of the data) and testing (the remainder). For this dataset, the anomalies were edge debris, scratches, surface warping, and perforations. Below are some examples of images with and without anomalies.

[Image: no alternative text available]

We used 1,000 images belonging to the training group without defects to generate the synthetic datasets. As in the previous case, we created two types of datasets: one with noise similar to anomalies (51 images created with Photoshop) and the other with noise from random objects such as animals, flowers, and plants (the same 80 images used in the crack dataset). Below are some examples of the images used during model training.

[Image: no alternative text available]

All synthetic data was created using the Flip library. In each generated image, 2 objects were chosen and placed at random. Three types of transformations were applied to the objects: flip, rotate, and resize. The resulting images were saved in jpg format. The following table shows the datasets used in the project:

[Image: no alternative text available]
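The generation step described above (two objects per image, placed at random, with flip, rotate, and resize applied) can be sketched as a parameter-sampling routine. This is not the Flip library API: the function name, the dictionary keys, and the value ranges are hypothetical, and the sketch only samples the transformation parameters rather than rendering images.

```python
import random


def sample_placements(background_size, n_objects=2, rng=None):
    """Sample random placement and transformation parameters for pasted objects.

    All ranges below are illustrative assumptions, not Flip defaults.
    """
    rng = rng or random.Random()
    width, height = background_size
    placements = []
    for _ in range(n_objects):
        placements.append({
            "x": rng.randrange(width),          # top-left corner on the background
            "y": rng.randrange(height),
            "flip": rng.choice([True, False]),  # horizontal flip on or off
            "angle": rng.uniform(0.0, 360.0),   # rotation in degrees
            "scale": rng.uniform(0.1, 0.5),     # object size relative to background
        })
    return placements


rng = random.Random(42)  # fixed seed so the sketch is repeatable
placements = sample_placements((227, 227), n_objects=2, rng=rng)
for placement in placements:
    print(placement)
```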

Experiments

Based on the datasets listed in the table above, and with the main objective of finding which variation of the dataset might give the best results, we trained the model with these data and obtained the results shown in the graphs below.

For each dataset, we evaluated several metrics such as loss (SSIM), recall, precision, F1, and accuracy. For each experiment, the histogram representing the image similarity between the set of noisy images & reconstructed images was evaluated.
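For reference, the classification metrics listed above can be computed from the true and predicted labels as in the sketch below. The label convention (1 = Anomaly, 0 = Not Anomaly) and the sample labels are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Return precision, recall, F1, and accuracy for binary labels (1 = anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical ground truth and predictions for eight images.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)  # every metric is 0.75 for this sample
```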

To track and compare our results, we used the Weights & Biases library, which offers an easy way to store and compare the results from each experiment.

[Image: no alternative text available]

Training

To keep the number of variables in our environment to a minimum, we decided to always use a dataset of one thousand samples, regardless of the ratio of real data to synthetic data.

In the algorithm, we split the respective dataset into 95% for training and 5% for testing the results. Aside from this, our evaluation was implemented only with real data.

[Image: no alternative text available]
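The fixed-size dataset and the 95%/5% split can be sketched as follows. The function name, the file names, the 70/30 ratio of real to synthetic samples, and the seed are hypothetical; only the total of one thousand samples and the 95%/5% split come from the description above.

```python
import random


def build_and_split(real, synthetic, real_fraction,
                    total=1000, train_fraction=0.95, seed=0):
    """Draw `total` samples with the given real/synthetic ratio, then split them."""
    rng = random.Random(seed)
    n_real = round(total * real_fraction)
    n_synthetic = total - n_real
    samples = rng.sample(real, n_real) + rng.sample(synthetic, n_synthetic)
    rng.shuffle(samples)  # mix real and synthetic before splitting
    n_train = round(total * train_fraction)
    return samples[:n_train], samples[n_train:]


# Hypothetical file names standing in for the image datasets.
real = [f"real_{i}.jpg" for i in range(1000)]
synthetic = [f"synthetic_{i}.jpg" for i in range(1000)]

train, test = build_and_split(real, synthetic, real_fraction=0.7)
print(len(train), len(test))  # 950 50
```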

Evaluation & Results

The following images show the main results obtained in some experiments. You can find all the results in the following links:

Cracks dataset

https://wandb.ai/heimer-rojas/anomaly-detector-cracks?workspace=user-

[Images: no alternative text available]

For the cracks dataset, the experiments had similarly excellent results (range 91% to 98%), without significant differences between the experiments. This behavior is mainly due to variables such as the size and color of the cracks in comparison with images without anomalies.

Casting dataset

https://wandb.ai/heimer-rojas/anomaly-detector-cast?workspace=user-heimer-rojas

[Images: no alternative text available]

Challenges

  • Long training times, which we addressed by using GPUs in Google Colab with the pro version.
  • Long data loading times, which we solved by uploading the data compressed in zip format; this way a single file per dataset was uploaded and the time was significantly reduced.
  • The original proposal was to use a dataset from a Colombian automobile production line; unfortunately, the quality and quantity of Positive and Negative images were not enough to create an appropriate Machine Learning model. This led us to use datasets from Kaggle with conditions similar to a production line.
  • The visualization of differences in the case of an Anomaly is different for each dataset, and the normal image structure, such as color, brightness, and other intrinsic characteristics of the images, should be taken into account.
  • Ethical: Human expertise is needed to choose the proper threshold, based on the threshold obtained from real data or from synthetic data. It may depend on the case.
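The zip-based loading mentioned in the list above can be sketched with the standard library alone. The function names and the temporary directories are hypothetical, and the demo writes small placeholder files instead of real images.

```python
import tempfile
import zipfile
from pathlib import Path


def pack_dataset(image_dir, archive_path):
    """Compress every file under image_dir into a single zip archive."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(image_dir).rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(image_dir))
    return archive_path


def unpack_dataset(archive_path, target_dir):
    """Extract the archive and return the sorted list of extracted files."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
    return sorted(p for p in Path(target_dir).rglob("*") if p.is_file())


# Demo with placeholder files in temporary directories.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "images"
    source.mkdir()
    for i in range(3):
        (source / f"img_{i}.jpg").write_bytes(b"placeholder")

    archive = pack_dataset(source, Path(workdir) / "dataset.zip")
    extracted = unpack_dataset(archive, Path(workdir) / "extracted")
    print(len(extracted))  # 3
```

Uploading one archive avoids the per-file overhead of transferring thousands of small images, which is where the time saving comes from.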

Discussion

Several steps are required for the implementation of a real Machine Learning project from the idea to the implementation of models. This includes dataset selection, collection, and processing.

It is important to have “debugging scripts” in projects working with images. In our case, we used a script that allowed us to visualize the original dataset, the new synthesized images, and the cleaned images produced by the autoencoder, enabling us to evaluate the model performance.

Mentors

Cristian Garcia, Technical Advisor

Cristian is a Machine Learning Engineer and contributor to various open-source projects including Pypeln (author) and Tensorflow Addons. He is a community leader and active speaker at conferences worldwide, founder of the Machine Learning Meetup Medellin, and cofounder of Machine Learning Colombia. Cristian is a member of Toptal: top 3% of the developer talent in the world.

Paula Villamarin Puertas, CEO LinkedAI

Paula is the CEO and co-founder of LinkedAI, a training data generation platform for Machine Learning. Before founding LinkedAI, Paula led the Robotic Maintenance team at KiwiBot and was responsible for the hardware design and assembly. Paula earned a bachelor's degree in product design; she is a community leader and an active speaker at conferences worldwide.

Diego Parra, CTO LinkedAI

Diego is the CTO and co-founder of LinkedAI, a training data generation platform for Machine Learning. Diego has more than 10 years of experience in software development and has led tech teams at different companies. Diego is an active community leader and open-source contributor. Diego is an ambassador of CityA, a global AI practitioner network enabling the diverse and responsible development & application of AI.

Authors

Heimer Rojas Castellanos

Electronic Engineer, Entrepreneur, Full Stack Software Engineer

Holberton School Bogota, Colombia

Specialization: Machine Learning

https://www.dhirubhai.net/in/heimerrojas/

Mia L Morton

Educator, Software Engineer

Holberton School Connecticut, United States of America

Specialization: Machine Learning

https://www.dhirubhai.net/in/mialmorton/

Abdel Giovanny Perez

Electronic Engineer, MBA, Full Stack Developer

Holberton School Bogota, Colombia

Specialization: Machine Learning

https://www.dhirubhai.net/in/abdel-perez-url/

Ximena Carolina Andrade Vargas

Mechatronic Engineer, Software Engineer

Holberton School Bogota, Colombia

Specialization: Machine Learning

https://www.dhirubhai.net/in/xicav369/


References & Work Cited

Identifying Similar Images with TensorFlow, douglasduhaime.com/posts/identifying-similar-images-with-tensorflow.html.

Byeon, Eunjoo. “Exploratory Data Analysis Ideas for Image Classification.” Medium, Towards Data Science, 11 Sept. 2020, towardsdatascience.com/exploratory-data-analysis-ideas-for-image-classification-d3fc6bbfb2d2.

“Denoising Autoencoders (DA).” Denoising Autoencoders (DA) - DeepLearning 0.1 Documentation, deeplearning.net/tutorial/dA.html.

Dwibedi, Debidatta, et al. “Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection.” 2017 IEEE International Conference on Computer Vision (ICCV), 2017, doi:10.1109/iccv.2017.146.

Flovik, Vegard. “How to Use Machine Learning for Anomaly Detection and Condition Monitoring.” Medium, Towards Data Science, 29 Sept. 2020, towardsdatascience.com/how-to-use-machine-learning-for-anomaly-detection-and-condition-monitoring-6742f82900d7.

Gong, Dong, et al. “Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection.” 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, doi:10.1109/iccv.2019.00179.

Huszar, Ferenc. “Dilated Convolutions and Kronecker Factored Convolutions.” InFERENCe, InFERENCe, 12 May 2016, www.inference.vc/dilated-convolutions-and-kronecker-factorisation/.

Lipton, Zachary C. “Heuristics for Scientific Writing (a Machine Learning Perspective).” Approximately Correct, 11 July 2018, approximatelycorrect.com/2018/01/29/heuristics-technical-scientific-writing-machine-learning-perspective/.

Michel Kana, Ph.D. “Variational Autoencoders (VAEs) for Dummies - Step By Step Tutorial.” Medium, Towards Data Science, 23 May 2020, towardsdatascience.com/variational-autoencoders-vaes-for-dummies-step-by-step-tutorial-69e6d1c9d8e9.

Monn, Dominic. “Denoising Autoencoders Explained.” Medium, Towards Data Science, 18 July 2017, towardsdatascience.com/denoising-autoencoders-explained-dbb82467fc2.

Park, Hyunjong, et al. “Learning Memory-Guided Normality for Anomaly Detection.” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, doi:10.1109/cvpr42600.2020.01438.

Shung, Koo Ping. “Accuracy, Precision, Recall or F1?” Medium, Towards Data Science, 10 Apr. 2020, towardsdatascience.com/accuracy-precision-recall-or-f1–331fb37c5cb9.

“Tutorial: Your First Model (DAE) · OpenDeep.” OpenDeep, www.opendeep.org/v0.0.5/docs/tutorial-your-first-model.

Vincent, Pascal, et al. “Extracting and Composing Robust Features with Denoising Autoencoders.” Proceedings of the 25th International Conference on Machine Learning - ICML ’08, 2008, doi:10.1145/1390156.1390294.

Yu, Yang, et al. “Network Intrusion Detection through Stacking Dilated Convolutional Autoencoders.” Security and Communication Networks, Hindawi, 16 Nov. 2017, www.hindawi.com/journals/scn/2017/4184196/.
