Automated Data Augmentation: Solving the data lack in Machine Learning

Abdel Giovanny Perez

Data Science Developer at Business Support

Published: October 11, 2020

The last few years have seen an explosion of applications based on Machine Learning algorithms. In practice, most successful Supervised Learning algorithms need a large training dataset (millions of samples) to produce accurate and trustworthy models. See Figure 1.

Depending on the type of application, obtaining large amounts of data can require considerable resources. It is therefore necessary to use Data Augmentation techniques, which generate new, varied data from an original dataset.

Data Augmentation Automation.

Next, I will describe the automation process for the Data Augmentation of images. First, set the main parameters for Data Augmentation automation:

1. Number of new images required

Based on the existing images, how many new images should be generated.

2. Original data set

Establish the complexity of the Data Augmentation process

3. Image processing.

Basic techniques of image processing, as described in the paragraphs below.
Generation of new images (GAN). Advanced data augmentation techniques based on Generative Adversarial Networks.
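
As a rough illustration of the first parameter, the requested number of new images can be split across the existing images. This is a minimal sketch that assumes an even split is wanted; the function name images_per_source is hypothetical and does not come from the article.

```python
def images_per_source(num_source, num_new):
    """Split num_new generated images as evenly as possible across num_source originals."""
    if num_source <= 0:
        raise ValueError("at least one source image is required")
    base, extra = divmod(num_new, num_source)
    # The first `extra` source images each produce one additional image.
    return [base + 1 if i < extra else base for i in range(num_source)]


# Example: 3 existing images, 10 new images requested.
print(images_per_source(3, 10))  # [4, 3, 3]
```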

When image processing is used, you must decide which Data Augmentation techniques will be applied. For example:

1. Flip

Horizontal or vertical flip of the original image.

2. Crop

Sample a section of the original image and scale it to the original size.

3. Rotation

Rotate the original image and scale the result to the original size.

4. Zoom (in / out)

Scale the original image.

5. Brightness / Contrast / Gamma / Hue / Saturation

Change color-related parameters.

6. Color / Gray

Convert the original color image to grayscale.

7. Scale

Scale the original image inward or outward.

8. Gaussian Noise

Add Gaussian noise to the original image.
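
Several of the techniques listed above can be sketched with NumPy alone. The code below is an illustrative sketch, not the author's implementation: it assumes images are 8-bit RGB arrays of shape (height, width, 3), and all function names, default values, and the nearest-neighbour resize are assumptions chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility


def flip(img, horizontal=True):
    """Mirror the image left-right (horizontal) or top-bottom (vertical)."""
    return img[:, ::-1] if horizontal else img[::-1, :]


def resize_nearest(img, height, width):
    """Nearest-neighbour resize, used to scale a crop back to the original size."""
    rows = np.arange(height) * img.shape[0] // height
    cols = np.arange(width) * img.shape[1] // width
    return img[rows][:, cols]


def random_crop(img, fraction=0.8):
    """Sample a random section covering `fraction` of each side, then rescale it."""
    h, w = img.shape[:2]
    ch, cw = max(1, int(h * fraction)), max(1, int(w * fraction))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return resize_nearest(img[top:top + ch, left:left + cw], h, w)


def adjust_brightness(img, factor):
    """Multiply pixel values by `factor` and clip to the valid 8-bit range."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)


def to_gray(img):
    """Convert an RGB image to grayscale using the ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return np.clip(img.astype(np.float32) @ weights, 0, 255).astype(np.uint8)


def add_gaussian_noise(img, sigma=10.0):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    noise = rng.normal(0.0, sigma, size=img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)


# A synthetic 32x48 RGB image stands in for a real photo.
image = rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8)
print(flip(image).shape, random_crop(image).shape, to_gray(image).shape)
```

Rotation by arbitrary angles and hue or saturation changes need interpolation or color-space conversion, so an image library is usually a better fit for those than hand-written NumPy.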

After choosing the different variables with which our automation system will generate the new images, a function will be defined with the following characteristics:

Inputs:

List of tuples, each with the name of a technique and the number of images to generate. Intrinsic parameters such as zoom percentage, offsets, or rotation angles are chosen at random to ensure variability in the generated images.

Returns:

Directory with new images
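
The random choice of intrinsic parameters described under Inputs could look like the following sketch. The parameter names and ranges in PARAM_RANGES are hypothetical placeholders, and sample_params is an illustrative helper rather than part of the article's function.

```python
import random

# Hypothetical parameter ranges per technique; adjust them to your data.
PARAM_RANGES = {
    "rotation": {"angle_deg": (-30.0, 30.0)},
    "zoom": {"percent": (80.0, 120.0)},
    "crop": {"offset_x": (0.0, 0.2), "offset_y": (0.0, 0.2)},
}


def sample_params(technique, count, seed=1):
    """Return `count` parameter dicts for `technique`, drawn uniformly at random."""
    rng = random.Random(seed)  # seeded so the same run can be reproduced
    ranges = PARAM_RANGES[technique]
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(count)
    ]


for params in sample_params("rotation", 3):
    print(params)
```

Seeding the generator keeps the augmented dataset reproducible between runs while still varying the parameters from image to image.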

Example:

The main file shall include the following:

path_dir_augmentation = "./dir_augmentation/"
path_dir_source = "./dir_source/"

# Each tuple is (technique name, number of images to generate)
list_tec = [("flip", 50), ("crop", 30), ("scale", 50), ("gauss", 50)]

auto_data_augmentation(path_dir_source, path_dir_augmentation, list_tec)

And the function auto_data_augmentation shall look like:

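import glob
import os
import random

def auto_data_augmentation(dir_source, dir_augmentation, list_tec):
    # flip() and crop_random() are helper functions that must be defined elsewhere.
    rng = random.Random(1)  # fixed seed so runs are reproducible
    list_source = glob.glob(os.path.join(dir_source, "*.jpg"))
    for image_path in list_source:
        for name, count in list_tec:
            if name == "flip":
                # One random flip parameter in the range 1 to 4 per image to generate
                par_flip = [rng.randint(1, 4) for _ in range(count)]
                flip(image_path, dir_augmentation, count, par_flip)
            elif name == "crop":
                crop_random(image_path, dir_augmentation, count)
            elif name == "bright":
                ...  # elided in the original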
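
The if/elif chain above can also be written as a lookup table from technique name to function. The sketch below is one possible arrangement, not the author's implementation: it works on in-memory NumPy arrays instead of image files, returns the generated images instead of writing a directory, and implements only two techniques. The names TECHNIQUES, augment, random_flip, and random_gauss are assumptions made for illustration.

```python
import numpy as np


def random_flip(img, rng):
    """Flip horizontally or vertically, chosen at random."""
    return img[:, ::-1] if rng.random() < 0.5 else img[::-1, :]


def random_gauss(img, rng):
    """Add Gaussian noise with a randomly chosen standard deviation."""
    sigma = rng.uniform(5.0, 25.0)
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


# Maps each technique name to the function that produces one new image.
TECHNIQUES = {"flip": random_flip, "gauss": random_gauss}


def augment(images, list_tec, seed=1):
    """Apply each (technique, count) pair to every image and collect the results."""
    rng = np.random.default_rng(seed)
    generated = []
    for img in images:
        for name, count in list_tec:
            if name not in TECHNIQUES:
                raise ValueError(f"unknown technique: {name}")
            func = TECHNIQUES[name]
            generated.extend(func(img, rng) for _ in range(count))
    return generated


source = [np.zeros((8, 8, 3), dtype=np.uint8) for _ in range(2)]
new_images = augment(source, [("flip", 5), ("gauss", 3)])
print(len(new_images))  # 16
```

With a table like this, adding a technique means registering one more function rather than extending the branch logic.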

References:

https://nanonets.com/blog/data-augmentation-how-to-use-deep-learning-when-you-have-limited-data-part-2/

Antoniou, A., Storkey, A. & Edwards, H. Data augmentation generative adversarial networks, arXiv:1711.04340 (2017).
