Automated Data Augmentation: Solving the Lack of Data in Machine Learning
The last few years have seen an explosion of applications based on Machine Learning algorithms. In practice, most successful supervised learning algorithms need a large training data set (on the order of millions of samples) to produce accurate and trustworthy models. See Figure 1.
Depending on the type of application, obtaining large amounts of data can require considerable resources. Data Augmentation techniques address this by generating new, varied data from an original data set.
Data Augmentation Automation.
Next, I will describe the automation process for the Data Augmentation of images.
Set the main parameters for Data Augmentation automation:
1. Number of new data required
- Based on the existing images, how many new images should be generated.
2. Original data set
- Establish the complexity of the Data Augmentation process
3. Image processing.
- Basic techniques of image processing, as described in the paragraphs below
- Generation of new images (GAN). Advanced data augmentation techniques based on Generative Adversarial Networks
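The parameters above could be collected into a small configuration object. The following is only a sketch of one possible layout, not the implementation behind this article: the names `AugmentationConfig`, `images_per_source`, and the two `method` values are hypothetical, and rounding up per source image is an assumption.

```python
from dataclasses import dataclass
import math


@dataclass
class AugmentationConfig:
    """Hypothetical container for the main automation parameters."""
    n_new_images: int                  # 1. number of new images required
    source_dir: str                    # 2. location of the original data set
    method: str = "image_processing"   # 3. "image_processing" or "gan"

    def __post_init__(self):
        if self.n_new_images < 0:
            raise ValueError("n_new_images must be non-negative")
        if self.method not in ("image_processing", "gan"):
            raise ValueError(f"unknown method: {self.method}")


def images_per_source(n_new_images: int, n_source_images: int) -> int:
    """New images to derive from each original image (rounded up)."""
    if n_source_images <= 0:
        raise ValueError("the original data set is empty")
    return math.ceil(n_new_images / n_source_images)
```

For instance, asking for 1000 new images from 300 originals means generating 4 images per original, which slightly overshoots the target; trimming the surplus is left out of this sketch.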
When image processing is used, the Data Augmentation techniques to apply must be chosen, for example:
1. Flip
- Horizontal or vertical flip of the original image.
2. Crop
- Sampling a section of the original image and scaling the new image to the original size.
3. Rotation
- Rotating the original image and scaling the new image to the original size.
4. Zoom (in / out)
- Scaling the original image.
5. Brightness / Contrast / Gamma / Hue / Saturation
- Changing parameters related to color.
6. Color / Gray
- Converting the original color image to grayscale.
7. Scale
- Scaling the original image inward or outward.
8. Gaussian Noise
- Adding Gaussian noise to the original image.
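A few of the techniques listed above can be sketched with plain NumPy arrays. This is a minimal illustration under some assumptions: images are `uint8` arrays of shape (height, width, 3), the function names are hypothetical, the grayscale weights are the common ITU-R BT.601 luma coefficients, and rescaling the crop back to the original size is omitted because it needs an interpolation routine from an imaging library.

```python
import numpy as np


def flip(img, horizontal=True):
    """Mirror the image left-right (horizontal) or top-bottom (vertical)."""
    return img[:, ::-1] if horizontal else img[::-1, :]


def crop(img, top, left, height, width):
    """Cut out a rectangular section of the image (no rescaling here)."""
    return img[top:top + height, left:left + width]


def to_gray(img):
    """Convert an RGB image to grayscale with BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return (img[..., :3] @ weights).round().astype(np.uint8)


def adjust_gamma(img, gamma):
    """Apply gamma correction: out = 255 * (in / 255) ** gamma."""
    scaled = (img.astype(np.float64) / 255.0) ** gamma
    return (scaled * 255.0).round().astype(np.uint8)


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the 0..255 range."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0, 255).round().astype(np.uint8)
```

A random generator such as `np.random.default_rng(1)` can be passed as `rng` so that the noise is reproducible between runs.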
After choosing the different variables with which our automation system will generate the new images, a function will be defined with the following characteristics:
Inputs:
The source and output directories, plus a list of tuples, each with the name of a technique and the number of images to generate. Intrinsic parameters such as zoom percentage, offsets, or rotation angles are picked randomly to ensure the variability of the generated images.
Returns:
Directory with new images
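The random choice of intrinsic parameters described above could look roughly like the sketch below. The helper names `sample_params` and `build_plan` are hypothetical, and every numeric range is an illustrative assumption rather than a value taken from this article.

```python
import random


def sample_params(technique, rng):
    """Draw random intrinsic parameters for one generated image.

    All ranges here are illustrative assumptions.
    """
    if technique == "flip":
        return {"horizontal": rng.choice([True, False])}
    if technique == "crop":
        # Size and offsets are fractions of the original width/height.
        size = rng.uniform(0.6, 0.9)
        return {
            "size": size,
            "x_offset": rng.uniform(0.0, 1.0 - size),
            "y_offset": rng.uniform(0.0, 1.0 - size),
        }
    if technique == "rotation":
        return {"angle_deg": rng.uniform(-30.0, 30.0)}
    if technique == "zoom":
        return {"factor": rng.uniform(0.8, 1.2)}
    if technique == "gauss":
        return {"sigma": rng.uniform(1.0, 15.0)}
    raise ValueError(f"unknown technique: {technique}")


def build_plan(list_tec, seed=None):
    """Expand [(technique, count), ...] into one parameter set per image."""
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    return [
        (name, sample_params(name, rng))
        for name, count in list_tec
        for _ in range(count)
    ]
```

Calling `build_plan([("flip", 50), ("crop", 30)], seed=1)` would yield 80 entries, one per image to generate, each carrying its own randomly drawn parameters.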
Example:
The main file shall include the following:
    path_dir_augmentation = "./dir_augmentation/"
    path_dir_source = "./dir_source/"
    list_tec = [("flip", 50), ("crop", 30), ("scale", 50), ("gauss", 50)]
    auto_data_augmentation(path_dir_source, path_dir_augmentation, list_tec)
And the function auto_data_augmentation shall look like:
    def auto_data_augmentation(dir_source, dir_augmentation, list_tec):
        # Collect every .jpg image in the source directory.
        list_source = read(dir_source, "*.jpg")
        for i in list_source:
            for j in list_tec:
                # j[0] is the technique name, j[1] the number of images.
                if j[0] == "flip":
                    par_flip = random(1, 4, seed(1), j[1])
                    flip(i, dir_augmentation, j[1], par_flip)
                elif j[0] == "crop":
                    crop_random(i, dir_augmentation, j[1])
                elif j[0] == "bright":
                    .....
References:
https://nanonets.com/blog/data-augmentation-how-to-use-deep-learning-when-you-have-limited-data-part-2/
Antoniou, A., Storkey, A. & Edwards, H. Data augmentation generative adversarial networks, arXiv:1711.04340 (2017).