From Pixels to Profits: How Synthetic Image Generation Changes Everything
You have probably heard a lot about the impressive capabilities of Deep Learning models across many fields of application: images, speech, and text were first recognized by AI, then became manipulable, and now the attention is on generative models that are getting better and faster at creating content.
Their popularity exploded after ChatGPT was released in November 2022, but a variety of solutions already exist on the market or can be built from scratch for custom and/or highly confidential purposes. Regardless of the hype, one thing stands at the foundation of any machine learning model’s training and its consequent success: collecting quality data.
However, collecting enough data can often be a challenge, especially when you consider the limitations of time, resources, and access to reliable datasets. But what if there were a way to overcome these hurdles and get more quality data efficiently?
Synthetic data generation paired with Domain Adaptation could be the answer: AI models can generate high-quality synthetic data to train other models.
In this article, we will dive deeper into generating images for Computer Vision models and how this can help overcome the limitations of classic data collection, such as data unavailability, privacy restrictions, or difficulty in acquisition due to high-precision requirements.
Typical application scenarios are those involving public environments where collecting data raises privacy concerns (e.g. detecting objects or people on roads), or those with very specific tasks that require acquiring data with expensive specialized hardware and/or procedures (e.g. gaze detection and eye tracking).
Synthetic images can be created with extreme precision, simulating realistic scenarios and adapting them to the specific domain in which they will be used, while providing large savings in cost and time.
At the foundation of Foundation Models: Data.
Data is at the base of any AI model: if good-quality data is provided, the right model can solve any reasonable task.
Deep Learning models are very data-hungry, but simply having a lot of data is not enough to produce good results. “Good quality” is not just an empty adjective: it is the key point to ensure that the model is able to learn the correct patterns and correctly generalize the information, in order to obtain an optimal solution for the tasks it will be presented with.
In summary, quantity is undoubtedly an important factor, but data must also be complete, without noise or errors, and capture the diversity of real-life situations and tasks the model will face.
If the data used to train the model is of poor quality, incomplete, or unrepresentative, the model may learn incorrect or inadequate information, producing inaccurate and unreliable results. For this reason, collecting and curating quality data is a critical step in developing an AI model.
Time and effort must be invested in ensuring that the data is accurate, complete, and representative of the context in which the model will operate. Only with quality data is it possible to obtain reliable results and fully exploit the potential of AI.
No easy feat
In many industrial applications, building datasets that satisfy the three requirements described above (quantity, completeness, and diversity) is costly or simply unfeasible.
In this era of digitalization and innovation, good data is very valuable, and companies are reluctant to share theirs.
So, what are the alternatives?
This data access problem can become insurmountable for small and medium companies, which have limited resources compared to Big Tech. To read more on this topic, this article published in the Wall Street Journal explains how getting access to data is not just a matter of funds.
The newest alternative: Synthetic Data Generation & Domain Adaptation
Consider a situation in which we have acquired some data in a production environment. It could be images of people whose actions we want to detect, or pictures of products on supermarket shelves for a smart checkout and inventory system. Some data is available, but it is scarce and typically unlabeled.
You could search online for a pose-detection dataset, but the context and points of view of its images will likely differ from your production environment. For the supermarket products, it is even harder to find matching pictures of the items of interest taken from different angles.
Here is where synthetic data can help solve the task.
Synthetic data
For Computer Vision purposes, synthetic data are generally images generated by simulation software. First, 3D models of the objects of interest and of the environments in which they are placed are created. These models can be created by a design team, reused from existing applications, or generated entirely by an AI model. From those, it is then possible to capture many images, with full control over the parameters (point of view, background, context, pose, etc.).
Moreover, this computer-aided setup provides all the labels associated with the images (i.e., the precise position of the product in the picture). With synthetic data generators, we can get a lot of clean, diverse images, and hence a good-quality dataset.
This variety can also be valuable for testing and validating AI models in controlled environments. By generating specific scenarios, anomalies, or edge cases, it becomes possible to evaluate the model’s performance and uncover potential weaknesses.
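To make this concrete, below is a minimal, purely illustrative Python sketch of such a generation loop. The render_scene function is a hypothetical stand-in for the rendering engine (it is not a real simulator API): the point is that every randomized parameter that produces an image also produces its ground-truth labels, with no manual annotation.

```python
import json
import random
from pathlib import Path

def render_scene(camera_angle, distance, background_id, pose_id):
    """Hypothetical stand-in for a rendering engine (e.g. a Unity-based simulator).

    A real renderer would return the rendered image together with exact labels
    (bounding boxes, keypoints, segmentation masks), since it knows the full 3D
    scene. Here we return a placeholder file name and placeholder labels.
    """
    image_name = f"frame_{background_id}_{pose_id}_{camera_angle:.1f}.png"
    labels = {"bbox": [0, 0, 100, 200], "pose_id": pose_id}
    return image_name, labels

output_dir = Path("synthetic_dataset")
output_dir.mkdir(exist_ok=True)

annotations = []
for _ in range(1000):
    # Full control over every generation parameter: viewpoint, distance,
    # background, pose, and so on, including rare edge cases on demand.
    params = {
        "camera_angle": random.uniform(-30.0, 30.0),
        "distance": random.uniform(0.5, 3.0),
        "background_id": random.randint(0, 50),
        "pose_id": random.randint(0, 10),
    }
    image_name, labels = render_scene(**params)
    # The labels come from the same parameters that generated the image.
    annotations.append({"image": image_name, "labels": labels, "params": params})

(output_dir / "annotations.json").write_text(json.dumps(annotations, indent=2))
```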
Lastly, synthetic data has another big advantage over real data: privacy protection. Acquiring ad-hoc real images can be very difficult, not only because of the cost but also because of privacy constraints. Synthetic data can be an effective means of preserving privacy.
By generating synthetic data that retains the statistical properties of the original data but does not disclose any sensitive information, it is possible to share or publish the synthetic data without compromising privacy.
Domain adaptation
A domain is a set of data that shares certain characteristics and acquisition environments. Domain Adaptation (DA) is the set of techniques we use to bridge the gap between two different domains, i.e., datasets acquired in different environments and with different characteristics.
Domain adaptation allows us to train a model with a source dataset made of synthetic images generated with AI. Thanks to DA, the model will be capable of performing well when later used with our target (real) unlabeled data.
Several families of techniques can be used to reconcile source and target datasets, such as discrepancy-based, reconstruction-based, and adversarial methods (we will use an adversarial approach in the example below). Each of them has its advantages and disadvantages, and they can be more or less effective depending on the situation. The choice of method depends on the characteristics of the domains and the available data, as well as the specific requirements and constraints of the task at hand.
If you are interested in diving deeper into the technical aspects of these methods, here is a good survey on DA techniques: https://arxiv.org/abs/2009.00155
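As a flavor of the adversarial family, here is a minimal PyTorch sketch in the spirit of domain-adversarial training (DANN): a gradient-reversal layer pushes the feature extractor to produce features that a small domain classifier cannot tell apart between labeled synthetic (source) and unlabeled real (target) images. The network sizes and the task head are placeholders, and this is only one possible DA approach, not necessarily the one used in the example later in this article.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, flips the gradient sign in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

# Placeholder networks: a tiny feature extractor, a task head trained on labeled
# synthetic data, and a domain classifier used only to align the two domains.
feature_extractor = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten())
task_head = nn.Linear(16, 10)
domain_head = nn.Linear(16, 2)

params = (list(feature_extractor.parameters()) + list(task_head.parameters())
          + list(domain_head.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)
ce = nn.CrossEntropyLoss()

def train_step(synthetic_images, synthetic_labels, real_images, lambd=0.1):
    feats_syn = feature_extractor(synthetic_images)
    feats_real = feature_extractor(real_images)

    # 1) Supervised task loss, computed only on the labeled synthetic batch.
    task_loss = ce(task_head(feats_syn), synthetic_labels)

    # 2) Domain loss: the domain head learns to tell synthetic from real, while the
    #    reversed gradient pushes the features toward being indistinguishable.
    feats = torch.cat([feats_syn, feats_real])
    domains = torch.cat([
        torch.zeros(len(feats_syn), dtype=torch.long, device=feats.device),
        torch.ones(len(feats_real), dtype=torch.long, device=feats.device),
    ])
    domain_loss = ce(domain_head(GradReverse.apply(feats, lambd)), domains)

    loss = task_loss + domain_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```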
Practical Applications and Use Cases
Unity 3D (a popular rendering engine and editor to create interactive content) provides a way to implement your own data generation software thanks to its Perception package. Many pre-made simulators built on Unity can be found online, with large libraries of ready-to-use 3D assets (people or objects). Whether you use those assets or import 3D models of your own objects of interest, you can then easily set the scene and, very quickly, you’re ready to start acquiring data. What’s more, this software can be freely used for commercial purposes.
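Once the simulator has produced the images and their annotation files, consuming them in a training pipeline is straightforward. The sketch below is a hypothetical PyTorch Dataset that reads an annotations.json file pairing image paths with keypoints; the folder layout and field names are assumptions for illustration and do not reproduce the actual output schema of the Perception package, which varies across versions.

```python
import json
from pathlib import Path

import torch
from PIL import Image
from torch.utils.data import Dataset

class SyntheticPoseDataset(Dataset):
    """Loads (image, keypoints) pairs produced by a synthetic data generator.

    Assumes a hypothetical layout:
        dataset_root/
            images/0001.png, 0002.png, ...
            annotations.json  # [{"image": "images/0001.png", "keypoints": [[x, y, v], ...]}, ...]
    """

    def __init__(self, root, transform=None):
        self.root = Path(root)
        self.items = json.loads((self.root / "annotations.json").read_text())
        self.transform = transform

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        item = self.items[idx]
        image = Image.open(self.root / item["image"]).convert("RGB")
        keypoints = torch.tensor(item["keypoints"], dtype=torch.float32)
        if self.transform is not None:
            image = self.transform(image)
        return image, keypoints
```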
Here’s a real-world example of how we can use this technology: imagine a smartphone application for training at home, where one of the features guides the user to perform the exercises correctly, in real time. The application recognizes how good the posture is while the user is doing the exercise and tells them which body part has to be adjusted and how. To do this, we not only have to teach a model what the correct postures are for each exercise, but we also have to estimate the pose of a person by recognizing their body keypoints and segments.
Our problem here is that we need lots of images of people in those very specific poses (the exercises we want to include in the app), from a variety of shooting positions and often from unusual angles (e.g., the smartphone could be placed on the floor or on a low piece of furniture). For our model to reach optimal performance, we need a dataset that represents our situation with enough quantity, diversity, and labeled information.
Some pose estimation datasets with commercial-friendly licenses are publicly available and suitable for pretraining our network on the task. However, they contain people doing generic actions, in generic positions and at different distances from the camera, so we will still need task-specific data for fine-tuning.
We could build an ad-hoc dataset by capturing smartphone images of people doing the exercises we need, but this approach has problems: the acquisition is costly and time-consuming, manual labeling of keypoints is expensive, and privacy constraints apply. Creating a good-quality dataset in this way can be really difficult; realistically, we would only be able to acquire a few hundred unlabeled images covering different situations.
We must create the fine-tuning dataset in a different way.
Unity researchers provided the PeopleSansPeople data generator, a tool for generating data involving humans for tasks such as detection and pose estimation. Images are generated with random human models, backgrounds, poses, viewpoints, and lighting conditions, to form a good-quality dataset. The developers released the code in a public repository with a commercial license, so we can use it as a starting point to control the poses and viewpoints of the generation process. This way, we can acquire a more suitable labeled dataset for our task.
Now we have a good synthetic dataset and a set of unlabeled images acquired from the real environment of our application. However, simply fine-tuning the network on the synthetic data would lead to poor performance because of the domain gap. Domain Adaptation solves this issue: for example, we can use the adversarial generative approach mentioned previously, where a generative network trained with our data converts each synthetic image from PeopleSansPeople into an output image that looks more like our real-world production setting.
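For instance, once such a synthetic-to-real generator has been trained (with a CycleGAN-style unpaired image translation setup or a similar approach), applying it to the synthetic dataset is a simple batched inference pass. The sketch below assumes a trained generator module is already available and is only illustrative of this step.

```python
from pathlib import Path

import torch
from PIL import Image
from torchvision import transforms
from torchvision.utils import save_image

to_tensor = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])

@torch.no_grad()
def translate_folder(generator, src_dir, dst_dir,
                     device="cuda" if torch.cuda.is_available() else "cpu"):
    """Run every synthetic image through the trained synthetic-to-real generator."""
    generator.eval().to(device)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.png")):
        x = to_tensor(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        fake_real = generator(x)  # appearance shifted toward the real-world domain
        save_image(fake_real.clamp(0, 1), dst / path.name)
```

Since the translation changes appearance but is meant to preserve the scene geometry, the keypoint labels produced by the simulator remain valid for the translated images (rescaled if the images are resized).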
The synthetic images we generated, now translated into the real-world domain by our neural network, can be used to fine-tune our pose estimation model and reach high accuracy.
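With the translated, fully labeled images in hand, the fine-tuning step itself is standard supervised training. Below is a minimal sketch using torchvision's Keypoint R-CNN as one possible pose model; the article does not prescribe a specific architecture, so treat this as an assumption.

```python
import torch
from torchvision.models.detection import keypointrcnn_resnet50_fpn

# Start from a model pretrained on COCO person keypoints and fine-tune it on the
# translated synthetic dataset (requires torchvision >= 0.13 for the weights API).
model = keypointrcnn_resnet50_fpn(weights="DEFAULT")
model.train()
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                            lr=1e-3, momentum=0.9)

def fine_tune_step(images, targets):
    """images: list of CHW float tensors in [0, 1]; targets: list of dicts with
    'boxes' (N x 4), 'labels' (N,), and 'keypoints' (N x K x 3), all coming
    directly from the simulator annotations."""
    loss_dict = model(images, targets)  # detection models return a dict of losses in train mode
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```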
In a world filled with real-world challenges in data acquisition, we’ve explored a groundbreaking solution: synthetic data generation.
By harnessing the power of artificial intelligence, we can now overcome the limitations of traditional data collection methods. Thanks to the rapid advancement of technologies and frameworks, the integration of synthetic data generation and adaptation into machine learning pipelines has become remarkably seamless and user-friendly.
In summary, although it may initially seem like an additional burden, leveraging generated data holds the key to slashing costs and speeding up development. Embracing the realm of synthetic data empowers us to propel our clients’ projects to new heights.
Are you ready to break free from the constraints of traditional data acquisition methods?
This article was written by Jason Ravagli, Machine Learning Engineer at Artificialy SA.