Efficient Data Collection with AI-Generated Data from DALL-E
The power of machine learning is indisputable, but collecting the necessary data to train a model can often be a tedious and time-consuming task. Who has the time to gather tens of thousands of images for training data? But don't worry, we have a solution that can help alleviate the burden of data collection: let AI do the hard work for you! By leveraging AI models to generate synthetic data, developers can save time, resources, and effort, all while achieving high levels of accuracy and diversity in their training data. DALL-E, the little brother of GPT, is one such model that can be used to generate images. In this article, we will explore how DALL-E can be used to create synthetic datasets for training gesture recognition models and discuss the pros and cons of this approach.
Summary
To accelerate the data collection for our gesture recognition model, we utilized DALL-E to create a synthetic dataset of hand gestures showing the scissors and rock (fist) gestures. With DALL-E's help, we were able to generate 350 images for each class, complete with diverse backgrounds, lighting conditions, and skin tones. The resulting dataset was both realistic and varied, providing our model with an excellent learning resource.
To further enhance the dataset, we used a technique called data augmentation with the Keras ImageDataGenerator, which allowed us to generate additional images with variations in lighting, orientation, and other factors. This helped to make the dataset more robust and better able to generalize to new, unseen data.
We then fine-tuned a pre-trained MobileNet model for our gesture recognition task using transfer learning. With this approach, we achieved an accuracy of 96.5% on the validation data, demonstrating the effectiveness of our method. All of this was done with minimal effort and in less than half an hour. The combination of AI-generated data, data augmentation, and transfer learning allows for creating highly accurate and effective models, with potential applications in a wide range of fields beyond gesture recognition. But let us start at the beginning and explain everything step by step.
Data Collection
As any machine learning practitioner will attest, creating a large and diverse dataset is not only a significant hurdle, but also a critical step towards improving a model's accuracy. This is especially true in image recognition tasks such as gesture recognition, where collecting a varied dataset covering a broad range of backgrounds, lighting conditions, and skin tones can be a challenge. This process may entail working with a large and diverse group of individuals to obtain the necessary images, adding another layer of complexity to the process.
In addition, collecting real-world data poses a significant challenge, as it is difficult to capture enough data to cover all possible scenarios and variations. This can result in a biased or incomplete dataset that negatively impacts the accuracy and effectiveness of the resulting model. Conversely, AI-generated data can provide solutions to many of these obstacles. By utilizing an AI model such as DALL-E to produce synthetic data, developers can quickly and effortlessly create an extensive and diverse dataset that encompasses a wide range of scenarios and variations. This can help enhance the accuracy and generalization of the model while also reducing the amount of time and resources required for data collection. However, utilizing AI-generated data does have its own set of difficulties. For example, the generated data may not always be entirely realistic, potentially resulting in overfitting and reduced accuracy. We can easily generate data with the OpenAI API, although the quality of the synthetic data depends on the prompt.
Creating a Synthetic Dataset With DALL-E for Gesture Recognition
To produce the images, we leveraged the OpenAI API to communicate with DALL-E. We presented DALL-E with a prompt that outlined the desired image. The first step to accessing DALL-E is to set up the OpenAI library and your API key; to generate the key, take a look at the OpenAI documentation. Then we need to prepare the prompt. Here, it is advantageous to describe the desired images in as much detail as possible while still keeping the prompt to the point. We used prompts like “a gesture of a hand showing a peace sign in a random angle” and “a gesture of a hand showing a fist in a random angle”.
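A minimal sketch of this step, calling the OpenAI image-generation REST endpoint directly with the Python standard library (the API key is a placeholder, and the image size and helper names are our own assumptions, not taken from the article):

```python
import json
import urllib.request

API_KEY = "YOUR_OPENAI_API_KEY"  # placeholder: your key from the OpenAI dashboard


def build_prompt(gesture):
    """Compose the text prompt sent to DALL-E for one gesture class."""
    return f"a gesture of a hand showing a {gesture} in a random angle"


def generate_images(gesture, n=10):
    """Request n images from the DALL-E endpoint and return their URLs."""
    payload = json.dumps(
        {"prompt": build_prompt(gesture), "n": n, "size": "256x256"}
    ).encode("utf-8")
    request = urllib.request.Request(
        "https://api.openai.com/v1/images/generations",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [item["url"] for item in data["data"]]
```

Calling `generate_images("fist", n=10)` repeatedly (and downloading the returned URLs) is one way to build up the 350 images per class.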
For our specific use case, these prompts are totally fine, but keep in mind: the more complex your problem, the more detailed the prompt must be. If you want a bit more guidance and inspiration for your prompt, we recommend this Editor Guide for DALL-E. It explains the nitty-gritty details!
That way, we generated 350 images for each class, resulting in a total of 700 images. We then split the dataset into two parts: 250 images for training and 100 images for validation.
Here are some examples of the images DALL-E created for us:
While some of the images produced are virtually indistinguishable from those captured by humans, not all of them are perfect. Some images may show signs of being AI-generated, such as those in the following examples:
In our case, we chose to keep these slightly imperfect images in our dataset to demonstrate that even with an uncleaned dataset, acceptable results can be achieved for a prototype. The process of generating the synthetic dataset with DALL-E was quick and easy, taking only a few minutes to generate a sufficient number of images. By using synthetic data, we were able to create a highly controlled dataset with specific variations in background, lighting, and other factors that we wanted to include in our training data.
In the following section, we'll explain how we further improved the dataset by using data augmentation, a technique that enabled us to generate additional images with variations in lighting, orientation, and other factors.
Data Augmentation
Data augmentation is a method to artificially increase the size of a given dataset. To do this, we apply different operations to create new images from the given images, such as flipping the image, increasing contrast, or rotating it. By doing so, it is possible to generate 10 or more images from a single given image. To apply data augmentation to our dataset, we can simply use the ImageDataGenerator from Keras: after importing the Keras preprocessing library, you can create the ImageDataGenerator and add your preferred augmentations. In our project, we decided on a rotation of up to 30 degrees, a zoom range of up to 20%, and a horizontal flip.
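Setting this up looks roughly as follows (a sketch assuming the Keras bundled with TensorFlow; the parameter values mirror the choices above):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# augmentation settings: rotate up to 30 degrees, zoom up to 20%, mirror horizontally
datagen = ImageDataGenerator(
    rotation_range=30,
    zoom_range=0.2,
    horizontal_flip=True,
)
```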
Now we can perform the augmentation with a single line:
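A self-contained sketch of that call (the surrounding file handling is our assumption; here a random array stands in for one loaded gesture image):

```python
import os

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=30, zoom_range=0.2, horizontal_flip=True)

os.makedirs("augmented", exist_ok=True)

# stand-in for one loaded gesture image: a batch of shape (1, height, width, channels)
x = np.random.rand(1, 224, 224, 3)

# each next() call yields one augmented image and writes it to disk as a jpg
flow = datagen.flow(x, batch_size=1, save_to_dir="augmented", save_format="jpg")
for _ in range(10):
    batch = next(flow)
```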
The arguments are straightforward: x is the image we want to augment, and batch_size is the number of images per batch; in our case, we pass them one by one. The last two arguments specify where the generated images are saved; we save them as JPGs.
In the next section, we will show you how we trained a model using transfer learning without thinking about the architecture of the model.
Creating a Model With Transfer Learning
Transfer learning has been a game-changer for deep learning applications. By using a pre-trained model as a starting point for a new task, developers can save time and resources while still achieving high accuracy. In computer vision, pre-trained models like VGG, MobileNet, and ResNet can be used as a starting point for new tasks like image classification or object detection.
Training a deep neural network from scratch can require a large amount of data and computing power, which may not always be feasible. With transfer learning, however, we can leverage pre-existing knowledge from models that have been trained on vast amounts of data, reducing the need for extensive training.
Another advantage of transfer learning is that it can improve the performance of a model on a new task. By using a pre-trained model's knowledge, we can obtain better results with less training data, leading to faster convergence and better generalization.
Many transfer learning models are easily accessible through TensorFlow. For our project, we used the MobileNet model. One of its key features is the use of depth-wise separable convolutions, which maintain high accuracy while reducing the number of parameters and computations required. This makes it an ideal model for applications requiring real-time processing, such as image and video classification.
We can include the model like this:
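For example (assuming 224x224 RGB inputs and ImageNet weights, the standard configuration; the article's exact input size is our assumption):

```python
from tensorflow.keras.applications import MobileNet

# load MobileNet pre-trained on ImageNet, without its final classification layer
base_model = MobileNet(
    input_shape=(224, 224, 3),
    include_top=False,
    weights="imagenet",
)
```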
That way, we can download the model and give it the input_shape of our data. Setting include_top=False effectively removes the final layer of the pre-trained MobileNet model, and returns the output of the last convolutional layer instead. This output can then be used as input to another layer or model that we define ourselves, depending on the task at hand.
The next step is to specify that we don't want to train any layer of the model:
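This can be done by marking each layer of the backbone as non-trainable (shown here with the same assumed MobileNet configuration as above):

```python
from tensorflow.keras.applications import MobileNet

base_model = MobileNet(input_shape=(224, 224, 3), include_top=False, weights="imagenet")

# freeze every layer of the pre-trained backbone so only our new head is trained
for layer in base_model.layers:
    layer.trainable = False
```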
And then we can add our layers to the model, and also include the hyperparameters we think would do the best job.
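One possible head is a global pooling layer followed by a small dense classifier (an assumption on our part; the article does not spell out its exact layers or hyperparameters):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNet

base_model = MobileNet(input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base_model.trainable = False  # keep the pre-trained weights fixed

# assumed head: pooling plus a small dense classifier for our two gestures
model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(2, activation="softmax"),  # two classes: scissors and rock
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```

Training then proceeds with a call to model.fit on the augmented dataset.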
We will not go into the details here, as it would go beyond the scope of this article. After a successful training, we can test the model.
In our project, we were able to achieve a high level of accuracy using our approach, even though our dataset was not cleaned. To provide a comparison, we also went through the process of cleaning the dataset. This involved replacing around 70 images out of a total of 700 with newly generated DALL-E images that had no errors. This resulted in a small increase in accuracy, from 96% to 97%. However, it's worth noting that DALL-E may not always generate perfect images for a specific use case, and manual cleaning should be considered in cases where high accuracy is required, such as in medical diagnosis. For our project, though, the uncleaned dataset was perfectly sufficient.
Conclusion
In this post, we have explored the potential of AI-generated data for deep learning, specifically in the area of gesture recognition. By utilizing DALL-E to create our dataset, we were able to achieve a 96% accuracy rate, which improved to 97% with a cleaned dataset. This was made possible by the high level of diversity and realism in the dataset, enabling us to efficiently train a gesture recognition model with minimal effort. The use of synthetic data generated by AI models like DALL-E has the potential to reduce the time, resources, and effort required for data collection, while still providing a diverse and representative dataset for training. Additionally, data augmentation techniques can further improve the dataset and enhance the model's generalization capabilities. Transfer learning also allows us to leverage pre-trained models, leading to faster convergence and better performance on new tasks. Together, these techniques demonstrate the power and potential of AI-generated data in a wide range of fields beyond gesture recognition.