2022: A Look Back at the Best Year for Synthetic Data Generation (Yet)
Welcome to the Reality Gap newsletter, which focuses on synthetic media and generative AI for computer vision.
The past year has seen significant progress in the field of synthetic data generation.
As the demand for accurate and diverse datasets continues to grow, so too has the need for innovative solutions that can provide high-quality synthetic data.
The trend is clear:
As I look back at this past year, here is the list of my top events that impacted the synthetic data generation industry:
New Tools And Capabilities
New Research
Major computer vision conferences like CVPR, ECCV, and NeurIPS saw an increased volume of research related to synthetic data. This included topics like:
There have been several good paper reviews related to synthetic data:
Synthetic Data Generation Vendors Are Growing and Fundraising
We loved seeing our friends across the SDG industry aggressively hiring and fundraising:
Diffusion Models and Generative AI
And of course, the highlight of 2022 was the explosion in popularity of image-generating AI and diffusion models such as OpenAI's DALL-E, Midjourney, and Stable Diffusion. This breakthrough will undoubtedly have a major impact on the synthetic data generation industry and substantially improve synthetic data workflows.
What else caught your attention this year? Please share in the comments.
I've reached out to several key players in the synthetic data generation industry to share their reflections on 2022.
Gil Elbaz
"Wow, 2022 was an amazing year for the world of Computer Vision and Machine Learning. I will touch on the incredible progress in the field of Simulated Synthetic data and Synthetic Media.
In a survey we conducted earlier this year, we discovered that the primary challenge of CV engineers is collecting data. Synthetic data is here to provide the solution to this problem including ground truth labels and granular control for generating the exact data you need for successful AI models. Gartner even stated that by 2024, “60% of the data used for the development of AI and analytics projects will be synthetically generated.” This is basically a full-scale adoption of synthetic data and its promise. With the widespread use of synthetic data, companies can bring their products to market quickly and reliably without having to worry about the ethical issue of privacy. Throughout the year, we’ve proven in our benchmarks that synthetic data works, along with a small amount of real data, in a variety of settings including identifying facial landmarks and in-cabin automotive. Disney Research also recently proved the effectiveness of synthetic data with their new AI tool for re-aging. Microsoft came out with amazing papers on Simulation-based synthetic data for training a wide range of face-focused computer vision tasks. My prediction for 2023 is that we will continue to see the use of simulated synthetic data grow along with the evidence of its effectiveness.
In addition to simulated synthetic data, there has been an explosion in Synthetic Media. DALL-E 2, Stable Diffusion, Imagen, and many more variants enable realistic generation of images based on text. This is incredible to see and is really only the first step of many. Synthetic Media generation will reach Audio, Video, 3D Objects, and any type of content that we enjoy today. This content will be seamless to generate and customize, at scale. We’re entering a new age of Synthetic Media that began in 2022 and will expand greatly in the upcoming years."
Omar Maher, Director of Product Marketing at Parallel Domain
"We are thrilled to see so many customers experiencing great success with synthetic data across a wide range of applications, including L2-5 autonomous vehicles, delivery robots, autonomous drones, and mobile computer vision. It's exciting to see more and more organizations adopting synthetic data not only for training but also for testing their machine learning models.
We are incredibly excited about the possibilities that generative AI opens up for synthetic data. This innovative approach to content generation has the potential to take things to a whole new level, and we are committed to investing in it heavily in 2023. We can't wait to see what we can accomplish with this powerful tool at our disposal!"
Chris Andrews, Chief Operating Officer and Head of Product at Rendered.ai
"For Rendered.ai, 2022 was an incredible year. There are many highlights, starting out with the launch of our platform as a service for synthetic computer vision data in January and then quickly adding to our commercial customer list, including repeat business, which brought us opportunities in diverse industries such as national defense, insurance, and medical imaging.
With my background in 3D, I’ve been excited to see that more and more customers are recognizing that a key value of digital twins is going to be in generating synthetic data to train detection and monitoring systems and that complex AI-driven systems will need many forms of synthetic data.
As we look to the year ahead, the possibilities introduced through Conversational AI and Generative AI to create data are likely to open up whole new opportunities to combine synthetic data for computer vision with synthetic data from more structured AI training domains.
We believe that 2023 will be the year when synthetic data starts to move from curiosity to critical capability and our platform-as-a-service is well-positioned to help customers in diverse computer vision domains as they realize that there is a source for unlimited, simulated training data that has far less cost and environmental impact than real sensor data collection."
Sidney Primas, Co-Founder at Infinity AI (YC W24)
"2022 has been an amazing year for synthetic data. I’m especially excited about the Cambrian explosion of innovation within the generative AI space. This has allowed us to accelerate our roadmap at Infinity AI (YC W24). A combination of traditional physics-based simulations (for labels and structured API controls) and generative techniques (for infinite variety) gives our customers the best of both worlds.
Synthetic data moves data creation from the analog to the digital world. Whenever that has happened in the past (electronics, photography, etc), there has been a Cambrian explosion of innovation. We see the same thing happening for ML training data today.
Infinity AI launched the Infinity Marketplace, the world’s largest open-source marketplace for synthetic datasets. There are already 1 million free frames that can be used for both research and commercial purposes, and more are added every month. Datasets run the gamut from fitness and robotics to smart retail, industrial safety, and more."
Bartek Włodarczyk, CEO at SKY ENGINE AI
"In terms of AI and data science use and expansion, the year 2022 saw great advancement. This is obvious in the synthetic data industry, which SKY ENGINE AI is building with others in the field. The year has been appropriately dubbed 'The Year of Text-to-Anything,' with some interesting artwork produced by AI models such as DALL-E 2 or Stable Diffusion. As time goes on, we anticipate generative AI to become more accessible and spread into other areas.
SKY ENGINE AI – Synthetic Data Cloud for Vision AI and the Metaverse is also at the forefront of this movement, with generative AI methods accelerating data content simulations and ground truth generation; however, these are for computer vision applications, and generative AI is mostly used to aid in the generation of some content elements. These methods, together with self-supervised learning, constitute the cornerstone of the SKY ENGINE AI cloud – a full-stack platform for data scientists.
As governments and businesses have rapidly pushed toward digitalization, with data driving their operations and decision-making, concerns about data privacy and security have arisen.
This front experienced some progress in 2022. With additional restrictions in place, future breakthroughs in data science and AI are expected to be dependent on the framework around data privacy and security. The SKY ENGINE AI cloud is perfectly suited for enabling privacy-protected data simulations and AI model training and it can further democratize access to training AI data in sensitive domains of medical diagnostics, retail, and behavior tracking or social distancing.
The breakthroughs achieved in the field of data science have demanded data automation, which has been making the rounds for quite some time now. Automation, according to industry analysts, will spread further, with large IT organizations aiming to automate internal processes. Again, SKY ENGINE AI cloud is a technology enabler for that in the Computer Vision industry because its synthetic data simulation engine is integrated on a memory level with popular data science tools such as PyTorch or TensorFlow allowing data generation directly to the deep learning pipeline automating neural nets training tasks. And further integration with other existing data science tools can be seamless.
Finally, the prolonged recession is predicted to have an influence on the data science and AI industries. The extent of this impact will become clear in the following years. However, industry experts and organizations may continue to benefit from high-quality vision AI and industrial metaverse solutions based on synthetic data simulated in the SKY ENGINE AI cloud. When enormous synthetic training datasets are generated at a fraction of the cost of real-world data gathering and labeling, AI business transformation may become a reality.
SKY ENGINE AI has already demonstrated this in a variety of industries, including automotive in-cabin monitoring systems, digital twins for the factory of the future in robotics, warehousing, infrastructure monitoring in telecommunications and energy, defense and homeland security, construction site analytics, maritime, and even medical diagnostics. All of these solutions can eventually be built in the SKY ENGINE AI synthetic data cloud, which provides data for the AI model's training and validation; in parallel, these models can be produced on a single platform.
It remains to be seen which trends will persist in 2023, but the inherent value of synthetic data solutions in vision AI is projected to soar in the next years and SKY ENGINE AI is there to help developers and data scientists create accurate solutions addressing real business needs."
Back to Andrey again. Gil Elbaz, Omar Maher, Chris Andrews, Sidney Primas, and Bartek Włodarczyk, thank you for your comments.
And now, let's dive into the news headlines of recent days!
Microsoft's Generative Model for Sculpting 3D Digital Avatars
Microsoft just published a paper on a 3D generative model that uses diffusion models to automatically generate highly detailed 3D digital avatars with realistic hairstyles and facial hair. Avatars can be generated from image or text prompts.
OpenAI's New Diffusion Model for Point Clouds
OpenAI has just unveiled Point-E, its newest diffusion model for generating point clouds from text prompts. You can also try a demo on Hugging Face.
Amazon SageMaker Ground Truth Synthetic Data Now Supports Dynamic 3D Environments
Amazon SageMaker Ground Truth now supports the generation of labeled synthetic data for dynamic 3D environments in various industries, including manufacturing, warehouse robotics, food packaging, retail, autonomous mobility, and smart homes, through the use of full 3D scenes, 3D depth maps, multiple cameras, moving objects, and auto-labeled video data.
Cascadeur, a New 3D Animation Software
After almost ten years of development and three years of beta testing, the 3D keyframe animation software Cascadeur has fully launched. Its AI-assisted tools allow animators to efficiently create physically accurate animations.
The Rise of Virtual Influencers
Virtual influencers, computer-generated fictional characters used for marketing, especially on social media, have become popular with retailers like Marks & Spencer and Pacsun, which use them in their digital campaigns and as extensions of their brands.
Infinity AI Raises $5M for Novel Generative Tools
Infinity AI (YC W24), a startup that generates automated synthetic training data, announced its $5M seed round this year! The funds will be used to bring the company's novel generative tools, which complement Infinity's existing self-serve API, to market.
Congrats, Sidney Primas and the team!
And that's a wrap for this week!
Here are a few more ways you can learn about synthetic data and generative AI:
Happy holidays! See you next year!
Andrey