2022: A Look Back at the Best Year for Synthetic Data Generation (Yet)
Image generated with the help of OpenAI DALL-E

Welcome to the Reality Gap newsletter, which focuses on synthetic media and generative AI for computer vision.


The past year has seen significant progress in the field of synthetic data generation.

As the demand for accurate and diverse datasets continues to grow, so too does the need for innovative solutions that can provide high-quality synthetic data.

The trend is clear:

  • More companies are experimenting with and productizing solutions enabled by synthetic data.
  • There is an explosion of new tools and capabilities for synthetic media generation, utilizing both legacy 3D engines and recently popularized generative AI approaches for image, 3D, and video synthesis.
  • Research that advances synthetic data generation is on the rise, and so is research that synthetic datasets make possible. Synthetic data is now one of the primary topics at major computer vision conferences such as CVPR, ECCV, and NeurIPS.

As I look back at this past year, here is my list of the top events that impacted the synthetic data generation industry:

New Tools And Capabilities

  • Unreal Engine officially reached version 5 (followed by 5.1), adding an array of tools that enhance photorealism and procedural capabilities. Epic Games also released a new version of its MetaHuman Creator platform that allows custom head meshes.
  • Unity released Perception 1.0, a toolbox focused on synthetic data generation that also includes digital humans and homes. Unity also acquired Ziva Dynamics, the company behind the AI-enabled digital character creation platform.
  • NVIDIA released a host of updates to its Omniverse Replicator SDK for synthetic data generation, including the availability of cloud-based rendering.
  • Amazon SageMaker Ground Truth has added the capability of generating labeled synthetic image data.

New Research

Major computer vision conferences like CVPR, ECCV, and NeurIPS saw an increased volume of research related to synthetic data. This included topics like:

  • neural generative models, image and video synthesis
  • 3D from a single image and multi-view
  • body, face, and gesture detection and synthesis
  • scene analysis and understanding
  • sim-to-real domain adaptation

There have been several good paper reviews related to synthetic data:

Synthetic Data Generation Vendors Are Growing and Fundraising

We loved seeing our friends across the SDG industry aggressively hiring and fundraising:

  • Datagen raised $50 million in a Series B round led by new investor Scale Venture Partners, with partner Andy Vitus joining Datagen’s board of directors.
  • Synthesis AI raised $17 million in a Series A round led by 468 Capital with participation from Sorenson Ventures, Strawberry Creek Ventures, Bee Partners, PJC, iRobot Ventures, Boom Capital, and Kubera Venture Capital.
  • Parallel Domain raised $30 million in Series B funding led by March Capital, with participation from returning investors Costanoa Ventures, Foundry Group, Calibrate Ventures, and Ubiquity Ventures.
  • Infinity AI (YC W24) announced a $5M seed round last week, led by Matrix with participation from founders and operators from companies like Snorkel AI, Tesla, and Google.
  • Scale AI launched its synthetic data generation offering in February this year.

Diffusion Models and Generative AI

And of course, the explosion in popularity of image-generative AI and diffusion models such as OpenAI DALL-E, Midjourney, and Stable Diffusion was the highlight of 2022. This breakthrough will undoubtedly have a significant impact on the synthetic data generation industry and substantially improve synthetic workflows.

What else caught your attention this year? Please share in the comments.

I've reached out to several key players in the synthetic data generation industry to share their reflections on 2022.

Gil Elbaz, Co-founder & CTO of Datagen

"Wow, 2022 was an amazing year for the world of Computer Vision and Machine Learning. I will touch on the incredible progress in the field of Simulated Synthetic data and Synthetic Media.

In a survey we conducted earlier this year, we discovered that the primary challenge of CV engineers is collecting data. Synthetic data is here to provide the solution to this problem including ground truth labels and granular control for generating the exact data you need for successful AI models. Gartner even stated that by 2024, “60% of the data used for the development of AI and analytics projects will be synthetically generated.” This is basically a full-scale adoption of synthetic data and its promise. With the widespread use of synthetic data, companies can bring their products to market quickly and reliably without having to worry about the ethical issue of privacy. Throughout the year, we’ve proven in our benchmarks that synthetic data works, along with a small amount of real data, in a variety of settings including identifying facial landmarks and in-cabin automotive. Disney Research also recently proved the effectiveness of synthetic data with their new AI tool for re-aging. Microsoft came out with amazing papers on Simulation-based synthetic data for training a wide range of face-focused computer vision tasks. My prediction for 2023 is that we will continue to see the use of simulated synthetic data grow along with the evidence of its effectiveness.

In addition to simulated synthetic data, there has been an explosion in Synthetic Media. DALL-E 2, Stable Diffusion, Imagen, and many more variants enable realistic generation of images based on text. This is incredible to see and is really only the first step of many. Synthetic Media generation will reach Audio, Video, 3D Object, and any type of content that we enjoy today. This content will be seamless to generate and customize, at scale. We’re entering a new age of Synthetic Media that began in 2022 and will be expanded greatly in the upcoming years."


Omar Maher, Director of Product Marketing at Parallel Domain

"We are thrilled to see so many customers experiencing great success with synthetic data across a wide range of applications, including L2-5 autonomous vehicles, delivery robots, autonomous drones, and mobile computer vision. It's exciting to see more and more organizations adopting synthetic data not only for training but also for testing their machine learning models.

We are incredibly excited about the possibilities that generative AI opens up for synthetic data. This innovative approach to content generation has the potential to take things to a whole new level, and we are committed to investing in it heavily in 2023. We can't wait to see what we can accomplish with this powerful tool at our disposal!"


Chris Andrews, Chief Operating Officer and Head of Product at Rendered.ai

"For Rendered.ai, 2022 was an incredible year. There are many highlights, starting out with the launch of our platform as a service for synthetic computer vision data in January and then quickly adding to our commercial customer list, including repeat business, which brought us opportunities in diverse industries such as national defense, insurance, and medical imaging.

With my background in 3D, I’ve been excited to see that more and more customers are recognizing that a key value of digital twins is going to be in generating synthetic data to train detection and monitoring systems and that complex AI-driven systems will need many forms of synthetic data.

As we look to the year ahead, the possibilities introduced through Conversational AI and Generative AI to create data are likely to open up whole new opportunities to combine synthetic data for computer vision with synthetic data from more structured AI training domains.

We believe that 2023 will be the year when synthetic data starts to move from curiosity to critical capability and our platform-as-a-service is well-positioned to help customers in diverse computer vision domains as they realize that there is a source for unlimited, simulated training data that has far less cost and environmental impact than real sensor data collection."


Sidney Primas, Co-Founder at Infinity AI (YC W24)

"2022 has been an amazing year for synthetic data. I’m especially excited about the Cambrian explosion of innovation within the generative AI space. This has allowed us to accelerate our roadmap at Infinity AI (YC W24). A combination of traditional physics-based simulations (for labels and structured API controls) and generative techniques (for infinite variety) gives our customers the best of both worlds.

Synthetic data moves data creation from the analog to the digital world. Whenever that has happened in the past (electronics, photography, etc), there has been a Cambrian explosion of innovation. We see the same thing happening for ML training data today.

Infinity AI launched the Infinity Marketplace, the world’s largest open-source marketplace for synthetic datasets. There are already 1 million free frames that can be used for both research and commercial purposes, and more are added every month. Datasets run the gamut from fitness and robotics to smart retail, industrial safety, and more."


Bartek Włodarczyk, CEO at SKY ENGINE AI

"In terms of AI and data science use and expansion, the year 2022 saw great advancement. That is obvious in the synthetic data industry, which SKY ENGINE AI is building with others in the field. The year has been appropriately dubbed 'The Year of Text-to-Anything,' with some interesting artwork produced by AI models such as DALL-E 2 or Stable Diffusion. As time goes on, we anticipate generative AI to become more accessible and spread into other areas.

SKY ENGINE AI – Synthetic Data Cloud for Vision AI and the Metaverse is also at the forefront of this movement, with generative AI methods accelerating data content simulations and ground truth generation; however, these are for computer vision applications, and generative AI is mostly used to aid in the generation of some content elements. These methods, together with self-supervised learning, constitute the cornerstone of the SKY ENGINE AI cloud – a full-stack platform for data scientists.

As governments and businesses have rapidly pushed toward digitalization, with data driving their operations and decision-making, concerns about data privacy and security have arisen.

This front experienced some progress in 2022. With additional restrictions in place, future breakthroughs in data science and AI are expected to be dependent on the framework around data privacy and security. The SKY ENGINE AI cloud is perfectly suited for enabling privacy-protected data simulations and AI model training and it can further democratize access to training AI data in sensitive domains of medical diagnostics, retail, and behavior tracking or social distancing.

The breakthroughs achieved in the field of data science have demanded data automation, which has been making the rounds for quite some time now. Automation, according to industry analysts, will spread further, with large IT organizations aiming to automate internal processes. Again, SKY ENGINE AI cloud is a technology enabler for that in the Computer Vision industry because its synthetic data simulation engine is integrated on a memory level with popular data science tools such as PyTorch or TensorFlow allowing data generation directly to the deep learning pipeline automating neural nets training tasks. And further integration with other existing data science tools can be seamless.

Finally, the prolonged recession is predicted to have an influence on the data science and AI industries. The extent of this impact will become clear in the following years. However, industry experts and organizations may continue to benefit from high-quality vision AI and industrial metaverse solutions based on synthetic data simulated in the SKY ENGINE AI cloud. When enormous synthetic training datasets are generated at a fraction of the cost of real-world data gathering and labeling, AI business transformation may become a reality.

SKY ENGINE AI has already demonstrated this in a variety of industries, including automotive in-cabin monitoring systems, digital twins for the factory of the future in robotics, warehousing, infrastructure monitoring in telecommunications and energy, defense and homeland security, construction site analytics, maritime and even medical diagnostics. All of these solutions can eventually be built in the SKY ENGINE AI synthetic data cloud, which provides data for AI model training and validation; in parallel, these models can be produced on a single platform.

It remains to be seen which trends will persist in 2023, but the inherent value of synthetic data solutions in vision AI is projected to soar in the next years and SKY ENGINE AI is there to help developers and data scientists create accurate solutions addressing real business needs."


Back to Andrey again. Gil Elbaz, Omar Maher, Chris Andrews, Sidney Primas, and Bartek Włodarczyk: thank you for your comments.

And now, let's dive into the news headlines of recent days!


Microsoft's Generative Model for Sculpting 3D Digital Avatars

Microsoft just published a paper on a 3D generative model that uses diffusion models to automatically generate highly detailed 3D digital avatars with realistic hairstyles and facial hair. Avatars can be generated from image or text prompts.

Project 3D Avatar Diffusion

OpenAI's New Diffusion Model for Point Clouds

OpenAI has just unveiled Point-E, its newest diffusion model for point cloud generation from text prompts. You can also try a demo on Hugging Face.

Amazon SageMaker Ground Truth Synthetic Data Now Supports Dynamic 3D Environments

Amazon SageMaker Ground Truth now supports the generation of labeled synthetic data for dynamic 3D environments in various industries, including manufacturing, warehouse robotics, food packaging, retail, autonomous mobility, and smart homes, through the use of full 3D scenes, 3D depth maps, multiple cameras, moving objects, and auto-labeled video data.


Cascadeur, a New 3D Animation Software

After almost ten years of development and three years of beta testing, the 3D keyframe animation software Cascadeur has fully launched. Its AI-assisted tools allow animators to efficiently create physically accurate animations.

The animation tool is ready to launch after almost three years of beta testing.

The Rise of Virtual Influencers

Virtual influencers are computer-generated fictional individuals used for marketing purposes, especially on social media. Retailers like Marks & Spencer and PacSun have started working with them in their digital campaigns and as extensions of their brands.


Infinity AI Raises $5M for Novel Generative Tools

Infinity AI (YC W24), a startup that generates automated synthetic training data, announced its $5M seed round this year! The funds will be used to bring the company's novel generative tools, which complement Infinity's existing self-serve API, to market.

Congrats, Sidney Primas and the team!


And that's a wrap for this week!

Here are a few more ways you can learn about synthetic data and generative AI:

Happy holidays! See you next year!

Andrey
