Differentially Private Synthetic Text Data, Enhancing RAG Models, and much more

Hello and welcome to the Gretel Epoch, a roundup of synthetic data developments, community highlights, and privacy-first generative AI insights from our team.

What We’re Building

Combining Differential Privacy with Synthetic Text Generation: Our latest breakthrough in scaling differentially private text generation narrowed the performance gap between synthetic and real datasets to just 1%. This is a significant step toward bringing differential privacy to enterprise ML pipelines and safely unlocking the last mile of value for generative AI.

Figure 1. Comparing model accuracy across training epochs.
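A common way to train a text generator with differential privacy is DP-SGD, which clips each example's gradient and adds calibrated Gaussian noise before every update. This newsletter does not say which mechanism Gretel uses, so the Python sketch below is an assumption. It illustrates the general technique on a plain list of parameters and is not Gretel's implementation. The function names `clip` and `dp_sgd_step` are hypothetical, and privacy accounting (computing epsilon) is omitted.

```python
import math
import random

def clip(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update (illustrative sketch, not a production trainer).

    1. Clip every example's gradient to bound any single record's influence.
    2. Sum the clipped gradients.
    3. Add Gaussian noise with std = noise_multiplier * max_norm.
    4. Average over the batch and take a gradient step.
    """
    batch_size = len(per_example_grads)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]

# Usage: two examples, one with a large gradient that gets clipped.
rng = random.Random(0)
new_params = dp_sgd_step(
    params=[0.0, 0.0],
    per_example_grads=[[3.0, 4.0], [0.3, 0.4]],
    lr=0.1,
    max_norm=1.0,
    noise_multiplier=1.1,
    rng=rng,
)
print(new_params)
```

Clipping bounds how much any single record can move the model, and the noise masks whatever remains. Together they are what make the differential privacy guarantee possible. The cost is some accuracy, which is the gap the result above reports shrinking to about 1%.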

Gretel Transform v2: Our new Transforms model (in beta) gives you the flexibility to detect sensitive data and define custom transformation rules. It's an effective tool for de-identifying datasets to meet HIPAA requirements or organization-specific policies and data formats.
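The newsletter does not show Transform v2's rule syntax. As a hedged illustration of the "detect, then apply a custom rule" idea, the Python sketch below pairs a detection pattern with a transformation action. The rule format, rule names, and functions are all hypothetical and are not Gretel's API. Regex-only detection is also a simplification.

```python
import hashlib
import re

# Hypothetical rule set: each rule pairs a detection pattern with an action.
RULES = [
    {"name": "email", "pattern": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "action": "hash"},
    {"name": "us_ssn", "pattern": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "action": "redact"},
]

def apply_action(value, action, name):
    """Transform one detected value according to the rule's action."""
    if action == "redact":
        return f"<{name.upper()}>"
    if action == "hash":
        # Deterministic pseudonym: the same input always maps to the same token,
        # so joins across tables still work after de-identification.
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:10]
        return f"{name}_{digest}"
    raise ValueError(f"unknown action: {action}")

def transform_record(record, rules=RULES):
    """Apply every rule, in order, to every string field of a record."""
    out = {}
    for field, value in record.items():
        if isinstance(value, str):
            for rule in rules:
                value = rule["pattern"].sub(
                    lambda m, r=rule: apply_action(m.group(0), r["action"], r["name"]),
                    value,
                )
        out[field] = value
    return out

record = {"note": "Contact jane@example.com, SSN 123-45-6789", "age": 42}
print(transform_record(record))
```

Hashing preserves referential consistency, while redaction removes the value entirely. Choosing between them per field is the kind of policy that custom rules let you encode.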

Connectors: New native Gretel connectors for Azure Blob Storage, Google BigQuery, and Oracle Database let you run our automated end-to-end workflows directly against these popular data stores.

Real-Time Inference with Gretel Tabular LLM: We've launched an Inference API that returns real-time responses to your queries, along with a new SDK that lets you generate synthetic text and tabular data with only a prompt and three lines of code. These community-requested features have been a game-changer for our partners. Learn how to use these capabilities to create diverse LLM training data.

Gretel in the Wild

Fast-Tracking RAG Model Evaluation: We hosted a workshop with Samuel Kemp, Principal AI Platform Architect at Microsoft, on effective enterprise strategies for using high-quality synthetic data to improve RAG model performance. If you are building more efficient models that are battle-tested for production deployments, this workshop is for you.

Figure 2. As in the MLOps lifecycle, enhancing a RAG model with synthetic data drives continuous improvement across data collection, fine-tuning, evaluation, and testing.

Transferred Learnings: Gretel's CPO and co-founder, Alex Watson, joined the Transferred Learnings community of AI scientists, researchers, and practitioners to discuss emerging AI trends. The discussion focused on training and fine-tuning large language models (LLMs) with synthetic data using Gretel Tabular LLM.

Synthesize 2023 Redux: Last year, Gretel hosted the inaugural synthetic data developer conference, featuring leaders from Google, Illumina, NVIDIA, Riot Games, Snowflake, Roche, Unity, Cohere, AWS, and more, who showcased advancements in synthetic data generation.

What We’re Reading

Synthetic Data in the EU AI Act: The final draft text of the EU AI Act includes guidelines on using synthetic data before processing sensitive data for purposes such as de-biasing and securing models, as well as for participating in regulatory sandboxes.

A Bottleneck in Deployment: Enterprises racing to adopt generative AI face a major challenge: security. Complex AI models depend on sensitive data, learn autonomously, and operate in error-prone environments. Meanwhile, cyber threats are increasing, particularly those targeting LLMs. Existing security tools are inadequate, which slows AI adoption. Fortunately, data teams are applying a mix of new startup tools, including synthetic data, to make the modern AI stack resilient in the face of these growing concerns.

Figure 3. Key barriers to generative AI adoption.

Thanks for reading. If you have questions or comments, join us in the Synthetic Data Discord community.