Differentially Private Synthetic Text Data, Enhancing RAG Models, and much more
Hello and welcome to the Gretel Epoch, a roundup of synthetic data developments, community highlights, and privacy-first generative AI insights from our team.
What We’re Building
Combining Differential Privacy with Synthetic Text Generation: Our latest breakthrough in scaling differentially private text generation minimized performance differences between synthetic and real datasets to just 1%. This marks a significant step toward bringing differential privacy to enterprise ML pipelines, and safely unlocking the last mile of value for generative AI.
Gretel Transform v2: Our new Transforms model (in beta) offers the flexibility to detect and define custom transformation rules. It’s the perfect tool for de-identifying datasets to comply with HIPAA policies or to match organization-specific data formats.
Connectors: New Gretel native connectors for Azure Blob Storage, Google BigQuery, and Oracle Database now let you take advantage of our automated end-to-end workflows with these popular data storage locations.
Real-Time Inference with Gretel Tabular LLM: We've launched an Inference API for real-time responses to your queries, as well as a new SDK that lets you generate synthetic text and tabular data with only a prompt and 3 lines of code. These community-requested features have been an absolute game changer for our partners. Learn how to leverage these capabilities to create diverse LLM training data.
Gretel in the Wild
Fast Tracking RAG Model Evaluation: We hosted a great workshop with Samuel Kemp, Principal AI Platform Architect at Microsoft, on effective enterprise strategies for leveraging high-quality synthetic data to improve RAG model performance. If you are trying to build more efficient models that are battle-tested for production deployments, this workshop is for you.
Transferred Learnings: Gretel’s CPO and co-founder, Alex Watson, joined the Transferred Learnings community of AI scientists, researchers, and practitioners to discuss emerging AI trends. The discussion focused on training and fine-tuning large language models (LLMs) with synthetic data using Gretel Tabular LLM.
Synthesize 2023 Redux: Last year, Gretel hosted the inaugural synthetic data developer conference featuring leaders from Google, Illumina, NVIDIA, Riot Games, Snowflake, Roche, Unity, Cohere, AWS, and more, showcasing advancements in synthetic data generation.
What We’re Reading
Synthetic Data in the EU AI Act: The final draft text of the EU AI Act includes guidelines for using synthetic data prior to processing sensitive data for purposes such as de-biasing and securing models, as well as for participating in regulatory sandboxes.
A Bottleneck in Deployment: Enterprises racing to adopt generative AI face a major challenge: security. Complex AI models require sensitive data, learn autonomously, and operate in error-prone environments. Meanwhile, cyber threats are increasing, particularly those targeting LLMs. Existing security tools are inadequate, which hinders AI adoption. Thankfully, data teams are applying a mix of new startup tools, including synthetic data, to ensure the modern AI stack is resilient in the face of these growing concerns.
Thanks for reading. If you have questions or comments, join us in the Synthetic Data Discord community.