AWS & Azure Integrations, Privacy Innovations, and More
Welcome to Gretel Epoch—your monthly brief on the synthetic data economy and privacy-first AI.
What We're Building
Deploy Gretel on AWS & Azure: Our partnerships have reached new milestones. AWS now integrates Gretel Navigator with Amazon Bedrock, enabling developers to create safe, scalable synthetic data for training AI to understand and execute tool commands.
At Microsoft Ignite, Microsoft CEO Satya Nadella highlighted our role in breaking data bottlenecks: "There is no AI without data." EY's Global Client Technology AI Lead, John Thompson, emphasized the value of privacy-protected synthetic data, stating: "EY is leveraging synthetic datasets with 99% accuracy to fine-tune Azure OpenAI models while safeguarding sensitive financial data and meeting regulatory standards."
Transform Sensitive Data: NavFT DP, our new differential privacy solution, safeguards numerical, categorical, free-text, and event-driven data. With formal mathematical guarantees, it preserves privacy without sacrificing utility.
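For readers new to differential privacy, the formal guarantee behind this kind of approach is the standard (ε, δ) definition: for any two training datasets D and D′ that differ in a single record, and for any set of possible outputs S, the mechanism M must satisfy

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ

In plain terms, no single record can meaningfully change what the fine-tuned model learns or generates. (This is the general definition, not a statement of any specific parameters used in NavFT DP.)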
Evaluate Privacy Risks: PII Replay, our latest metric, quantifies and reduces sensitive data risk in synthetic datasets. Complementing tools like Membership and Attribute Inference protections, it empowers AI teams to measure and minimize privacy risks effectively (try the notebook).
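To make the idea concrete, here is a deliberately simplified sketch of what a PII-replay-style check measures: the share of sensitive values in a synthetic dataset that also appear verbatim in the training data. This is an illustration of the concept, not Gretel's implementation; the function name and column handling are hypothetical.

```python
import pandas as pd

def pii_replay_rate(train: pd.DataFrame, synthetic: pd.DataFrame, pii_columns: list[str]) -> dict[str, float]:
    """Fraction of synthetic values, per PII column, that also appear verbatim in the training data."""
    rates = {}
    for col in pii_columns:
        train_values = set(train[col].dropna().astype(str))
        synth_values = synthetic[col].dropna().astype(str)
        if synth_values.empty:
            rates[col] = 0.0
            continue
        # Count synthetic records whose PII value is a verbatim copy of a training value.
        replayed = synth_values.isin(train_values).sum()
        rates[col] = float(replayed) / len(synth_values)
    return rates

# Toy example: half of the synthetic emails are replayed from training.
train = pd.DataFrame({"email": ["a@example.com", "b@example.com"]})
synthetic = pd.DataFrame({"email": ["a@example.com", "new@example.com"]})
print(pii_replay_rate(train, synthetic, ["email"]))  # {'email': 0.5}
```

A production metric would also normalize values and weight rare identifiers more heavily, which this toy version ignores.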
Navigate Data Bottlenecks: Navigator Data Designer simplifies creating domain-specific datasets, streamlining refinements to enhance AI performance. Its Sample-to-Dataset feature fixes class imbalances and generates rich, diverse datasets with minimal effort.
Gretel in the Wild
Detect PII/PHI: We've integrated Gretel Synthetics into the lightweight GLiNER models, enabling them to extract arbitrary entity types from text with enhanced privacy and performance. For PHI/PII detection, these synthetically enhanced GLiNER models offer a cost-effective, flexible alternative to general-purpose LLMs. The datasets and models are open source—explore them today.
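If you'd like to experiment locally, a minimal sketch using the open-source gliner package looks roughly like this. The model ID below is illustrative (check Hugging Face for the released Gretel GLiNER checkpoints), and the entity labels are arbitrary strings you choose at inference time.

```python
# pip install gliner
from gliner import GLiNER

# Model ID is illustrative -- look up the released Gretel GLiNER checkpoints on Hugging Face.
model = GLiNER.from_pretrained("gretelai/gretel-gliner-bi-small-v1.0")

text = "Patient Jane Doe (DOB 1984-02-17) can be reached at jane.doe@example.com."
# Labels are free-form strings chosen at inference time, not a fixed tag set.
labels = ["person", "date of birth", "email address"]

for entity in model.predict_entities(text, labels, threshold=0.5):
    print(entity["label"], "->", entity["text"])
```

Because GLiNER matches free-form label strings rather than a fixed tag set, covering a new PII type is usually just a matter of adding a label.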
Reason Better: Recent advances in LLM reasoning have tackled complex math problems, but the paper "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models" by Mirzadeh et al. highlights gaps in symbolic reasoning. Our review suggests these gaps reflect the nuanced interplay of architecture, training data, and task complexity.
Upcoming Events: Want to learn more about Gretel? We're speaking at the SmallCon virtual conference this Wednesday. Register for free to join.
What We're Reading
Science: Alibaba's Marco-o1 model advances reasoning AI by tackling complex, open-ended problems with iterative techniques like Chain-of-Thought fine-tuning and Monte Carlo Tree Search. Combining synthetic instructions with CoT datasets, it highlights how synthetic data is pushing AI's reasoning boundaries.
Business: The Qualtrics Market Research Trends Report shows 71% of researchers expect synthetic data to dominate workflows by 2025, enabling faster insights while addressing privacy and data scarcity.
Policy: India is prioritizing cost-effective, domain-specific AI solutions over massive LLMs. Leaders like Infosys co-founder Nandan Nilekani advocate using existing models and synthetic data to build smaller, tailored systems that meet local needs:
"Our goal should not be to build one more LLM—let these big boys in the Valley do that… spend $50 million each on these. We will use that stuff to train new things, create synthetic data to build very small language models quickly, and train them on appropriate data. That’s the approach we’ll take."
Thanks for reading.
If you’re interested in becoming a synthetic data designer and building your own privacy-preserving AI models, check out Gretel University (GU) for end-to-end tutorials and lessons for learners of all skill levels.
Join more than 1,600 other developers, engineers, data scientists, and privacy fans in the Synthetic Data Community on Discord.
Cheers, Gretel