Gretel's Tabular LLM, Synthetic Data Accelerator, and much more
Hello and welcome to the Gretel Epoch, your essential roundup of product developments, community highlights, and privacy-first generative AI insights from our team.
What We’re Building
Tabular LLM: the first AI system designed for large-scale tabular data generation. The application excels at generating, augmenting, and editing datasets through natural language or SQL prompts. It’s like a CoPilot for your data.
Model Playground: a delightful UI for getting tabular results from our proprietary LLM, and engaging with our powerful GPT model. From cold start to contextually-rich text and tabular datasets in a few keystrokes.
Workflows: automated, end-to-end solutions for integrating synthetic data generation into your existing pipelines, with scheduling, no-code configurations, and connectors for cloud storage, databases, and data warehouses.
For a deep dive into all these developments, watch our inaugural Gretel Demo Day event. Need data now? Check out our guide on deploying Gretel in AWS, or try our popular (nearly 1M downloads) Python SDK to generate synthetic data in just 3 lines of code.
Gretel in the Wild
Gretel + AWS Synthetic Data Accelerator: to address the soaring demand for safe, accurate, and timely training data for AI systems, we launched a new accelerator program with AWS to support teams building responsible AI systems. The program is open for a limited time to startups and enterprises in financial services, healthcare and life sciences, and the public sector. Apply now.
The Cognitive Revolution Podcast: an interview with Gretel cofounder and CPO Alex Watson that explores agent planning architectures, reinforcement learning for intentionally generating more diverse and representative synthetic data, differential privacy techniques to prevent memorization and exposure of private data, and much more.
Other Sightings: we joined GitHub’s Copilot Program, discussed how synthetic data could address the “right to be forgotten” laws and the issue of having to retrain models, participated in the workshop on Synthetic Data at the 4th ACM Intl. Conference on AI in Finance, and were named one of the Top 100 AI Business Tools.
What We’re Reading
PETs & Policies: President Biden’s Executive Order called for an acceleration in the federal adoption of privacy-enhancing technologies, like synthetic data and differential privacy. Across the pond, the draft language of the EU's AI Act, particularly regarding model testing and training, recognizes synthetic data as separate but equivalent to anonymized data.
Synthetic Sandboxes: Policymakers are already leveraging synthetic data to enhance regulatory oversight. For instance, the UK’s Financial Conduct Authority (FCA) used synthetic data to support its digital sandbox, which offered startups and enterprises a controlled regulatory environment for stress testing and evaluating their innovative financial technologies before deploying to market. Synthetic data was identified as the most valuable feature by participants.
Synthesizing SLMs: Two small language models (SLMs), Microsoft’s Phi 1.5 and IBM’s Granite, rival the performance of LLMs on many tasks, and both were trained on highly curated synthetic datasets. These models are cheaper to build, more environmentally friendly, and less inclined to produce biased outputs than some of their larger counterparts that are trained on the open web. These are promising results for companies with fewer resources that want to build AI solutions that are both responsible and competitive.
Thanks for reading. If you have questions or comments, join us in the Synthetic Data Discord community.