Is Big Tech wrong to train AI models on 'messy' public data?
Mark Carey
CEO and Founder @ Delcreo, Inc. DelCreo specializes in helping business and risk leaders implement and manage successful enterprise risk management programs, and offering customized strategic advisory solutions.
The rapid growth of artificial intelligence technology across many parts of a business is complicating contractual agreements with customers and third parties, as well as mergers and acquisitions (M&A) transactions. Data ownership and licensing risks may not currently be addressed in existing contract and legal reviews or in other due diligence activities.
Synthetic Data
The reliance on public data for training AI models exposes companies to significant copyright and privacy risks, while synthetic data, though a promising alternative, may face limitations in scope and accuracy when derived from insufficient original data. The push for AI models to handle large-scale data creates operational challenges, including data freshness, regulatory pressures, and the need for real-time insights, all of which necessitate robust and secure technology infrastructures to manage and mitigate these risks effectively.
Ali Golshan is CEO and cofounder of Gretel, which allows companies to experiment and build with synthetic data. Golshan says synthetic data is a safer and more private alternative to "messy" public data, and that it can shepherd most companies into the next era of generative AI development.