Part Three: Navigating Third-Party AI Model Risks

This is part three of a four-part series exploring key challenges and solutions in artificial intelligence governance. In Part One, we examined the importance of ensuring that the data used for AI training complies with privacy notices, contracts, and regulations. Part Two covered the challenge of ensuring that a given model is not reused or extended in ways that fall outside its permitted use. Now, we turn to a new challenge: the risks associated with using third-party AI models like those from OpenAI and Anthropic.

Traditional machine learning models are typically developed and deployed internally within a company's infrastructure, giving the company control over the data and the training process. In contrast, the use of third-party large language models (LLMs) introduces new challenges, as data is sent to external model providers.

Ensuring that sharing data with third-party AI models is allowable

In Part One of this series, we covered the foundational steps for determining whether data can be used for training AI models. When data is shared with third-party models, these same principles must be applied at an additional level of abstraction—is the sharing of data with a third party for the outlined purpose allowable? Allowable use is defined by the purpose of the model, applicable rules and regulations, user consents, and third-party contract data rights provisions.
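To make the idea concrete, here is a minimal sketch of how such an allowability check could combine purpose, consents, regulatory restrictions, and contract provisions. The names (DataCategory, sharing_allowed, the purpose labels) are purely illustrative assumptions, not a real product API.

```python
# Illustrative sketch only: each data category carries the consents, regulatory
# restrictions, and contract provisions that govern it, and sharing with a third
# party is allowed only when the declared purpose clears all three checks.
from dataclasses import dataclass, field


@dataclass
class DataCategory:
    name: str
    consented_purposes: set = field(default_factory=set)   # purposes users agreed to
    regulatory_blocks: set = field(default_factory=set)    # purposes barred by regulation
    contract_blocks: set = field(default_factory=set)      # purposes barred by vendor contracts


def sharing_allowed(category: DataCategory, purpose: str) -> bool:
    """Return True only if the purpose is consented to and not otherwise blocked."""
    return (
        purpose in category.consented_purposes
        and purpose not in category.regulatory_blocks
        and purpose not in category.contract_blocks
    )


chat_logs = DataCategory(
    name="chat_logs",
    consented_purposes={"fraud_detection_third_party"},
    contract_blocks={"marketing_third_party"},
)

print(sharing_allowed(chat_logs, "fraud_detection_third_party"))  # True
print(sharing_allowed(chat_logs, "marketing_third_party"))        # False
```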

Example: Ensuring Consent to Share with Third-Party Models

Silverstone Bank, a growing neobank in San Francisco, had been using internal machine learning models for fraud detection, but the results were subpar. As the bank faced increasing pressure to improve its fraud detection systems, the decision was made to implement third-party LLMs to enhance its fraud detection and customer verification models by analyzing unstructured data sets such as service tickets, emails, chat logs, and phone call logs. The legal team, led by senior counsel Emily Hayes, was tasked with ensuring that the bank’s customer data could be shared with the LLM vendor.

Silverstone had collected a variety of sensitive personal data over several years that would be valuable to the new models. In 2023, Emily Hayes anticipated that the bank might want to share customer data with third-party LLM providers for AI model training. She introduced new language in the privacy notice that allowed Silverstone to share data with third-party AI model vendors, rather than limiting the use of data to internal models only. From 2023 onward, new users explicitly consented to this sharing.

Emily, however, failed to ensure that Silverstone segmented its pre-2023 and post-2023 records so that a clean, allowable training dataset could be produced. Unfortunately, the data was heavily commingled across both periods.

With the clock ticking and the firm eager to deploy LLMs to improve fraud detection, Hayes struggled to find a legal pathway forward. The task of segmenting the data to separate consented and non-consented data proved too costly and time-consuming to be worth the effort. Without proper consent, Silverstone’s ambitious AI plans faltered, leaving the bank at a competitive disadvantage in the fight against financial fraud.

Reflecting on the poor outcome, Emily concluded that her ideal solution would be software that automated the re-consent process, prompting users to re-consent whenever Silverstone expanded the ways in which personal information was used. She also envisioned a tool that could track who had and had not consented to new terms and automate the segmentation of data for proper use across various purposes. Ideally, she would have been able to declare the purpose, such as "share the data with a third-party AI vendor for fraud protection," and then click a button to provide data scientists with an appropriate dataset that respected all requirements.
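As one way to picture the segmentation Emily wished for, here is a rough sketch in which records are tagged with the privacy-notice version the user accepted, and only records covered by a notice permitting third-party sharing are released for training. The field names, notice versions, and training_eligible helper are hypothetical; a real system would draw on an actual consent ledger rather than an in-memory list.

```python
# Hypothetical sketch: segment customer records by the privacy notice they
# accepted, so only records consented for third-party AI sharing are used.
from datetime import date

# Which notice versions permit sharing with third-party AI vendors (assumed values).
NOTICE_PERMITS_THIRD_PARTY_SHARING = {"2022-v1": False, "2023-v2": True}

customers = [
    {"id": 1, "notice_version": "2022-v1", "signup": date(2021, 6, 1)},
    {"id": 2, "notice_version": "2023-v2", "signup": date(2023, 4, 12)},
    {"id": 3, "notice_version": "2023-v2", "signup": date(2024, 1, 9)},
]


def training_eligible(records):
    """Keep only records whose accepted notice permits third-party AI sharing."""
    return [
        r for r in records
        if NOTICE_PERMITS_THIRD_PARTY_SHARING.get(r["notice_version"], False)
    ]


eligible = training_eligible(customers)
print([r["id"] for r in eligible])  # [2, 3] -- pre-2023 consents are excluded
```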

Sharing sensitive and proprietary data with third-party model developers

In 2023, major companies like JPMorgan, Amazon, Verizon, and Accenture took significant steps to restrict employee use of large language models (LLMs). The underlying issue was a lack of trust: these organizations doubted that employees, at scale, could consistently discern and withhold sensitive or proprietary information.

The specific concern was that data entered into prompts could be used for training and, in turn, appear in responses provided to other users. These restrictions were well-founded, as employees were using free-tier LLMs like ChatGPT, whose business model relied on user prompts to improve the model. Essentially, because the service was free, the data provided in prompts became the price of use.

Faced with this challenge, companies typically pursued one of three approaches: banning employee use of external LLMs altogether; permitting limited use of public tools under strict policies governing what could be entered into prompts; or adopting third-party models under enterprise terms that kept company data out of provider training and allowed proprietary datasets to be integrated under negotiated controls.

Not surprisingly, most organizations chose option three because of the opportunity to integrate their own datasets into models to make them richer and more accurate. Option three also provided better tools to control what data entered models, how third-party model developers used that data, and how the models themselves were applied.

However, approach three also introduces its own complexities. Organizations now need to carefully manage and enforce the segmentation of their data for training multiple models built for various purposes. This segmentation has to align with privacy notices, business rules, user consents, regulations, and contractual restrictions. Additionally, they have to ensure that the models are not repurposed for impermissible uses over time.
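One way to think about enforcing that segmentation is a registry that records each model's purpose and permitted data categories, with a gate that rejects any training set falling outside the registration. The sketch below is illustrative only; the model names, data categories, and approve_training_set function are assumptions, not a description of any particular vendor's tooling.

```python
# Hypothetical sketch of multi-model segmentation: each model is registered with
# a declared purpose and the data categories it may train on, and a gate rejects
# any dataset that falls outside that registration.
MODEL_REGISTRY = {
    "fraud_llm": {
        "purpose": "fraud_detection",
        "allowed_data": {"service_tickets", "chat_logs"},
    },
    "marketing_llm": {
        "purpose": "campaign_insights",
        "allowed_data": {"purchase_history"},
    },
}


def approve_training_set(model_name: str, data_categories: set) -> bool:
    """Allow training only if every requested category is registered for the model."""
    entry = MODEL_REGISTRY.get(model_name)
    if entry is None:
        return False
    return data_categories <= entry["allowed_data"]


print(approve_training_set("fraud_llm", {"chat_logs"}))         # True
print(approve_training_set("fraud_llm", {"purchase_history"}))  # False -- repurposing blocked
```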

Example: Multi-Model Segmentation

Sarah Martinez, General Counsel at BrightWave, a Fortune 500 consumer goods company, was initially excited when the company decided to implement a multi-model approach to integrate LLMs across various departments. The idea of segmenting data for specific purposes and controlling access based on the use case seemed like the perfect solution to balance innovation with strong data governance.

At first, the rollout was slow, and Sarah’s team ensured that the right models had access to the appropriate data. But as demand for LLM-powered insights grew, BrightWave’s plans expanded significantly. By the end of 2024, the company intended to have 18 LLMs in production, with another 12 planned for 2025. The rapid pace and scale of expansion strained Sarah and her team’s ability to keep up.

Furthermore, as more departments, from marketing to logistics, wanted to integrate LLMs into their operations, more people demanded access to query the models for insights. Sarah quickly realized that traditional access controls weren't scaling well enough to manage who was interacting with the models and why, and she felt that her team spent too much time each day managing permissions rather than focusing on more impactful work.

Governance of their LLMs had become a full-time job for three people, and Sarah had no dedicated headcount for the task. As her team struggled to keep up with the increasing volume of models and related governance tasks, Sarah became very concerned that the company was accruing significant compliance risks. Furthermore, as her backlog of LLM tasks grew, her team became a major blocker to the speed of innovation—an issue she had always sought to avoid.

Sarah’s dream scenario was to implement an automated segmentation system—one that could route appropriate data to the appropriate model based on the context under which the data existed, the rules for its use, and the purpose of the model. Sarah also wished that, on the third-party side, there was a tool that would require users to state a purpose when using the model so her team could track compliant use of each model.
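A rough sketch of that query-time control might look like the following; the model names, purposes, and query_model function are hypothetical, and a production system would integrate with real identity, model-serving, and audit-logging infrastructure rather than a stubbed response.

```python
# Hypothetical sketch: every request must declare a purpose, the purpose is
# checked against the model's approved purposes, and the decision is logged
# so compliant use of each model can be audited later.
import logging

logging.basicConfig(level=logging.INFO)

# Assumed mapping of models to the purposes they may be used for.
ALLOWED_PURPOSES = {
    "fraud_llm": {"fraud_detection"},
    "logistics_llm": {"route_planning"},
}


def query_model(model_name: str, user: str, purpose: str, prompt: str) -> str:
    """Gate a model call on a declared purpose and record the decision."""
    allowed = purpose in ALLOWED_PURPOSES.get(model_name, set())
    logging.info("user=%s model=%s purpose=%s allowed=%s", user, model_name, purpose, allowed)
    if not allowed:
        raise PermissionError(f"{purpose!r} is not an approved purpose for {model_name}")
    return f"[stubbed response from {model_name} for: {prompt}]"


print(query_model("fraud_llm", "analyst_7", "fraud_detection", "Summarize ticket #123"))
```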

Conclusion

As companies integrate third-party AI models, managing the risks of data misuse and impermissible purposes becomes more complex. It's critical that businesses track and monitor the purposes for which each model is used and ensure the data flowing into these models complies with contracts, privacy rules, and regulations. Without an effective system of record and automation, businesses are exposed to significant governance risks and slower adoption of AI models.

At Tranquil Data, we’ve built a solution that simplifies the governance of third-party AI models. Our platform automates the assurance that the data feeding models is permissible and that models are only used for their intended purposes. Reach out to learn more about how we can streamline your third-party AI model governance at [email protected].
