Part One: Deep Dive into Ensuring Data Used for AI Training is Compliant

This article is the first in a five-part series, building on the positive feedback from the previous article, Unlocking AI’s Potential Amid Growing Complexity. The series explores four key challenges companies will face in ensuring their artificial intelligence (AI) initiatives are ready for what lies ahead.

This installment focuses on the common challenges of ensuring that data used to train AI models is compliant. Subsequent articles will address: (2) ensuring models have a defined allowable purpose and that use is limited to that purpose; (3) risks associated with third-party models; and (4) the need for transparency in all aspects of AI data usage.

Ensuring that data used for training is allowable involves several key challenges: (1) securing proper user consent; (2) navigating regulatory frameworks that restrict how data can be used for model training; and (3) addressing constraints imposed by third-party data rights agreements.

Each section will present a specific real-world scenario illustrating when data can and cannot be used, highlighting the critical need for automation and transparency.

Consent: The Cornerstone of Data Usage

Data protection regulations and regulatory guidance, including the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and guidance from the Federal Trade Commission (FTC), emphasize that consent for data usage must be “freely given, specific, informed, and unambiguous.” Beyond regulators, the public is paying close attention to how companies handle their data and use it for AI, as seen in recent backlash against companies like Slack, Zoom, Dropbox, and LinkedIn. These companies failed to be transparent about how user data was used to train AI models and gave users little choice regarding its use for training.

The best practice for companies leveraging AI is to ensure that their privacy notices are explicit about the use of customer data for training, including which data elements are used and for what purposes. Customers should also be given a genuine choice and should be asked to opt in, rather than being enrolled by default and left to opt out.

If the privacy notice is silent on training algorithms and requires an update, organizations should establish a process to both capture re-consent and ensure that no user data is used for training until re-consent is obtained. For more details, see this article on Roadmap to Meet the New FTC Affirmative Express Consent Mandate, and this article on re-consent.
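
What such a re-consent gate could look like in code is sketched below. This is a minimal illustration rather than a prescribed implementation: the field names, notice versions, and helper functions are all hypothetical. The point is that user data is excluded by default and admitted to training only once re-consent against the current notice version has been recorded.

```python
from dataclasses import dataclass

# Hypothetical consent record: which privacy-notice version the user accepted
# and whether they explicitly opted in to model training under that version.
@dataclass
class ConsentRecord:
    user_id: str
    notice_version: str          # version of the notice the user accepted, e.g. "2024-01"
    opted_in_to_training: bool   # explicit opt-in captured at re-consent

CURRENT_NOTICE_VERSION = "2024-01"  # assumed identifier for the updated notice

def eligible_for_training(record: ConsentRecord) -> bool:
    """A user's data may enter the training pipeline only if the user
    re-consented to the current notice and explicitly opted in."""
    return (
        record.notice_version == CURRENT_NOTICE_VERSION
        and record.opted_in_to_training
    )

def users_cleared_for_training(records: list[ConsentRecord]) -> list[str]:
    """Return the IDs of users whose data may be used for training."""
    return [r.user_id for r in records if eligible_for_training(r)]
```

Because the check sits in front of the training pipeline, an out-of-date consent simply excludes the record instead of relying on engineers to remember the rule.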

Example: Consent and Re-consent

Mary O’Connor is the newly appointed Chief Privacy Officer of FinRex, a large financial technology company. One of her first tasks is to conduct an audit of the company’s privacy practices. During her audit, Mary discovers the following:

  • The most recent privacy notice update was in 2022, which included a clause stating that users were expected to “regularly review the privacy notice to stay informed of changes” and that “continued use of the product constitutes consent to any updates.”
  • The 2022 update also introduced new language permitting the use of algorithms on user data to “improve services.”

Mary then examines the company’s internal policies and finds no formal documentation or training records reflecting the 2022 change. In addition, when interviewing the data science team, she discovers that the AI models are being used not only to improve services but also to create targeted advertisements.

From her findings, Mary concludes that:

  • Pre-2022 users were not properly informed of the policy change and did not provide explicit consent for their data to be used in AI model training
  • Both pre-2022 and post-2022 users have had their data used for AI training, and the company currently has no effective system to differentiate between these two groups
  • Users are unaware that their data is being used to train models for advertising purposes, which violates both transparency and consent requirements

To bring the company back into compliance, Mary proposes the following steps:

  • Immediately halt all AI model development to prevent further unauthorized use of data
  • Prompt all users — upon their next sign-in — with a clear opt-in choice, allowing them to explicitly consent (or decline) to the use of their data for AI model training, for both improving services and for marketing purposes
  • Create real-time segmentation to track which users have and have not provided consent for each purpose
  • Resume model training only for those purposes where valid consent has been obtained from users

Mary receives significant pushback from the engineering team, both because the plan will stop AI model development in the short term and because her recommended changes are technically difficult and will significantly disrupt the product roadmap. Mary and the CTO escalate the issue to the CEO, who agrees with Mary that user trust is paramount and that they need to start fresh. The changes will cause significant technical disruption, but the CEO sees no other choice.

As a new employee, Mary has used significant social capital to get AI development back on track. However, she is concerned that, given the company’s current size and the growth trajectory of the data science team, she may not be able to prevent this from happening again.

Mary’s dream scenario is to have the ability to author policies that automate the enforcement of proper data usage based on user consents. With these policies, she envisions automating the assurance that engineers are only building models that align with her guidelines. Additionally, she would like to automate the re-consent process whenever the privacy notice changes materially, and to track all of this information in a non-technical system of record.
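
A minimal sketch of what that kind of policy-as-code could look like follows, assuming a simple mapping from users to the purposes they have opted in to (all names, purposes, and fields are illustrative). Each training job declares its purpose, and the policy layer admits only the rows whose owners consented to that purpose.

```python
# Hypothetical consent store: user_id -> the purposes that user opted in to.
CONSENTS = {
    "user-001": {"improve_services"},
    "user-002": {"improve_services", "advertising"},
    "user-003": set(),  # declined all training uses
}

# Purposes the privacy notice actually discloses; anything else is rejected.
DECLARED_PURPOSES = {"improve_services", "advertising"}

def dataset_for_purpose(rows: list[dict], purpose: str) -> list[dict]:
    """Admit only rows whose owners consented to the declared purpose."""
    if purpose not in DECLARED_PURPOSES:
        raise ValueError(f"Purpose {purpose!r} is not covered by the privacy notice")
    return [row for row in rows if purpose in CONSENTS.get(row["user_id"], set())]

rows = [
    {"user_id": "user-001", "feature": 0.42},
    {"user_id": "user-002", "feature": 0.77},
    {"user_id": "user-003", "feature": 0.11},
]

# The advertising model may train only on user-002's data; the
# service-improvement model may train on user-001 and user-002.
ads_training_set = dataset_for_purpose(rows, "advertising")
service_training_set = dataset_for_purpose(rows, "improve_services")
```

In this framing, Mary’s policies live alongside the consent data, and an undeclared or unconsented purpose fails loudly instead of quietly shipping.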

Regulatory Frameworks: Navigating Compliance

Nearly every industry is subject to regulations that limit when user data can be used for AI. Specifically, these regulations restrict the types of data that can be utilized and the purposes for which that data can be used. Consider the following example:

Example: HIPAA

Sarah Thompson is the Chief Compliance Officer at a health system called Generous Health. She is responsible for ensuring that the organization’s use of artificial intelligence (AI) complies with regulatory standards. Under HIPAA, covered entities do not need explicit patient authorization to use data for treatment, payment, or healthcare operations.

Sarah interviews the data science team and uncovers several potential issues. First, the team has developed an AI tool designed to predict which patients are likely to miss follow-up appointments. While this initiative is framed as enhancing patient engagement, which could be considered a part of treatment, Sarah determines that its actual aim is to automate what is currently a manual reminder process and to drive additional revenue by avoiding scheduling gaps.

Second, the AI models analyze patient data to identify opportunities for increased revenue through elective procedures. While the team justifies this use by arguing that it is for treatment, Sarah sees the true goal as driving additional revenue, since the procedures are elective.

Sarah is concerned that without real-time visibility into which models are deployed, their specific purposes, and the data being utilized, these compliance risks will persist, jeopardizing both patient trust and regulatory adherence.

Sarah’s dream scenario is to have real-time transparency into the models being developed, their intended purposes, and the data they rely on. With such a system, she could quickly identify which projects need further investigation without the time-consuming and costly back-and-forth between her and the data science team.
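
One hedged sketch of what that transparency could look like: a registry in which every deployed model records its declared purpose and the data categories it consumes, so a compliance review starts from a query rather than a round of interviews. The entries, purpose labels, and the simplified list of HIPAA-permitted purposes below are illustrative assumptions, and a real review would still question whether a declared purpose matches a model’s actual use, exactly as Sarah does above.

```python
# Hypothetical registry of deployed models, their declared purposes, and the
# data categories they consume. All names and labels are illustrative.
MODEL_REGISTRY = [
    {
        "name": "missed-appointment-predictor",
        "declared_purpose": "treatment",   # as framed by the data science team
        "data_categories": ["scheduling_history", "demographics"],
    },
    {
        "name": "elective-procedure-recommender",
        "declared_purpose": "revenue_growth",
        "data_categories": ["diagnoses", "claims"],
    },
]

# Simplified: purposes that do not require individual authorization under HIPAA.
PERMITTED_WITHOUT_AUTHORIZATION = {"treatment", "payment", "healthcare_operations"}

def needs_review(entry: dict) -> bool:
    """Flag any model whose declared purpose falls outside treatment,
    payment, or healthcare operations."""
    return entry["declared_purpose"] not in PERMITTED_WITHOUT_AUTHORIZATION

flagged = [m["name"] for m in MODEL_REGISTRY if needs_review(m)]
# flagged == ["elective-procedure-recommender"]
```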

Contractual Constraints: Third-Party Data Rights Agreements

Organizations must ensure that they have the necessary rights to use any third-party data for training purposes. This complexity can increase for multi-partner and multi-product companies that have various agreements with different data providers.

Example: Developing an AI Co-Pilot

Jessica Tran is the newly appointed Chief Privacy Officer at DataSphere, a growth-stage enterprise software startup that provides data hosting and analytics services to Fortune 500 companies. With the company’s recent initiative to integrate a co-pilot AI feature into its product offerings, Jessica has been tasked with ensuring that the use of partner data for training complies with all contractual obligations.

As DataSphere embarks on the project, Jessica discovers that the company has amassed a vast repository of data from its clients, all governed by Master Service Agreements (MSAs) that dictate how that data can be used. Unfortunately, DataSphere sells to large enterprises, and nearly all of its contracts use the buyers’ MSAs. This means there is no commonality between the agreements regarding the data rights and obligations.

Upon reviewing the contracts, she finds a complex landscape of data rights provisions. Some clients have expressly prohibited any use of their data for AI training, while others allow it but exclude certain data elements, such as personally identifiable information (PII) and proprietary business data. Additionally, some clients permit data use only for specific purposes.

Jessica learns that even if she reviews all the agreements to understand how the data can be used, aligning the diverse requirements from these contracts into a cohesive dataset for AI training presents a significant technical challenge.

As a result, DataSphere finds itself at an impasse. With extensive data on hand but conflicting contractual requirements, the team is uncertain how to proceed with training the AI without risking noncompliance or breaching existing agreements.

Jessica’s dream scenario is software that tracks the company’s third-party agreements and automates the segmentation of data by permitted use. In this scenario, she could declare a purpose such as “train an algorithm to improve the service,” press a button, and receive a dataset that respects every one of those agreements.
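
As a rough sketch (with entirely hypothetical client names, fields, and terms), each MSA could be summarized as machine-readable terms, and declaring a purpose would yield only the data those terms allow, with contractually excluded fields stripped out:

```python
# Hypothetical machine-readable summary of each client's MSA terms.
CONTRACT_TERMS = {
    "client-a": {"training_allowed": False, "excluded_fields": set(), "purposes": set()},
    "client-b": {"training_allowed": True, "excluded_fields": {"pii"}, "purposes": {"improve_service"}},
    "client-c": {
        "training_allowed": True,
        "excluded_fields": {"pii", "proprietary"},
        "purposes": {"improve_service", "benchmarking"},
    },
}

def build_dataset(records: list[dict], purpose: str) -> list[dict]:
    """Return only the records usable for the declared purpose, with any
    contractually excluded fields stripped per the owning client's terms."""
    usable = []
    for record in records:
        terms = CONTRACT_TERMS.get(record["client"])
        if not terms or not terms["training_allowed"] or purpose not in terms["purposes"]:
            continue  # this client's data cannot be used for this purpose
        cleaned = {k: v for k, v in record.items() if k not in terms["excluded_fields"]}
        usable.append(cleaned)
    return usable

records = [
    {"client": "client-a", "usage_metric": 12, "pii": "alice@example.test"},
    {"client": "client-b", "usage_metric": 30, "pii": "bob@example.test"},
]

training_set = build_dataset(records, "improve_service")
# Only client-b's record survives, and its "pii" field has been removed.
```

The hard part Jessica identifies, translating hundreds of bespoke MSAs into terms like these, is exactly where automation and a system of record earn their keep.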

Conclusion

The real-world scenarios highlighted in this article illustrate common pain points at the intersection of policy requirements and engineering practices in AI development. Addressing these challenges — whether securing proper consent, navigating regulatory frameworks, or managing complex third-party data rights — is essential for fostering trust and compliance in AI initiatives.

As organizations strive to unlock the full potential of AI while adhering to legal and ethical standards, leveraging the right tools is crucial. Tranquil Data offers innovative solutions designed to automate and streamline these processes, ensuring that data usage is transparent, compliant, and aligned with user consents.

To learn more about how Tranquil Data can help your organization overcome these challenges and pave the way for responsible AI development, reach out to us at [email protected]

