How AI Apps Use and Misuse Your Data

Although large language model (LLM) and AI application developers have implemented privacy and security protocols—including limited data retention periods, user authentication, data encryption, and removal requests—OpenAI and other enterprise software providers process sensitive information that can be shared with third parties per privacy policies. Data leaks caused by LLMs and AI apps pose significant risks to consumers and businesses.

ChatGPT safeguards and vulnerabilities

OpenAI has lost market share to new and incumbent contenders in recent years, but ChatGPT remains the most popular LLM, boasting over 400 million weekly active users. About 94-95% of this group uses ChatGPT free of charge on individual plans with fewer protections than the enterprise licenses. OpenAI trains its models on these interactions to improve their performance at the expense of users’ anonymity.

Individual privacy policy

According to the Privacy Policy, OpenAI processes personally identifiable information to improve service delivery, research, and other routine business practices. The most popular LLM continuously learns from conversations, whereas rival foundation models like Anthropic’s Claude and Gemini by Google are not trained on user data. As a result, the maker of ChatGPT has amassed a database that includes account details, contact information, user-generated content, app activity, and browser cookies for more than 375 million people.

OpenAI collates data from publicly available internet sources and third-party content—including Snapchat, Stripe, Spotify, Microsoft Teams, Slack, and Reddit. Despite anonymization efforts, this collection has included confidential medical records and the personally identifiable information of minors. Furthermore, the policy permits the disclosure of such data to vendors, government agencies, corporate affiliates, or account administrators.

These policies also apply to files uploaded to ChatGPT. Individuals seeking more precise and relevant responses can ground the LLM with context by attaching documents in the chat client or to custom GPTs built to serve a defined purpose. By design, that content will surface in interactions between bespoke chatbots and their users. Uploaded documents can also be exposed to others and downloaded outright when the Code Interpreter feature is enabled, as seen in the custom GPT configurator.

File contents may appear in conversations with others and can be downloaded when Code Interpreter is enabled.
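
Custom GPTs are configured in the ChatGPT builder rather than in code, but the same tradeoff can be sketched with OpenAI's Assistants API, which offers equivalent file-upload and Code Interpreter options. The snippet below is a minimal sketch, assuming the openai Python SDK's beta Assistants endpoints and a hypothetical pricing_playbook.pdf; it contrasts a configuration that hands the raw file to the Code Interpreter sandbox, where users can request a download link, with one that grounds answers through retrieval instead.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical proprietary document used to ground the bot's answers
doc = client.files.create(
    file=open("pricing_playbook.pdf", "rb"),
    purpose="assistants",
)

# Risky configuration: attaching the file to the Code Interpreter tool lets
# users ask the bot to open it in the sandbox and return a download link.
exposed_bot = client.beta.assistants.create(
    model="gpt-4o",
    name="Sales Helper (file downloadable)",
    instructions="Answer pricing questions using the attached playbook.",
    tools=[{"type": "code_interpreter"}],
    tool_resources={"code_interpreter": {"file_ids": [doc.id]}},
)

# Safer configuration: Code Interpreter stays off, so the sandbox download
# path never exists; the document is indexed for retrieval instead, and the
# bot quotes excerpts rather than handing over the raw file.
store = client.beta.vector_stores.create(name="playbook", file_ids=[doc.id])
grounded_bot = client.beta.assistants.create(
    model="gpt-4o",
    name="Sales Helper",
    instructions="Answer pricing questions using the attached playbook.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [store.id]}},
)
```

In the ChatGPT builder itself, the equivalent safeguard is leaving the Code Interpreter capability unchecked whenever knowledge files contain sensitive material.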

Due to the potentially proprietary or privileged nature of such media, these incidents present additional opportunities for negligent data exposure or exploitative cyberattacks. As the producer of the fastest-growing software product of its time, OpenAI remains an attractive target for black hat hackers. In March 2023, the company suffered a security breach due to a vulnerability in Redis, the open-source library used to cache user data for faster access and recall. Personal information, contact details, payment credentials, and conversation histories were exposed for as many as 1.2% of ChatGPT Plus subscribers.

Enterprise privacy policy

The Enterprise privacy policy defines guidelines that ensure the safe and secure use of the LLM for corporate accounts. Unlike free or ChatGPT Plus accounts, organizations retain ownership of their data inputs and outputs, which OpenAI vows not to use for model training. Administrators of ChatGPT Team, Enterprise, or API accounts may designate data retention periods of up to 30 days, unless zero-retention requests have been granted or retention is otherwise legally mandated.

Business plans adhere to best practices for identity authentication and single sign-on (SSO) authorization. Data is encrypted at rest and in transit, and administrators get controls for permission and access management. In addition, OpenAI follows industry standards by offering Data Processing Addendums, helping customers meet regulations like GDPR, CCPA, and HIPAA, and invites third parties to audit its operations.

Data removal requests

ChatGPT users concerned about privacy can submit a removal request through OpenAI’s Privacy Request Portal. They can also download their data, opt out of training, or delete their accounts, as shown below. Unfortunately, training data restrictions cannot be retroactively applied, and personal details cannot be deleted in bulk.

Submit a privacy request to OpenAI in the Privacy Request Portal.

Consumers have the right to request the removal of personal data under the CCPA, but the scope of that right is unclear here: users are left to wonder whether information the model has already absorbed can truly be deleted. As context windows expand, greater volumes of data can be memorized by the model and extracted in many possible permutations. OpenAI does not disclose what content is deleted in response to a data removal request or how comprehensive that deletion is.

Personal data misappropriation

Enterprise software companies like Salesforce, Microsoft, and Grammarly have incorporated AI into their products and train it on the output produced by working professionals. It’s prudent to determine which AI-enabled apps collect personal information by default and to configure privacy settings accordingly. Nefarious actors with access to such datasets can create bots that emulate an individual’s likeness to commit fraud and plagiarism.

Slack by Salesforce

The prominent workplace chat platform Slack is no different. The Salesforce subsidiary refrains from training third-party LLMs, or developing its own generative models, with customer data. However, Slack analyzes users’ messages, files, and other content for its non-generative models without explicit consent.

The app leverages data points from nearly 40 million daily active users to recommend channels, suggest emojis, autocomplete text, and optimize search results. Slack automatically opts users into this non-generative model training; to opt out, users must ask their organization’s administrator, who then submits the request to Slack for processing.

LinkedIn by Microsoft

LinkedIn is another app that capitalizes on user-generated content to enhance its proprietary AI models without permission. The Microsoft subsidiary quietly introduced opt-out forms and new privacy settings before disclosing to its 1 billion members that it trains generative AI models on user data for writing assistance. These users can disable Data for Generative AI Improvement under the Data privacy tab in Account settings. Unfortunately, data previously used for model training is exempt from the opt-out.

Opt out of training LinkedIn’s generative AI models in Data Privacy settings.

Professionals on the largest professional social network may request the erasure of personally identifiable information from storage. While LinkedIn claims to omit the data of EU, EEA, and Swiss residents from AI model training, this exemption does not apply to the algorithms that personalize and moderate the content displayed. To object to that processing, one must file the LinkedIn Data Processing Objection Form.

Grammarly

Thirty million people and 70,000 professional teams, including 96% of the Fortune 500, use Grammarly to improve their writing. Like Slack and LinkedIn, the company trains its models on the data of consumers in all the same countries, except for the United Kingdom.

Opt out of Grammarly’s Product Improvement and Training in Privacy settings.

To turn off “Product Improvement and Training,” go to the account’s Privacy settings. Regardless of user preferences, Grammarly’s drafting and editing assistant still analyzes user activities – such as the number of words written, the types of suggestions users accept, errors made, and more – to improve the customer experience.

Consent and privacy preferences

Although ChatGPT, Slack, LinkedIn, and Grammarly are known to violate the confidentiality of millions of users, they may not be the only apps that misappropriate personal data. In the absence of legislation governing such standards, users expect LLM and AI developers to adopt consensual, privacy-by-default policies.

Such measures are intended to foster deeper trust with consumers and reduce the risk of data leaks by ensuring that users opt into model training and can set their own privacy preferences. Fortunately, IBM and Meta formed the AI Alliance with 50 founding members to advance open and responsible AI. The group is developing benchmarks, tooling, metrics, and methodologies to ensure the trust and safety of AI systems, among other focus areas.
