How AI Apps Use and Misuse Your Data

Although large language model (LLM) and AI application developers have implemented privacy and security protocols—including limited data retention periods, user authentication, data encryption, and removal requests—OpenAI and other enterprise software providers process sensitive information that can be shared with third parties per privacy policies. Data leaks caused by LLMs and AI apps pose significant risks to consumers and businesses.

ChatGPT safeguards and vulnerabilities

OpenAI has lost market share to new and incumbent contenders in recent years, but ChatGPT remains the most popular LLM, boasting over 400 million weekly active users. About 94-95% of this group uses ChatGPT free of charge on individual plans with fewer protections than the enterprise licenses. OpenAI trains its models on these interactions to improve their performance at the expense of users’ anonymity.

Individual privacy policy

According to the Privacy Policy, OpenAI processes personally identifiable information to improve service delivery, research, and other routine business practices. The most popular LLM continuously learns from conversations, whereas rival foundation models like Anthropic’s Claude and Gemini by Google are not trained on user data. As a result, the maker of ChatGPT has amassed a database that includes account details, contact information, user-generated content, app activity, and browser cookies for more than 375 million people.

OpenAI collates data from publicly available internet sources and third-party content—including Snapchat, Stripe, Spotify, Microsoft Teams, Slack, and Reddit. Despite anonymization efforts, this collection has included confidential medical records and the personally identifiable information of minors. Furthermore, the policy permits the disclosure of such data to vendors, government agencies, corporate affiliates, or account administrators.

These policies also apply to files uploaded to ChatGPT. Individuals seeking more precise and relevant responses can ground the LLM with context by attaching documents in the chat client or to custom GPTs built to serve a defined purpose. By design, that content will surface in interactions between bespoke chatbots and their users. Uploaded documents can also be exposed to others and downloaded outright when the Code Interpreter feature is enabled, as seen in the custom GPT configurator.

File contents may appear in conversations with others and can be downloaded when Code Interpreter is enabled.
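
Custom GPTs are configured in the ChatGPT builder rather than in code, but the same tradeoff can be sketched with OpenAI's Assistants API, which offers equivalent file-upload and Code Interpreter options. The snippet below is a minimal sketch, assuming the openai Python SDK's beta Assistants endpoints and a hypothetical pricing_playbook.pdf; it contrasts a configuration that hands the raw file to the Code Interpreter sandbox, where users can request a download link, with one that grounds answers through retrieval instead.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical proprietary document used to ground the bot's answers
doc = client.files.create(
    file=open("pricing_playbook.pdf", "rb"),
    purpose="assistants",
)

# Risky configuration: attaching the file to the Code Interpreter tool lets
# users ask the bot to open it in the sandbox and return a download link.
exposed_bot = client.beta.assistants.create(
    model="gpt-4o",
    name="Sales Helper (file downloadable)",
    instructions="Answer pricing questions using the attached playbook.",
    tools=[{"type": "code_interpreter"}],
    tool_resources={"code_interpreter": {"file_ids": [doc.id]}},
)

# Safer configuration: Code Interpreter stays off, so the sandbox download
# path never exists; the document is indexed for retrieval instead, and the
# bot quotes excerpts rather than handing over the raw file.
store = client.beta.vector_stores.create(name="playbook", file_ids=[doc.id])
grounded_bot = client.beta.assistants.create(
    model="gpt-4o",
    name="Sales Helper",
    instructions="Answer pricing questions using the attached playbook.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [store.id]}},
)
```

In the ChatGPT builder itself, the equivalent safeguard is leaving the Code Interpreter capability unchecked whenever knowledge files contain sensitive material.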

Due to the potentially proprietary or privileged nature of such media, these incidents present additional opportunities for negligent data exposure or exploitative cyberattacks. As the producer of the fastest-growing software product of its time, OpenAI remains an attractive target for black hat hackers. In March 2023, the company suffered a security breach due to a vulnerability in Redis, the open-source library used to cache user data for faster access and recall. Personal information, contact details, payment credentials, and conversation histories were exposed for as many as 1.2% of ChatGPT Plus subscribers.

Enterprise privacy policy

The Enterprise privacy policy defines guidelines that ensure the safe and secure use of the LLM for corporate accounts. Unlike free or ChatGPT Plus accounts, organizations retain ownership of their data inputs and outputs, which OpenAI vows not to use for model training. Administrators of ChatGPT Team, Enterprise, or API accounts may designate data retention periods of up to 30 days, unless zero-retention requests have been granted or retention is otherwise legally mandated.

Business plans adhere to best practices for identity authentication and single sign-on (SSO) authorization. Data is encrypted at rest and in transit, and administrators get controls for permission and access management. In addition, OpenAI follows industry standards by offering Data Processing Addendums, helping customers meet regulations like GDPR, CCPA, and HIPAA, and invites third parties to audit its operations.

Data removal requests

ChatGPT users concerned about privacy can submit a removal request through OpenAI’s Privacy Request Portal. They can also download their data, opt out of training, or delete their accounts, as shown below. Unfortunately, training data restrictions cannot be retroactively applied, and personal details cannot be deleted in bulk.

Submit a privacy request to OpenAI in the Privacy Request Portal.

Consumers have the right to request the removal of personal data under the CCPA, but the scope of that right is unclear here: users are left to wonder whether information the model has already absorbed can truly be deleted. As context windows expand, greater volumes of data can be memorized by the model and extracted in many possible permutations. OpenAI does not disclose what content is deleted in response to a data removal request or how comprehensive that deletion is.

Personal data misappropriation

Enterprise software companies like Salesforce, Microsoft, and Grammarly have incorporated AI into their products and train it on the output produced by working professionals. It’s prudent to determine which AI-enabled apps collect personal information by default and to configure privacy settings accordingly. Nefarious actors with access to such datasets can create bots that emulate an individual’s likeness to commit fraud and plagiarism.

Slack by Salesforce

The prominent workplace chat platform Slack is no different. The Salesforce subsidiary refrains from training third-party LLMs, or developing its own generative models, with customer data. However, Slack analyzes users’ messages, files, and other content for its non-generative models without explicit consent.

The app leverages data points from nearly 40 million daily active users to recommend channels, suggest emojis, autocomplete text, and optimize search results. Slack automatically opts users into this non-generative model training; to opt out, users must ask their organization’s administrator, who then submits the request to Slack for processing.

LinkedIn by Microsoft

LinkedIn is another app that capitalizes on user-generated content to enhance its proprietary AI models without permission. The Microsoft subsidiary quietly introduced opt-out forms and new privacy settings before disclosing to its 1 billion members that it trains generative AI models on user data for writing assistance. These users can disable Data for Generative AI Improvement under the Data privacy tab in Account settings. Unfortunately, data previously used for model training is exempt from the opt-out.

Opt out of training LinkedIn’s generative AI models in Data Privacy settings.

Professionals on the largest professional social network may request the erasure of personally identifiable information from storage. While LinkedIn claims to omit the data of EU, EEA, and Swiss residents from AI model training, this exemption does not apply to the algorithms that personalize and moderate the content displayed. To object to that processing, one must file the LinkedIn Data Processing Objection Form.

Grammarly

Thirty million people and 70,000 professional teams, including 96% of the Fortune 500, use Grammarly to improve their writing. Like Slack and LinkedIn, the company trains its models on the data of consumers in all the same countries, except for the United Kingdom.

Opt out of Grammarly’s Product Improvement and Training in Privacy settings.

To turn off “Product Improvement and Training,” go to the account’s Privacy settings. Regardless of user preferences, Grammarly’s drafting and editing assistant still analyzes user activities – such as the number of words written, the types of suggestions users accept, errors made, and more – to improve the customer experience.

Consent and privacy preferences

Although ChatGPT, Slack, LinkedIn, and Grammarly are known to violate the confidentiality of millions of users, they may not be the only apps that misappropriate personal data. In the absence of legislation governing such standards, users expect LLM and AI developers to adopt consensual, privacy-by-default policies.

Such measures are intended to foster deeper trust with consumers and reduce the risk of data leaks by ensuring that users opt into model training and can set their own privacy preferences. Fortunately, IBM and Meta formed the AI Alliance with 50 founding members to advance open and responsible AI. The group is developing benchmarks, tooling, metrics, and methodologies to ensure the trust and safety of AI systems, among other focus areas.
