As I have read a lot of misleading information online about how OpenAI uses data in the context of Samsung’s alleged data leaks, I have gathered the key facts on data usage from OpenAI’s policies and guidelines in this article. I consult companies on digital transformation and teach digital literacy, which is why educating people on digital matters, be it individuals, companies or employees, is close to my heart.
To shed some light on how OpenAI uses your data and whether it uses it to improve its services, you first need to understand the two ways in which OpenAI’s services can be used: either (A) through consumer products such as ChatGPT and DALL-E or (B) through an API.
Option (A) is available to everyone and requires no coding. You can start immediately by signing in to ChatGPT or DALL-E. Most people will probably use OpenAI services that way.
Option (B) requires some coding and lets you submit your own training data to fine-tune models. OpenAI then processes user prompts and completions through the API.
Different data usage policies apply to these two options.
OpenAI’s Data Usage in Consumer Services
- In Consumer Services OpenAI may use your content such as prompts, responses, uploaded images, and generated images to improve their services. They remove any personally identifiable information from data they want to use to improve the models. Furthermore, OpenAI says that they only use “a small sampling of data per customer”. SOURCE
- You can, however, request to opt out of having your content used to improve their services by filling out a form with your organization ID and the email address associated with the owner of the account. This opt-out only applies going forward, not retroactively.
- Authorized personnel from OpenAI may view and access your content:
  - to investigate abuse or a security incident
  - to provide support if you reach out to OpenAI
  - to comply with legal obligations
  - to improve the models and services, if you have not opted out. They use de-identified content for this purpose.
- You can request deletion of your account, which will be completed within 30 days of your request. Once you delete your account, you may not sign up again with the same email address. OpenAI does not support partial data deletion: you have to delete your full account if you want to delete something from your chat history.
- OpenAI does not sell your data or share it with third parties for marketing purposes.
OpenAI’s API Data Usage Policy
- By default, data submitted by customers via the API after March 1, 2023 will not be used to train or improve the models. This policy changed in March 2023: data submitted to the API prior to March 1, 2023 may have been used for improvements if the customer had not previously opted out of sharing data. Since March 1, 2023, data submitted for fine-tuning is only used to fine-tune the customer’s own model.
- To answer how long your data is stored, we need to distinguish between the data that you upload via the Files endpoint, for instance to fine-tune a model, and the data from your usage, such as prompts and completions:
  - Data submitted by the user through the Files endpoint is retained until the user deletes the file.
  - API content (your prompts, completions, etc.) is retained for 30 days for abuse and misuse monitoring purposes. A limited number of authorized OpenAI employees and specialized third-party contractors, who are subject to confidentiality and security obligations, can access this data to investigate and verify suspected abuse. Additionally, OpenAI uses automated content classifiers that flag data suspected to contain platform abuse.
- All data is processed and stored in the US.
- API content is stored on OpenAI’s systems and its sub-processors’ systems. OpenAI also sends selected portions of de-identified content to third-party contractors (subject to confidentiality and security obligations).
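To make the API rules above more tangible, here is a minimal Python sketch of the two time-based rules: the March 1, 2023 cutoff for training use and the 30-day retention window for API content. This is only my own illustration of the policy as summarized in this article, not anything OpenAI provides. The function and constant names are hypothetical, and treating March 1, 2023 itself as already covered by the new default is my assumption, since the policy wording only says "after" and "prior to" that date.

```python
from datetime import date, datetime, timedelta

# Values taken from the policy summary above; names are illustrative only.
POLICY_CUTOFF = date(2023, 3, 1)       # default changed on this date
ABUSE_RETENTION = timedelta(days=30)   # retention of prompts and completions


def may_have_been_used_for_training(submitted_on: date, opted_out: bool) -> bool:
    """Return True if API data may have been used to improve the models.

    Assumption: data submitted on March 1, 2023 itself is treated as
    covered by the new default (not used).
    """
    if submitted_on >= POLICY_CUTOFF:
        # New default: API data is not used for training or improvement.
        return False
    # Old regime: data could be used unless the customer had opted out.
    return not opted_out


def api_content_retained_until(submitted_at: datetime) -> datetime:
    """Return the end of the 30-day abuse-monitoring retention window."""
    return submitted_at + ABUSE_RETENTION


if __name__ == "__main__":
    print(may_have_been_used_for_training(date(2023, 2, 1), opted_out=False))
    print(api_content_retained_until(datetime(2023, 4, 1, 12, 0)))
```

Note that this sketch does not cover files uploaded via the Files endpoint, which are kept until you delete them yourself.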
OpenAI’s Usage of Personal Information
- OpenAI gathers a lot of personal information:
  - The information that you voluntarily provide: account information (name, credentials, payment card, transaction history), personal information included in the content from your use of the service, communication with OpenAI, and social media information. By the way: “You must provide accurate and complete information to register for an account” (SOURCE).
  - Information from your use of the service: log data (IP address, browser type, settings, date, time), usage data (types of content you view and engage with, the actions you take, features you use, time zone, country, type of computer or mobile device, etc.), and device information (name of device, operating system). OpenAI uses cookies.
- Your personal information can be used to analyze and improve the services, to conduct research, for communication purposes, to develop new programs and services, to prevent misuse, and to comply with legal obligations (full list: SOURCE).
- OpenAI may aggregate or de-identify information for research, analysis and optimization purposes. For these purposes, they only use the personal data in an anonymous or de-identified form and have committed not to attempt to re-identify the information. SOURCE
- They also may analyze general behavior and characteristics of users.
- They may share aggregated information like general user statistics with third parties, publish such aggregated information or make such aggregated information generally available.
Other important facts around DATA from the Terms of Use
- “As between the parties and to the extent permitted by applicable law, you own all Input” that you provide to the services, such as what you type in as the prompt and its context.
- With regard to the output that is generated by the service, OpenAI assigns you all its rights, which means that you can use it for any purpose, including commercial purposes.
- By accepting the Terms of Use, you agree that neither OpenAI nor any of its affiliates or licensors will be liable for any damages for loss of profits or loss of data.
- You may get access to confidential information of OpenAI. “Confidential Information means nonpublic information that OpenAI or its affiliates or third parties designate as confidential or should reasonably be considered confidential under the circumstances, including software, specifications, and other nonpublic business information. Confidential Information does not include information that: (i) is or becomes generally available to the public through no fault of yours; (ii) you already possess without any confidentiality obligations when you received it under these Terms; (iii) is rightfully disclosed to you by a third party without any confidentiality obligations; or (iv) you independently developed without using Confidential Information.” You may not disclose this information to any third party.
- If you use the OpenAI Services to process personal data, you must provide legally adequate privacy notices and obtain necessary consents for the processing of such data, and you represent to OpenAI that you are processing such data in accordance with applicable law.
I hope this has given you some clarity on how OpenAI uses your data. Whereas data in consumer services can be used to improve their offerings, data submitted through the API since March 1, 2023 is not used for that purpose by default. I will write further articles on the “so what” for both individuals and companies. But I think that is enough input for one article :D Make sure to follow if you don’t want to miss them. If you have any feedback or additions, please let me know. Also feel free to reach out if I can help.
Data Analyst & Webdevelopment
1 year ago: Thanks Katharina Kulawinski. Still unknown is the answer to the question of who is the legal owner of the answer to an API call generated on top of your discretionary data input. Could you share some thoughts about that?
CEO @ SANTACATERINA | Global Strategic Leadership & Board Executive Advisory
1 year ago: Thank you Katharina Kulawinski. The policy does not state how long personal credentials will be retained. This contradicts the GDPR’s ‘data minimisation’ principle. Many US companies take cover under “legitimate interest” to use personal data as they wish because they consider engagement with the platform as informed consent. However, that is not correct; it seems there is no “off-ramp” or alternative way to explore the product other than by entering personal credentials. If a person wishes to learn more, how else would they go about making an informed decision before using the platform? Is there any provision for a Data Subject Access Request? For example, in the 8 Rights there is the Right to anonymity and the Right to be forgotten, meaning you can request “destruction” of data (not just deletion from the algorithm). However, the commodification of personal data is a lucrative business at present. If they sell personal credentials to third parties, or a personal profile becomes part of the bot’s hallucinations, it could destroy your personal identity and authenticity. Where are the operational safeguards to protect individuals and companies?