OpenAI Accuses DeepSeek of Copying Its Technology: Ethical and Legal Implications


OpenAI has accused DeepSeek of improperly using its data to train its open-source reasoning model, DeepSeek-R1.


Image source: Upstox News

In the past couple of days, there’s been controversy surrounding DeepSeek, a Chinese AI startup, and its alleged use of OpenAI’s proprietary models.


The issue surfaced after DeepSeek launched two new models, DeepSeek-V3 and DeepSeek-R1, which reportedly perform on par with OpenAI’s models at a fraction of the price.

OpenAI has accused DeepSeek of improperly using its data to train these models, which has set off a heated debate about intellectual property rights in the AI world and the ethics behind model distillation.

Model distillation, also known as knowledge distillation, is a method in machine learning where knowledge from a large, complex model (the "teacher") is transferred to a smaller, more efficient model (the "student").


A distilled model is basically a smaller model that performs similarly to the larger one but requires fewer computational resources.
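
To make the teacher-student idea concrete, here is a minimal sketch of the classic distillation loss from Hinton et al. (2015), written in PyTorch. It is illustrative only, not OpenAI’s or DeepSeek’s actual training code; the temperature and alpha defaults are arbitrary.

```python
# A minimal sketch of classic knowledge distillation, not any lab's real
# pipeline: the student learns from a blend of the teacher's softened
# output distribution (soft targets) and the ground-truth labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    # Soft targets: push the student toward the teacher's distribution.
    # Scaling by T^2 keeps gradients comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```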

If you’re interested in knowing how an OpenAI model is distilled, check out this documentation.


What Exactly Was Copied?

In the fall of 2024, Microsoft’s security researchers observed a group believed to be connected to DeepSeek extracting large amounts of data from OpenAI’s API.

This activity raised concerns that DeepSeek was using distillation to replicate OpenAI’s models without authorization. The excessive data retrieval was seen as a violation of OpenAI’s terms and conditions, which restrict the use of its API for developing competing models.

According to Mark Chen, Chief Research Officer at OpenAI, DeepSeek managed to independently find some of the core ideas OpenAI had used to build its o1 reasoning model.

Chen noted that the reaction to DeepSeek, which caused NVIDIA to lose nearly $600 billion in market value in a single day, might have been overblown.


However, I think the external response has been somewhat overblown, especially in narratives around cost. One implication of having two paradigms (pre-training and reasoning) is that we can optimize for a capability over two axes instead of one, which leads to lower costs. — Mark Chen


While OpenAI hasn't revealed all the details, it has confirmed that there's substantial evidence suggesting DeepSeek used distillation techniques to train its models.

In response, OpenAI and Microsoft have blocked access to OpenAI’s API for accounts suspected to be linked to DeepSeek. This action is part of a larger initiative by U.S. AI companies to protect their intellectual property and prevent unauthorized use of their models.

The situation has also raised national security concerns, prompting the White House to review the implications of such practices on the U.S. AI industry.

Model Distillation Is Legal

Model distillation itself is not inherently illegal. It is a widely used technique in the AI industry to create more efficient models by transferring knowledge from a larger model to a smaller one.

Take the Stanford Alpaca model as an example. Alpaca is a language model fine-tuned using supervised learning from a LLaMA 7B model on 52K instruction-following demonstrations generated from OpenAI’s text-davinci-003.


Image from Stanford Alpaca


The data generation process results in 52K unique instructions and the corresponding outputs, which cost less than $500 using the OpenAI API.

It demonstrates how distillation can be used to create smaller, more affordable models that still perform well.
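
For a sense of what that data-generation step looks like in practice, here is a minimal sketch using OpenAI’s official Python client. The prompt and model name are placeholders, not Alpaca’s actual pipeline (Alpaca used text-davinci-003, which has since been deprecated).

```python
# A hedged sketch of Alpaca-style data generation: ask a teacher model
# for a completion, then store the (instruction, output) pair as one
# fine-tuning example for the student. Model and prompt are illustrative.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

instruction = "Explain model distillation in two sentences."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # stand-in; Alpaca used text-davinci-003
    messages=[{"role": "user", "content": instruction}],
)

example = {
    "instruction": instruction,
    "output": response.choices[0].message.content,
}

# Repeating this ~52K times yields an Alpaca-style instruction dataset.
with open("distillation_data.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```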

In fact, if you read DeepSeek’s whitepaper, the smaller R1 variants are themselves distilled models, built by transferring DeepSeek-R1’s reasoning into Qwen (Qwen, 2024b) and Llama (AI@Meta, 2024) backbones:

To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.


Based on DeepSeek’s findings, it appears that this straightforward distillation method significantly enhances the reasoning abilities of smaller models.

The controversy stems from allegations that DeepSeek used OpenAI’s model outputs to fine-tune their own models, which may be against OpenAI’s terms of service. This raises questions about fair use, data ownership, and the competitive landscape in the AI industry.


Running DeepSeek’s API Requires OpenAI’s Libraries


To use DeepSeek’s API from Node.js, you need to run ‘npm install openai.’


Image source: DeepSeek



Yep, you read that right. DeepSeek works with OpenAI’s client libraries! This is possible because DeepSeek’s REST API is fully compatible with OpenAI’s API.

Quite an interesting turn of events in the AI world! This compatibility has a few practical upsides:

  1. DeepSeek avoided spending weeks building Node.js and Python client libraries by reusing OpenAI’s code.
  2. Developers using OpenAI can easily try or switch to DeepSeek by just changing the base URL and API key (see the sketch after this list).
  3. If DeepSeek ever needs to make changes, they can simply fork the library and replace OpenAI with DeepSeek.
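
Here is a minimal sketch of point 2 using OpenAI’s official Python client (the Node.js client works the same way). The base URL and the deepseek-reasoner model name follow DeepSeek’s API docs, but treat them as subject to change.

```python
# Pointing OpenAI's official Python client at DeepSeek's
# OpenAI-compatible REST API. Only the API key and base URL change;
# the calling code is identical to a regular OpenAI integration.
from openai import OpenAI

client = OpenAI(
    api_key="<your-deepseek-api-key>",    # issued by DeepSeek, not OpenAI
    base_url="https://api.deepseek.com",  # per DeepSeek's API docs
)

response = client.chat.completions.create(
    model="deepseek-reasoner",            # DeepSeek's API name for R1
    messages=[{"role": "user", "content": "Hello from an OpenAI client!"}],
)
print(response.choices[0].message.content)
```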



As a developer, this is a good thing, and I don’t see it as a huge problem because this is a common practice for LLM providers and aggregators. OpenRouter, Ollama, DeepInfra, and a bunch of others do this too.

In terms of API access, DeepSeek claims that you can use the R1 API at a significantly lower cost than OpenAI’s offerings:


  • $0.14 per million input tokens (cache hit)
  • $0.55 per million input tokens (cache miss)
  • $2.19 per million output tokens


The cost for output tokens is nearly 30 times lower than o1’s $60 per million output tokens. This represents a significant reduction in expenses for companies managing extensive AI operations.
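
As a quick back-of-the-envelope check using the list prices quoted above (the 10-million-token workload is an arbitrary example):

```python
# Cost comparison for a hypothetical 10M-output-token workload,
# using the per-million list prices quoted above (USD).
O1_OUTPUT_PRICE = 60.00   # o1, per 1M output tokens
R1_OUTPUT_PRICE = 2.19    # DeepSeek R1, per 1M output tokens

millions_of_tokens = 10
print(f"o1: ${O1_OUTPUT_PRICE * millions_of_tokens:,.2f}")        # $600.00
print(f"R1: ${R1_OUTPUT_PRICE * millions_of_tokens:,.2f}")        # $21.90
print(f"R1 is {O1_OUTPUT_PRICE / R1_OUTPUT_PRICE:.1f}x cheaper")  # 27.4x
```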


Take a look at this visual comparison of DeepSeek’s R1 and OpenAI’s models.


Image source: DeepSeek


Switching to the R1 API would mean huge savings. You can learn more about DeepSeek’s API access here.


Final Remarks

DeepSeek was barely known outside research circles until last month, when it launched its V3 model. Since then, it has caused AI stocks to drop and even been called a “competitor” by OpenAI’s CEO. It’s unclear how things will play out for DeepSeek in the coming months, but it has caught the attention of both the public and major AI labs.

Ironically, it feels odd that OpenAI is accusing DeepSeek of IP theft, given its own history of copyright disputes. OpenAI scraped massive amounts of data from the internet to train its models, including copyrighted material, without seeking permission. This practice has drawn lawsuits from authors such as George R.R. Martin, and public complaints from Elon Musk over the use of Twitter data.

OpenAI may become even more closed off as a result. Remember when Musk shut down free API access to X (formerly Twitter) over data scraping? The chances of OpenAI doing the same are slim, but it’s not out of the question.

