Is DeepSeek’s AI Just a Cheap Imitation of ChatGPT? Shocking Similarity Revealed!

Chinese AI company DeepSeek made a big impact on the artificial intelligence world this year with its highly affordable R1 model, built on its V3 base model. The launch sparked worries among investors, particularly because R1 outperformed OpenAI’s o1 reasoning model on multiple benchmarks in mathematics, science, and programming, all at a significantly lower price.

According to DeepSeek’s researchers, training the model cost around six million dollars. However, various sources allege that the company kept costs down by using copyrighted material from Microsoft and OpenAI during training.

A different source suggested that the Chinese AI company may have spent around $1.6 billion on hardware, including 50,000 NVIDIA Hopper GPUs. OpenAI, for its part, raised concerns that DeepSeek may have used outputs from OpenAI’s own models to build its cheaper competitor.

OpenAI alleges that DeepSeek trained its R1 model using a technique known as “knowledge transfer” (often called distillation), drawing on outputs from an existing model, in this case OpenAI’s. In simpler terms, DeepSeek would have used the knowledge produced by an already-trained model to help create a new one.

In essence, this approach sharply reduces the substantial financial investment needed to build and train an AI model. And it appears that OpenAI’s claims may indeed have some merit.

According to a recent analysis by the AI plagiarism detector Copyleaks, DeepSeek’s AI-generated content bears a striking resemblance to the output of OpenAI’s ChatGPT. More troubling still, the study found a 74.2% likeness between the two (as reported by Forbes).

Did DeepSeek train its AI model using OpenAI’s copyrighted content? The tell-tale signs suggest as much

In this analysis, Copyleaks’ classifier algorithms unanimously concluded that DeepSeek’s output had been produced using OpenAI’s models.

The AI detection company uses a dedicated method to identify text produced by specific AI models, such as those from OpenAI, Anthropic’s Claude, Google’s Gemini, and Meta’s Llama. Notably, it can tell each model’s output apart from the others. To minimize false positives, the classifiers rely on unanimous voting: a text is attributed to a particular model only when every classifier agrees.
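Copyleaks has not published how its classifiers are implemented, so the following is only a minimal sketch of the general idea of unanimous voting, assuming each classifier returns a single model label for a text. The function names (`unanimous_attribution`, `attribution_rate`) and the toy classifiers are hypothetical, for illustration only.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: a classifier maps a text to the label of the model it thinks wrote it.
Classifier = Callable[[str], str]


def unanimous_attribution(text: str, classifiers: Sequence[Classifier]) -> Optional[str]:
    """Return a model label only if every classifier agrees; otherwise abstain (None)."""
    if not classifiers:
        return None
    votes = {clf(text) for clf in classifiers}
    # Exactly one distinct vote means the "jury" is unanimous.
    return votes.pop() if len(votes) == 1 else None


def attribution_rate(texts: Sequence[str], classifiers: Sequence[Classifier], target: str) -> float:
    """Fraction of texts unanimously attributed to `target` (abstentions count as misses)."""
    if not texts:
        return 0.0
    hits = sum(1 for t in texts if unanimous_attribution(t, classifiers) == target)
    return hits / len(texts)


if __name__ == "__main__":
    # Toy stand-in classifiers (purely illustrative, not real detectors).
    def always_openai(text: str) -> str:
        return "openai"

    def keyword_based(text: str) -> str:
        return "openai" if "delve" in text else "llama"

    jury = [always_openai, keyword_based]
    print(unanimous_attribution("Let us delve into this.", jury))  # openai
    print(unanimous_attribution("Hello there.", jury))  # None
    print(attribution_rate(["Let us delve.", "Hello there."], jury, "openai"))  # 0.5
```

Requiring unanimity trades recall for precision: whenever the classifiers disagree, the sketch abstains instead of guessing, which is how such a scheme would keep false positives down.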

Shai Nisan, head of data science at Copyleaks, said:

In our study, we employed the ‘unanimous jury’ method and discovered a striking resemblance in the style of DeepSeek and OpenAI’s models that was not present in the other models we examined.

The findings raise questions about just how efficient DeepSeek’s model development and training methods really are, at a time when DeepSeek’s low costs have already stirred doubts among investors about the enormous sums being poured into AI.

As Nisan pointed out:

While the resemblance between DeepSeek and OpenAI isn’t conclusive evidence of DeepSeek being a derivative, it does spark questions about its origins. Our study primarily concentrates on analyzing writing styles; within this field, the similarity to OpenAI is quite substantial. Given OpenAI’s dominance in the market, our findings hint that a comprehensive examination of DeepSeek’s structure, training materials, and development history would be beneficial.

What’s next for DeepSeek if found guilty of copyright infringement?

The study indicates that DeepSeek’s AI-generated text shows a 74.2% similarity to OpenAI’s ChatGPT. That does not automatically mean DeepSeek’s model is an exact replica. Even so, the findings could create complications for the AI startup, which may face challenges over intellectual property rights and potential copyright violations.

Furthermore, because DeepSeek has not explicitly stated whether it used OpenAI’s models to train its system, this ambiguity could lead to complex legal disputes and financial losses.

Copyleaks’ head of data science added:

As a researcher, I firmly believe that clarity and robust intellectual property (IP) safeguards will play a crucial role in shaping the trajectory of AI development and governance. It’s highly plausible that regulatory bodies may insist on companies openly sharing comprehensive details about the datasets and results generated by their AI models during training processes.

OpenAI has multiple copyright infringement ghosts in its basement

OpenAI and Microsoft have faced their share of legal battles, particularly copyright infringement cases stemming from their artificial intelligence projects. For example, eight news outlets sued Microsoft and OpenAI for copyright violations as recently as May 2024.

OpenAI CEO Sam Altman has stated that current copyright law does not explicitly prohibit using copyrighted material to train AI systems. However, he conceded that it would be extremely difficult, if not impossible, to build models like ChatGPT without using copyrighted content.

Because AI technology is advancing so quickly, it is becoming hard to define where copyright infringement begins, and AI tools appear to operate in a legal gray zone. This makes it difficult to determine precisely when AI companies are using content from publishers or other online sources without permission.


2025-03-06 01:20