Sam Altman indicated it’s impossible to create ChatGPT without copyrighted material, but a new study claims 57% of the content on the internet is AI-generated and is subtly killing quality search results

What you need to know

  • A new study suggests that more than 57% of the content available on the internet is AI-generated.
  • AI tools like Copilot and ChatGPT depend on information from the internet for training, but the infiltration of AI-generated content into the internet limits their scope, leading to inaccurate responses and misinformation.
  • If copyright law prohibits training AI models using copyrighted content, the responses generated using chatbots will likely worsen and become more inaccurate.

As a seasoned researcher with years of experience in the field of artificial intelligence, I find myself increasingly alarmed by the growing reliance on AI-generated content and its impact on the quality and accuracy of information available online.


As more and more people embrace generative AI, it’s growing harder to distinguish reality from artificiality. Today, AI technology is so advanced that it can convincingly produce realistic images, videos, and text, often leaving people questioning what’s genuine and what’s been created by these tools.

Publishers and AI developers remain in conflict over copyright infringement. OpenAI CEO Sam Altman has acknowledged that tools such as ChatGPT cannot be developed without copyrighted material, while arguing that copyright law does not prohibit using that content to train AI models.

According to a recent study in Nature, as reported by Forbes, it’s estimated that nearly six out of ten pieces of content available online are automatically generated by artificial intelligence. Scholars from Cambridge and Oxford warn that this rising trend of AI-generated content and the excessive dependence on the same content could lead to a common outcome: subpar answers to inquiries.

According to Dr. Ilia Shumailov of the University of Oxford, the research found that the AI’s answers to questions became less useful and less precise with each successive attempt.

“It’s quite remarkable how swiftly a model may start to falter and become hard to detect. Initially, it tends to impact under-represented data. Later, this leads to a lack of output diversity and a decrease in the range of results. At times, you might notice slight enhancements for the majority data, but these can mask the decline in performance on under-represented data. Model collapse can lead to severe repercussions.”

Based on the findings of researchers, it seems that chatbots can sometimes provide poor quality responses due to an overabundance of artificial intelligence (AI) generated content during their training. Here’s why: AI models learn from data found online. If this online data is itself produced by AI and contains inaccuracies, then the learning process becomes ineffective, resulting in incorrect answers and potentially misleading information being generated.

AI chatbots are lying to themselves

The researchers dug further to find the underlying problem. They first traced it to a surge of AI-created articles being published online without verification. For their investigation they used a pre-trained, AI-powered wiki and then trained the tool on its own outputs. They quickly observed a drop in the accuracy of the information it produced.

The research shows that even though the AI system was initially trained on a wide range of data about various dog breeds, it consistently left out uncommon breeds after several rounds of training on successive datasets.
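The loss of uncommon breeds described above can be illustrated with a toy simulation. This is a minimal sketch and not the researchers' method: the "model" here is just a table of breed counts, each new generation is "trained" only on a finite sample drawn from the previous one, and the function name `resample_generations`, the breed list, and all numbers are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def resample_generations(initial_counts, sample_size, generations, seed=0):
    """Repeatedly re-estimate a categorical distribution from its own samples.

    Returns a list of count dictionaries, one per generation, starting with
    the initial counts. A category that receives zero samples disappears and
    cannot come back, because later generations only see the previous output.
    """
    rng = random.Random(seed)
    counts = dict(initial_counts)
    history = [counts]
    for _ in range(generations):
        labels = list(counts)
        weights = [counts[label] for label in labels]
        # "Train" the next generation only on samples from the current one.
        samples = rng.choices(labels, weights=weights, k=sample_size)
        counts = dict(Counter(samples))
        history.append(counts)
    return history


if __name__ == "__main__":
    # Hypothetical starting data: two common breeds and three rare ones.
    breeds = {
        "labrador": 500,
        "golden retriever": 400,
        "otterhound": 5,
        "azawakh": 3,
        "mudi": 2,
    }
    history = resample_generations(breeds, sample_size=200, generations=30)
    for generation, counts in enumerate(history):
        print(generation, len(counts), sorted(counts))
```

In this sketch the set of surviving breeds can only shrink from one generation to the next, which mirrors the pattern the researchers report: under-represented data is the first to go, while the majority categories can look stable or even slightly better.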

As AI becomes more widespread and AI-created content gets published online, it’s expected that the standard of search engine results may decline.

2024-09-03 17:22