Ex-OpenAI staffer claims the ChatGPT maker leverages “the fair use doctrine” to violate copyright law and destroy the internet — after Sam Altman admitted it’s impossible to develop AI tools without copyrighted material

What you need to know

A former OpenAI employee recently published a blog post alleging that the firm breaks copyright law by using internet data to train ChatGPT.
The post suggests OpenAI relies on technicalities in copyright law to keep training AI models on copyrighted content and internet data without authorization or compensation.
He also highlighted the role of AI-generated content, including inaccurate information, in degrading the internet.

Amid bankruptcy rumors and a reported shift toward a for-profit structure, prominent staff are leaving OpenAI. Most recently, Suchir Balaji left the company to focus on personal projects.

After completing his studies at UC Berkeley, Balaji joined the organization behind ChatGPT, hoping to be part of a team that would use advanced AI to treat disease and even reverse aging. He worked primarily on OpenAI’s GPT-4 model, a system that Sam Altman admitted was only “modestly satisfactory” at best and, at times, “not great.”

Yet the 25-year-old decided to leave the AI company after concluding that his goals no longer aligned with its own. He explained his position in an interview with The New York Times.

According to Balaji, by training on data produced by individuals, businesses, and online platforms, AI companies may be undermining the commercial viability of the very entities whose data they use.

He stated outright that OpenAI breaks U.S. copyright law, a serious allegation coming from a former employee. This isn’t the first time OpenAI has come under fire over copyright: the ChatGPT maker, alongside Microsoft, is fighting several copyright infringement lawsuits in court.

Previously, the CEO of OpenAI, Sam Altman, acknowledged that creating tools such as ChatGPT would be nearly impossible without utilizing copyrighted material. However, he also pointed out that current copyright laws do not unequivocally forbid the use of copyrighted content in training AI models.

Is AI model training using copyrighted content fair use?

In his blog post, Balaji raised concerns that OpenAI may be violating copyright law. Based on an examination of ChatGPT’s output, he argued that it does not qualify as “fair use” of copyrighted material. Fair use is a legal doctrine that permits limited use of copyrighted works without the rights holder’s permission.

Following Balaji’s copyright infringement claims, OpenAI responded in a statement to Gizmodo.

Microsoft and OpenAI contend that they build their AI models on publicly accessible data, in line with fair use and related principles rooted in longstanding legal precedent. They argue that this approach is fair to creators, essential for innovation, and vital to US competitiveness.

Balaji takes the opposite view. He acknowledges that the output of AI systems is not copied verbatim from its sources, but argues that it still resembles copyrighted material and could therefore be illegal under existing copyright law if it is deemed a derivative work.

Beyond copyright, Balaji expressed apprehension about how AI tools such as ChatGPT might reshape the internet. Separately, a former Google engineer has warned that OpenAI’s prototype search tool, SearchGPT, could challenge Google’s dominance in the near future, particularly given the antitrust pressure Google faces after being ruled an illegal monopoly in search. Balaji also emphasized that AI systems can produce incorrect and misleading information. “If you share my views,” Balaji stated, “you might need to consider leaving the company.”

2024-10-24 22:09
