Meta's Desperate Moves: How They Tried to Steal OpenAI's Thunder!

Once more, Meta’s struggles with artificial intelligence have made headlines. Microsoft CEO Satya Nadella acknowledged that OpenAI had a two-year advantage in the AI race without much competition, leading to the development of ChatGPT. Meanwhile, other prominent AI labs like Anthropic and Google are quickly catching up. In contrast, Meta appears to be working overtime just to stay afloat in this fast-paced competition.

In the midst of a significant copyright dispute, The Verge reports that Meta may have used copyrighted material to train its artificial intelligence systems. There are also indications that the company tried to hide this activity in order to sidestep copyright infringement claims.

Notably, the company allegedly resorted to underhanded methods to quickly match OpenAI’s fast advancements in the AI sector. An email from the company’s top AI executive to Meta AI researcher Hugo Touvron indicated their eagerness to develop something akin to GPT-4, implying a focus on mastering cutting-edge technology and leading the race.

According to reports, the Facebook parent’s plan for reaching these objectives allegedly included using the digital library site Library Genesis (LibGen) to train its models.

A recent exposé by The Verge uncovered an email exchange between Sony Theakanath, Meta’s Director of Product, and Joelle Pineau, VP of AI Research, in which they discussed whether to use data from the Library Genesis (LibGen) website for internal purposes only, for benchmarks in a blog post, or to train a model. Theakanath mentioned that Gen AI had been given permission to use LibGen for Llama 3 under certain conditions, such as discarding any data flagged as pirated or stolen and not publicly acknowledging that the model was trained on LibGen’s data.

As Theakanath states, “Libgen plays a crucial role in achieving cutting-edge performance.” He also mentioned that, according to rumors within the industry, OpenAI and Mistral are using this library for their models, and that he had brought the matter to the attention of an executive at the organization, presumed to be Meta CEO Mark Zuckerberg.

The email further emphasized the policy risks that might arise from training AI models on copyrighted material, such as regulatory action triggered by media attention, which could expose Meta’s copyright violations. As Theakanath pointed out, this could weaken Meta’s bargaining power with regulators on these matters.

According to reports, Meta allegedly took steps to conceal its actions after using data from LibGen to train its AI models. These tactics included deleting copyright notices and document identifiers such as the copyright symbol, as well as comments from employees, which further muddied the waters. The company also reportedly erased metadata with the intention of avoiding potential legal issues.

Copyright infringement is seemingly crucial for AI model training

Microsoft and OpenAI have found themselves embroiled in numerous copyright infringement lawsuits. Although some of these cases are ongoing, Sam Altman, CEO of OpenAI, acknowledged that creating AI models without using copyrighted material is extremely challenging. He went on to explain that the vast majority of internet content is protected by copyright, and argued that using such content to train AI systems is acceptable under the concept of “fair use.” He also emphasized that the law does not explicitly outlaw the training of AI models with copyrighted material.

Lately, leading AI labs like OpenAI and Anthropic have reportedly been encountering difficulties in building more sophisticated AI systems, allegedly because of a scarcity of high-quality content. Yet influential figures in the AI community, such as Sam Altman and a former Google CEO, have challenged these reports, arguing that there’s no substantial proof that scaling laws have broken down; they claim “there is no barrier.”

2025-01-17 01:39
