Was Sora AI trained using YouTube and gaming content? OpenAI might need a minute to check with the team.

As a tech enthusiast with over two decades of experience under my belt, I've seen AI evolve from a distant dream to a ubiquitous reality. The recent unveiling of OpenAI's Sora AI model has certainly piqued my interest, but it has also left me scratching my head.

OpenAI has made its text-to-video AI model Sora broadly available as part of its 12-day holiday promotion. The model was previously shown in a preview version back in February. The company behind ChatGPT said the tool is currently accessible only to ChatGPT Pro and Plus subscribers, and that it has no current plans to offer it to free users.

Although the tool shows remarkable capabilities that stand out among its competitors, OpenAI acknowledged significant limitations in its video generation, particularly in producing realistic physics for complex actions over long durations. Notably, these problems persist even in Sora Turbo, the version of the model OpenAI has now released to users.

The creators of ChatGPT haven't spoken publicly about what data their model was trained on. But according to an article by TechCrunch, there are suspicions that Sora may have been trained on game-related content. When Sora was introduced in February, it seemed clear that the model had been trained on Minecraft video footage.

Minecraft doesn't appear to be the only video game in Sora's training data. Games like Super Mario Bros, Call of Duty, Counter-Strike, and a '90s version of Teenage Mutant Ninja Turtles also appear to be included. OpenAI has released several videos showcasing Sora-generated clips that closely resemble these games.

Intriguingly, Sora's training data may extend beyond video games. Twitch streams could also be part of it: the model appears to understand what a Twitch stream looks like, which suggests it was trained on content from the platform. An image posted by TechCrunch hints at this. Even more striking, the model produces videos featuring popular streamers such as Raúl Álvarez Genes (Auronplay).

TechCrunch notes that the model rigorously filters prompts to avoid copyright conflicts. If you ask it to produce a video featuring a well-known trademarked character, it will refuse. As a result, you'll need to get creative when writing prompts for the model.

Copyrighted content is AI’s bread and butter

OpenAI and Microsoft have both faced copyright infringement disputes in the past, including multiple lawsuits. Sam Altman, CEO of OpenAI, acknowledged that building tools such as ChatGPT without using any copyrighted material is impractical. However, Altman emphasized that current copyright law does not inherently prohibit using copyrighted content to train AI models.

Joshua Weigensberg, an IP attorney at Pryor Cashman, told TechCrunch:

Using video game footage without proper authorization for AI training can lead to several potential issues. When developing a generative AI model, you often duplicate the training material. If this material happens to be video game playthroughs, it’s highly probable that copyrighted content is accidentally incorporated into the learning dataset.

Microsoft and OpenAI argue that they have not violated copyright law because their models' outputs are transformative works rather than copies or plagiarism.

Tech reviewer and popular YouTuber Marques Brownlee voiced significant reservations about Sora at its debut, focusing on where its training data came from. Brownlee had early access to the tool, and while experimenting with it, he asked it to produce a video of a tech reviewer discussing a smartphone.

One detail in the generated video caught his attention: a plant on the desk. He pointed out that it bore an uncanny resemblance to the plant that appears in many of his own videos.

Could my videos be found within that resource? Is it possible that the depicted plant is taken from the same resource? Or could it simply be a chance occurrence? I’m uncertain.

Although the generated video doesn't prove that Sora was trained on Brownlee's videos, the resemblance certainly warrants a closer look.

Previously, when asked whether Sora was trained on videos or images from YouTube, Instagram, and Facebook, Mira Murati (OpenAI's former CTO) didn't give a definitive answer. Instead, she indicated that the model was trained on publicly available data as well as licensed content from stock media platforms like Shutterstock.

OpenAI didn't give TechCrunch a detailed response to its findings, saying only that it would check with the team.

2024-12-13 15:13