Elon Musk’s Grok 2 might not be “the most powerful AI,” but it outperforms Anthropic’s Claude 3.5 Sonnet and even GPT-4-Turbo

What you need to know

  • X recently announced the release of Grok-2, Grok-1.5’s successor.
  • The new model ships with state-of-the-art frontier capabilities in chat, coding, and reasoning.
  • Grok is also getting a new user interface and image generation capabilities.

The release of Grok-2 by X is intriguing. Having followed the AI landscape for several years, I’ve seen many models come and go, each promising to revolutionize the way we interact with technology.


Given Elon Musk’s statement that Grok would be “the most powerful AI in every aspect by December,” a new version was all but guaranteed to arrive soon. Now, X has unveiled an early preview of Grok-2, featuring cutting-edge reasoning abilities.

According to the company, the new model is a substantial step up from its predecessor, Grok-1.5, with state-of-the-art capabilities in chat, coding, and reasoning.

Alongside Grok-2, X also introduced Grok-2 mini, “a small but capable sibling of Grok-2.” Despite X’s huge user base, Grok is arguably less popular than AI rivals such as Microsoft Copilot or OpenAI’s ChatGPT. Interestingly, according to the company’s benchmarks and the LMSYS leaderboard, where an early version of the model was tested under the name “sus-column-r,” Grok-2 outperforms Claude 3.5 Sonnet and GPT-4-Turbo.

Can Grok grow without X data?

According to the company, Grok-2 and Grok-2 mini perform strongly across several benchmark areas, including graduate-level science knowledge (GPQA), general knowledge (MMLU and MMLU-Pro), and competition-style math problems (MATH). Both models also handle vision-based tasks.

Grok is also getting a fresh coat of paint in the form of a new user interface, and users will soon be able to generate images from prompts directly in the chatbot, much like Image Creator from Designer or DALL-E 3. Those tools, however, appear to have been limited by extensive censorship measures.

X, however, finds itself in a difficult position. The company could lose up to 4% of its global annual revenue over allegations that it trained Grok on data from 60 million EU users without their explicit consent. Time will tell how Grok fares without user-derived training data.

2024-08-15 12:47