Elon Musk's Grok 3: The AI That Didn't Live Up to the Hype!

Following a great deal of excitement and buzz surrounding Grok 3 by xAI, the next-generation model has officially been released to the public. The CEO and billionaire Elon Musk boasts that it is the “most intelligent AI on the planet,” asserting its superiority over proprietary models from leading AI companies such as OpenAI, Anthropic, DeepSeek, and Google across various metrics, including mathematics, science, and programming.

The improved performance may stem from additional compute: during the product's unveiling on X (formerly Twitter), Musk said that Grok 3 has ten times more computational power than its predecessor.

“Grok 3 possesses about ten times the capability of Grok 2… [It’s an] AI that prioritizes truth above all else, even when this truth might contradict commonly accepted beliefs or political correctness.”

According to the company, the models are being refined daily, and users should notice improvements within about 24 hours. Grok 3 also reportedly outperforms OpenAI’s GPT-4o on multiple benchmarks, including AIME, which assesses a model’s mathematical ability, and GPQA, which tests its scientific knowledge.

Andrej Karpathy, a co-founder of OpenAI and former Tesla AI lead, shared his early impressions of Grok 3's performance:

In a brief assessment this morning, it appears that Grok 3 + Thinking is approaching the cutting edge of performance among OpenAI’s top models (such as o1-pro, priced at $200 per month), slightly surpassing DeepSeek-R1 and Gemini 2.0 Flash Thinking. This is remarkable given that the team has been working from scratch for just a year, making this rapid progress towards state-of-the-art performance quite unprecedented. However, remember the caveats: these models are probabilistic, meaning they may provide slightly different responses each time, and it’s early days, so more evaluations over the coming days/weeks are needed. Initial results in the Language Model arena are very promising indeed. At this point, a big congratulations to the xAI team! They seem to have built up significant speed and momentum, and I look forward to including Grok 3 in my “LLM council” to see its insights moving forward.

Everything you need to know about Grok 3

Grok 3 was trained at xAI’s Memphis data center, which is equipped with 200,000 GPUs. It scored higher than competing models in the Chatbot Arena, a crowdsourced platform where a large group of people evaluate and compare AI models.

Grok 3 offers two operational modes: Think and Big Brain. The basic mode, Think, handles general inquiries, while the advanced mode, Big Brain, takes over for complex queries and uses additional compute to reason through them in more depth.

According to xAI, Grok 3 Reasoning and Grok 3 mini Reasoning can work through problems and make decisions in a manner similar to OpenAI’s o3-mini or DeepSeek’s R1. Grok 3 also comes with a new DeepSearch feature that supports research, brainstorming, and data analysis when answering questions, mirroring the capabilities of OpenAI’s Deep Research and Perplexity’s Deep Research.

Grok 3 has launched for users on the Premium+ subscription tier. xAI is also preparing a new subscription option, SuperGrok, which will offer exclusive access to DeepSearch, additional reasoning capabilities, and unlimited image generation.

Elon Musk also plans to open-source Grok 2 in the next few months:

“We plan to make the latest version of Grok publicly available once the upcoming version, Grok 3, has been completely released. Given that Grok 3 should be mature and stable in just a couple of months, we’ll then release the source code for Grok 2.”

Despite Musk's claims, Ethan Mollick, an associate professor at the University of Pennsylvania’s Wharton School, suggests that Grok 3 does not clearly stand out from the rest of the field:

X has caught up with the frontier of released models VERY quickly, if they continue to scale this fast, they are a major player. That said, while their base model is currently leading the Chatbot Arena, their benchmarks are not clearly beating OpenAI’s o3
Grok 3 is closely following the OpenAI playbook, including using the same product mix
Not sure whether firms will use the Grok API at this point, given the enterprise partnerships (Azure, AWS, etc.), support and extensive sales & training efforts for the other big labs, I don’t know if Grok has a big opening.

Whether Grok 3 actually outperforms OpenAI’s o3 model is still up for debate. Gary Marcus, the founder of Geometric Intelligence, questioned the model’s performance in comments reported by Business Insider:

“Elon Musk promised that Grok 3 would be the smartest AI ever. Spoiler alert: it wasn’t.”

Marcus said that Grok 3’s launch largely repeated earlier demonstrations. He added that although the model shows potential, it has not yet matched the performance of OpenAI’s models. “For now, Sam Altman can relax,” he said, implying that the release brought no significant advance.

2025-02-18 21:39
