Is Microsoft’s new AI still “4x more accurate than human doctors”? — Typos in medical prompts to chatbots could be catastrophic

As AI technology advances well beyond the simple question answering of the early Bing Chat days, it is becoming increasingly difficult for users without technical expertise to get the most out of these tools.

It is both alarming and intriguing how quickly tools such as OpenAI’s ChatGPT are gaining traction worldwide: the service reportedly added over a million new users within an hour of launching its latest image generator, which went viral across social media, with Studio Ghibli-style memes a popular testament to its appeal.

Microsoft’s Copilot is often compared with OpenAI’s ChatGPT, despite the two companies’ diverging commercial strategies, since both products are largely built on the same underlying technology and AI models. Recent reports, however, indicate that Microsoft is experimenting with third-party models in Copilot and is working on its own off-frontier models.

A separate report found that the most frequent complaint Microsoft’s AI division hears from users is that Copilot isn’t as good as ChatGPT. Microsoft pushed back on that claim, attributing the gap to poor prompt formulation on the users’ part, and launched Copilot Academy to improve users’ AI skills and their overall satisfaction with tools like Copilot.

In May, the head of Microsoft Teams, Jeff Teper, acknowledged that Copilot and ChatGPT are highly similar, if not essentially the same, but emphasized that Microsoft’s version offers stronger security and a better user experience.

Microsoft may have a point in deflecting blame onto poor prompt design, at least if the findings of a recent study by MIT researchers (as reported by Futurism) hold up.

Be wary of typos when using AI

The research suggests that heavy reliance on AI for healthcare guidance can be risky and at times misleading. In particular, the findings indicate that AI tools may advise users against seeking medical help when their queries contain typos or other errors, such as misspelled words, extra spaces, or awkward grammar. Colloquial speech and slang can trigger the same behavior.

The researchers added an important caveat: women may be more likely than men to receive this kind of erroneous advice, though that conclusion should be treated with some skepticism. The study evaluated several AI tools, including OpenAI’s GPT-4, Meta’s Llama-3-70B, and Palmyra-Med, a medically focused AI model.

For the simulation, they drew on thousands of patient cases, combining scenarios from a medical database, health discussions on Reddit, and AI-generated cases.

Interestingly, the researchers deliberately introduced perturbations into the data to stress-test the AI models: inconsistent capitalization at the start of sentences, exclamation marks, colloquial language, and uncertain expressions like “perhaps” or “it’s possible.”
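To make the idea concrete, here is a minimal Python sketch, not the researchers’ actual code, of the kind of surface-level perturbations described: lowercased sentence openings, stray whitespace, an exclamation mark, and a hedging, colloquial preamble. The example prompt and the perturb function are illustrative assumptions.

    import random

    def perturb(message: str, rng: random.Random) -> str:
        """Apply surface-level disturbances to a patient-style prompt (illustrative only)."""
        # Inconsistent capitalization: lowercase the first letter of each sentence.
        perturbed = ". ".join(s[:1].lower() + s[1:] if s else s for s in message.split(". "))

        # Extra whitespace after a randomly chosen word.
        words = perturbed.split(" ")
        words[rng.randrange(len(words))] += "  "
        perturbed = " ".join(words)

        # Exclamation mark plus an uncertain, colloquial preamble.
        perturbed = "it's possible i'm overreacting, but " + perturbed + "!"
        return perturbed

    if __name__ == "__main__":
        clean = ("I have had chest pain and shortness of breath since this morning. "
                 "Should I go to the hospital?")
        print(perturb(clean, random.Random(0)))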

The chatbots were indeed thrown off by these perturbations and shifted their medical recommendations. According to the study, the disturbances made a chatbot 7-9% more likely to suggest that a patient avoid going to the hospital.
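Below is a hedged sketch of how one might probe for that effect: the same clinical question is sent to a chat model in clean and perturbed form, and a crude keyword check flags replies that lean toward self-management. The model name, prompts, and keyword classifier are illustrative assumptions rather than the study’s protocol, and the OpenAI Python SDK with a configured API key is assumed.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    SYSTEM = "You are a triage assistant. Advise whether the patient should seek in-person care."

    def leans_toward_self_management(reply: str) -> bool:
        """Very rough keyword check; the study used a more careful evaluation."""
        text = reply.lower()
        return any(phrase in text for phrase in
                   ("you do not need", "manage at home", "no need to visit"))

    def triage(prompt: str, model: str = "gpt-4o-mini") -> str:
        # Illustrative model choice; the study itself tested GPT-4, Llama-3-70B, and Palmyra-Med.
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    clean = ("I have had chest pain and shortness of breath since this morning. "
             "Should I go to the hospital?")
    messy = ("it's possible i'm overreacting, but i have had chest  pain and shortness "
             "of breath since this morning. should I go to the hospital!")

    for label, prompt in [("clean", clean), ("perturbed", messy)]:
        print(label, "-> self-management?", leans_toward_self_management(triage(prompt)))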

The researchers attribute this to the models’ heavy dependence on the medical literature they were trained on: messages written by patients often lack the coherence and structure of medical publications, so the models sometimes struggle to interpret them.

According to Abinitha Gourabathina, an MIT researcher and the study’s lead author:

These models are often trained and tested on medical exam questions, but then used for tasks quite far removed from that, such as assessing the severity of a clinical case. There is still a great deal we don’t know about large language models.

The findings raise serious concerns about the adoption of AI in healthcare, coming shortly after Microsoft announced a new AI medical tool that it claims is 4 times more accurate and 20% cheaper than human doctors. The CEO of Microsoft AI described it as an important step toward “medical superintelligence.”

For now, generative AI still has some way to go before it can be fully relied upon in high-stakes fields such as medicine.

2025-07-10 14:09