📰 AI News: The Default ChatGPT Model Just Got Significantly Better at Health Questions 📰
📝 TL;DR 📝
OpenAI says GPT-5.5 Instant, the default model used by hundreds of millions of free ChatGPT users, now performs on par with its top-tier frontier models on health-related questions. The update was built with hundreds of physicians across 60 countries and shows a measurable drop in factual errors. No setup is required: this is a quality upgrade to the model most people are already using.

🧠 Overview 🧠
This is not a new feature to turn on or a product to try. It is a quality improvement to the default model that powers ChatGPT for free users, and it targets one of the most common and highest-stakes use cases: health and wellness questions. OpenAI reports that more than 230 million people ask ChatGPT health-related questions every week, covering things like interpreting lab results, preparing for doctor's appointments, and navigating insurance questions.

Because health misinformation carries real consequences, the credibility of this kind of improvement depends heavily on how it was tested. OpenAI built this update in collaboration with a large physician network and ran extensive comparative evaluations. All of the cited results, however, come from OpenAI's own benchmarks and physician panel, not from independent or peer-reviewed verification.

📜 The Announcement 📜
OpenAI announced that GPT-5.5 Instant, released in May, now reaches health performance comparable to its frontier Thinking models on an aggregate of health evaluations, a substantial improvement over the previous GPT-5.3 Instant. The company worked with a network of over 260 physicians across 60 countries, 49 languages, and 26 specialties, who have collectively reviewed more than 700,000 example model responses.

In a direct comparison, a panel of physicians evaluated 3,500 health-related responses and rated GPT-5.5 Instant higher than both older AI models and physician-written answers on accuracy, communication, completeness, and overall health decision helpfulness.
Separately, monitoring of real-world production traffic showed a 71 percent drop in health responses flagged for potential factual issues over the past two months.