📰 AI News: Google launches Gemma Scope 2 to “x-ray” how AI models think
📝 TL;DR
Google DeepMind just released Gemma Scope 2, a large open toolkit that lets researchers look inside its Gemma 3 models and see how they arrive at their answers. It is being called the largest open interpretability release so far, and it is aimed at making powerful AI models more transparent and safer to use.

🧠 Overview
Gemma Scope 2 is a new suite of tools that works across the full Gemma 3 model family, from the tiny 270M model up to 27B parameters. It lets researchers and safety teams inspect what is going on inside the model's "brain", not just the final answer on screen. The goal is to help the AI safety community understand complex behaviors like jailbreaks, scams, or hidden reasoning, so we can build more trustworthy AI systems.

📜 The Announcement
Google DeepMind announced Gemma Scope 2 as an open, comprehensive interpretability suite for Gemma 3. They describe it as the largest open release of interpretability tools from any AI lab so far, built using a huge amount of data and a very large number of trained parameters. The release includes model artifacts, documentation, and interactive demos so external researchers can study safety-relevant behaviors in modern language models.

⚙️ How It Works
• Think of it as an AI microscope - Gemma Scope 2 uses sparse autoencoders and “transcoders” to break down internal activations into human-interpretable features, so you can see which concepts a model is focusing on as it responds.
• Full coverage for Gemma 3 - The tools cover every layer of every Gemma 3 model size, which matters because many weird or dangerous behaviors only show up in larger models.
• New training tricks - It uses advanced methods like Matryoshka-style training so the features it finds are cleaner and more meaningful, improving on the first Gemma Scope release.
• Chatbot behavior analysis - There are special tools tuned for chat models, so you can inspect things like jailbreak attempts, refusal behavior, and whether the model’s inner reasoning matches the explanation it gives you.
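
🔬 A tiny example
To make the "AI microscope" idea a bit more concrete, here is a toy sketch of what a sparse autoencoder does: it takes one internal activation vector, turns it into a handful of active "features", and rebuilds the activation from them. This is not the Gemma Scope 2 code. The weights, the threshold, the three-number activation, and the feature labels are all made up for illustration, and the real tools may use a different architecture and work with far more features.

```python
# Toy sparse autoencoder (SAE) forward pass in plain Python.
# Everything below (weights, threshold, labels) is hypothetical and is
# NOT taken from Gemma Scope 2; it only illustrates the general idea.

# One encoder row and one decoder row per feature (3 features, 3 dims).
W_ENC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B_ENC = [0.0, 0.0, 0.0]
W_DEC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B_DEC = [0.0, 0.0, 0.0]

# Made-up human-readable names for the three features.
FEATURE_LABELS = ["refusal", "greeting", "code"]


def encode(activation, w_enc, b_enc, threshold=0.0):
    """Map a dense activation vector to sparse feature activations."""
    features = []
    for weights, bias in zip(w_enc, b_enc):
        pre = sum(w * a for w, a in zip(weights, activation)) + bias
        # Keep a feature only if it clears the threshold, else zero it.
        features.append(pre if pre > threshold else 0.0)
    return features


def decode(features, w_dec, b_dec):
    """Rebuild the activation as a weighted sum of feature directions."""
    recon = list(b_dec)
    for strength, direction in zip(features, w_dec):
        for i, d in enumerate(direction):
            recon[i] += strength * d
    return recon


def top_features(features, k=2):
    """Return (index, strength) pairs for the k strongest active features."""
    active = [(i, f) for i, f in enumerate(features) if f > 0.0]
    return sorted(active, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    activation = [2.0, 0.1, 0.5]  # pretend this came from one model layer
    feats = encode(activation, W_ENC, B_ENC, threshold=0.25)
    for index, strength in top_features(feats):
        print(FEATURE_LABELS[index], strength)
    print("reconstruction:", decode(feats, W_DEC, B_DEC))
```

The point of the sparsity step is that only a few features stay "on" for any given input, which is what makes them easier for a human to read than the raw activation numbers.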