Interpretability— forks of useful understanding
Critics of AI progress, notably Gary Marcus, are out in force, arguing loudly that barriers of interpretability (loosely, our lack of understanding of how an AI model actually works) and a corresponding lack of observability (we cannot watch the model working) are hampering further progress toward AGI. Simultaneously, voices more sanguine about progress fork the interpretability question in an orthogonal direction, positing that the risk is not to investment, capital, and progress, but to safety and guardrails. That framing is captured in a paper by AI scientist Dario Amodei.

Continue this discussion (and see syntheses of the Marcus and Amodei papers) over at nov. link/skoolAI (remove the space)

In a way, both represent existential threats to the current AI moment, if you accept the framing. Marcus pours cold water on reasoning models, suggesting that they are simply mimicking thought patterns observed in the training data and are hard-bounded by them. Perhaps, but synthetic reasoning data can supply new, untrained thinking patterns, validated by reinforcement learning. (That is essentially what you are doing when you respond to a prompt with "Hey, you screwed that up," but it is also done at tremendous scale in training pipelines built around Mixture-of-Experts (MoE) models and similar techniques.)

Amodei, conversely, argues that interpretability is essential to being able to restrain a runaway super AI. Here's his framing:

"Modern generative AI systems are opaque in a way that fundamentally differs from traditional software. If an ordinary software program does something—for example, a character in a video game says a line of dialogue, or my food delivery app allows me to tip my driver—it does those things because a human specifically programmed them in. Generative AI is not like that at all. When a generative AI system does something, like summarize a financial document, we have no idea, at a specific or precise level, why it makes the choices it does—why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate."
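To make that opacity concrete, here is a minimal sketch (mine, not from either paper) of what "looking inside" a generative model actually yields. It assumes the Hugging Face transformers library and the small gpt2 checkpoint; any model would make the same point. You can read off the probability the model assigns to every possible next word, but nothing in those numbers tells you why one word outranks another.

# Illustrative sketch, not from the article. Assumes: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The quarterly report shows that revenue"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The model's "choice" of the next word is just a probability distribution...
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, 5)

# ...fully observable numbers, but no human-readable reason behind the ranking.
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx):>12s}  {p.item():.3f}")

Run it and you get a ranked list of candidate words with their probabilities. That list is the whole "explanation" the system offers on its own, and closing the gap between those numbers and a genuine account of why the model chose them is exactly what interpretability research is after.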