Hallucinations are a predictable byproduct of the way models are evaluated and rewarded during training. Models guess rather than admit uncertainty because the grading they are optimized against rewards accuracy, not honesty.
This report is a milestone because it shifts hallucinations from a mysterious flaw, treated as inevitable and poorly understood, to a solvable engineering problem. It means LLMs can be trained to say "I don't know", if and only if we reward them for it.
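To see the incentive at work, here is a back-of-the-envelope sketch (my own illustration, not code from the report): under accuracy-only grading, a low-confidence guess always scores at least as well as abstaining, but once wrong answers carry a penalty, saying "I don't know" becomes the better strategy. The probabilities and point values below are assumed for illustration.

```python
# Minimal sketch: expected score of guessing vs. abstaining under two grading schemes.
# Assumption: the model's best guess is correct with probability p.

def expected_scores(p, correct, wrong, abstain):
    """Return (expected score if guessing, score if abstaining)."""
    guess = p * correct + (1 - p) * wrong
    return guess, abstain

p = 0.3  # model is only 30% confident in its best guess

# Accuracy-only grading: 1 point if right, 0 if wrong, 0 for "I don't know".
guess, idk = expected_scores(p, correct=1, wrong=0, abstain=0)
print(f"accuracy-only:  guess={guess:.2f}, abstain={idk:.2f}")  # 0.30 vs 0.00 -> guessing wins

# Honesty-aware grading: wrong answers are penalized, abstaining is neutral.
guess, idk = expected_scores(p, correct=1, wrong=-1, abstain=0)
print(f"penalize wrong: guess={guess:.2f}, abstain={idk:.2f}")  # -0.40 vs 0.00 -> "I don't know" wins
```

Under the first scheme, guessing is never worse than abstaining no matter how unsure the model is; under the second, abstaining wins whenever confidence drops below the break-even point.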
Now that we know why, we can actually do something about it.