Every time people ask me about AI quantity take-off tools, I refer them to the Clock Benchmark
It's a measure of how well the top AI tools can read analogue clocks
Claude Opus 4.6 (widely regarded as the most intelligent model) scores 10%
Gemini 3.1 gets 30%
The reality is these are language models. And visual reasoning is really quite tough
I think there is some role for AI in QTOs, like extracting schedules or doing simple counts. But measuring m² of formwork on complex structures?
Not quite there yet...