Qwen 3.6 Pro looks really impressive. Compared to Opus 4.5 (not 4.6, it is the previous model so let's not create fake AI hype), it beats it by quite a few points on many different benchmarks (first 2 images attached).
I do feel like something went on there to make sure they score high, either fine tuning or only using the results in which Qwen won, not too sure though.
One feature I do like is the extended memory, so you know how sometimes you have a medium sized prompt mentioning a few features but 3 messages in the AI forgets about the database or something? Looks like this model combats that really well, here's what they said:
- preserve_thinking: Preserve thinking content from all preceding turns in messages. Recommended for agentic tasks. This capability is particularly beneficial for agent scenarios, where maintaining full reasoning context can enhance decision consistency and, in many cases, reduce overall token consumption by minimizing redundant reasoning. This feature is disabled by default, i.e., preserve_thinking defaults to false, meaning the thinking content in preceding turns are discarded, and only the thinking content generated in handling the latest user message is kept (interleaved thinking).
I also saw a live use case of a build with this model and it was genuinely shocking, a whole SaaS, although not perfect, was made in 39k tokens only, 35 minutes and 4% of it's capacity. Here's the video 1 million context length as well