Today, I decided to run a small benchmark comparing these 3 tools
🎯 Benchmark setup
2 data volumes: 100k and 5M rows
3 scenarios:
Simple
Intermediate (adding columns, filters, etc.)
Advanced (joins, complex calculated columns, window functions, indexes…)
3 tools compared:
Dataflow Gen2
Spark notebook
T‑SQL in the warehouse
The detailed results are shown in the attached chart.
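To give a concrete idea of what the "advanced" scenario looks like, here is a minimal sketch of the query shape (join + calculated column + window function). The table and column names are purely hypothetical, and I'm using SQLite from Python just for illustration — the actual benchmark ran on Fabric, not SQLite:

```python
import sqlite3

# Hypothetical tables for illustration only -- not the real benchmark data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales(id INTEGER, customer_id INTEGER, amount REAL);
CREATE TABLE customers(id INTEGER, region TEXT);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
INSERT INTO sales VALUES (1, 1, 10.0), (2, 1, 30.0), (3, 2, 20.0);
""")

# "Advanced" scenario shape: join + calculated column + window functions.
rows = con.execute("""
    SELECT s.id,
           c.region,
           ROUND(s.amount * 1.2, 2)                      AS amount_with_tax,
           SUM(s.amount) OVER (PARTITION BY c.region)    AS region_total,
           RANK() OVER (PARTITION BY c.region
                        ORDER BY s.amount DESC)          AS rank_in_region
    FROM sales s
    JOIN customers c ON c.id = s.customer_id
    ORDER BY s.id
""").fetchall()
print(rows)
```

The real scenarios ran the equivalent logic in Power Query (Dataflow Gen2), PySpark, and T‑SQL respectively, on the same input data.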
📌 Key takeaways
1) T‑SQL is the clear winner
Across all scenarios, it’s either the fastest or very close to the top.
Even for complex transformations (window functions, aggregations, joins), the Fabric SQL engine is extremely efficient.
2) Dataflow Gen2: very sensitive to volume
100k rows: 35–41 s → acceptable.
5M rows: 39–105 s → execution time explodes.
3) Spark notebook: stable but never first
Execution time is fairly consistent (~40–53 s), but it doesn’t dominate the advanced 5M‑row scenario as I expected.
It’s also penalized by ~10 s of cluster startup…
4) Scale completely changes the picture
At 100k rows, everything looks “fast” (3–41 s).
At 5M rows, the fastest and slowest tools end up as much as 6× apart.
5) CU consumption: similar
Across the tests, all three options sit in a similar range of about 9–11 CU for the workloads.
The big difference: if you don’t stop the Spark notebook, it keeps consuming CUs at the same level even when it’s just “waiting”, whereas T‑SQL and Dataflow Gen2 stop consuming once the workload is done.
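To make that idle-session point concrete, here is a back-of-envelope sketch. The ~10 CU rate and the durations are purely illustrative assumptions, not measured values:

```python
# Illustrative assumptions only: a session consuming ~10 CU while active,
# a 1-minute workload, and a notebook left idle for 30 minutes afterwards.
CU_RATE = 10            # assumed CU consumption rate (hypothetical)
workload_minutes = 1    # hypothetical workload duration
idle_minutes = 30       # hypothetical time the session is left running

# T-SQL / Dataflow Gen2: consumption stops once the workload is done.
cost_stops_after_work = CU_RATE * workload_minutes

# Spark notebook left running: idle time is billed at the same rate.
cost_idle_session = CU_RATE * (workload_minutes + idle_minutes)

print(cost_stops_after_work, cost_idle_session)
```

With these made-up numbers the forgotten session costs 31× more than the workload itself — the exact ratio doesn't matter, the habit of stopping the session does.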
My personal verdict
Strong SQL team → T‑SQL
Fastest, most stable, and very competitive even for complex workloads.
Need low‑code / Power Query team → Dataflow Gen2
Great tool, but you really need to watch Query Folding carefully.
Need predictable scalability / data engineering use cases → Spark notebook
Stable performance, ideal when you want to avoid surprises.
Have any of you already run this kind of benchmark? What were your results?
PS: if you want to support me, don’t hesitate to share or like my LinkedIn post ;-)