V-Order · Learn Microsoft Fabric

V-Order

Hey all, we had a good discussion about V-Order in the recent monthly call. Now that I've also got answers to some open questions I had, I'm sharing everything I've learned here with you all. [Long post ahead]

What is V-Order?

V-Order is a write-time optimization for Parquet files that is specific to Microsoft Fabric.
Because a Delta table stores its data as Parquet files underneath, V-Order is applied to those Parquet files as well.

The idea behind V-Order:

It applies additional sorting and compression to Parquet files at WRITE time (taking roughly 15% longer), and in return makes READS much faster (up to 50%) for the Fabric engines (Spark, SQL, Power BI).
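To make that write/read trade-off concrete, here is a back-of-envelope calculation of how many reads it takes before the extra write time pays for itself. It takes the figures above at face value and assumes (my own simplification, not a measured result) that writes cost a fixed fraction more and every read saves a fixed fraction of its time. Also note "up to 50% faster" is a best case and can be read two ways: halving read time (`read_saving=0.5`) or 1.5x throughput (`read_saving` of about 0.33). The function name is just a hypothetical helper.

```python
def break_even_reads(write_s: float, read_s: float,
                     write_overhead: float = 0.15,
                     read_saving: float = 0.50) -> float:
    """Number of reads after which V-Order's extra write time is paid back.

    Assumed model (a simplification, not measured data):
      - a write takes write_s seconds without V-Order and
        write_s * (1 + write_overhead) seconds with it;
      - each read takes read_s seconds without V-Order and
        read_s * (1 - read_saving) seconds with it.
    """
    if read_saving <= 0:
        raise ValueError("read_saving must be positive, otherwise V-Order never pays off")
    extra_write_s = write_s * write_overhead  # one-off cost paid at write time
    saved_per_read_s = read_s * read_saving   # time saved on every subsequent read
    return extra_write_s / saved_per_read_s


# Example: a 100 s write whose output is later read by 10 s queries.
print(break_even_reads(write_s=100, read_s=10))  # → 3.0
```

Under these assumptions, a table written once and queried more than a handful of times comes out ahead, while a write-heavy table that is rarely read may not.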

Key points:

Any Parquet file that you "write" in Fabric gets V-Order applied by default (as opposed to files that are copied as-is, uploaded, or made available through a shortcut).
For example, if you write a Parquet file with a Copy data activity in a data pipeline, the resulting Parquet file is V-Ordered.
Likewise, a Parquet file written from a Spark notebook is V-Ordered.
Both examples also hold when the output format is Delta.

How to disable it (screenshot attached):

For Spark notebooks, use a spark.conf.set command to set the V-Order setting to false.
For data pipelines, untick the V-Order option in the file format settings. (This option only appears when the file format is Parquet.)
For dataflows, the output is always written as a Delta table, and I couldn't find any option to disable V-Order.
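For the Spark notebook case, here is a minimal sketch of what the spark.conf.set approach looks like. It assumes the session-level key `spark.sql.parquet.vorder.enabled`, which is the key I know from the Fabric docs for the Spark runtimes at the time of writing; newer runtimes may use a different key, so please double-check the docs for your runtime. The helper names are hypothetical; in a Fabric notebook you would pass in the predefined `spark` session.

```python
# Assumed session-level V-Order key; the exact key may differ between Fabric runtime versions.
VORDER_CONF_KEY = "spark.sql.parquet.vorder.enabled"


def set_vorder(spark, enabled: bool) -> None:
    """Turn V-Order on or off for Parquet/Delta writes made by this Spark session."""
    # Spark configuration values are strings, so write "true"/"false" explicitly.
    spark.conf.set(VORDER_CONF_KEY, "true" if enabled else "false")


def vorder_enabled(spark) -> bool:
    """Read back the session setting, treating an unset key as on (the default described above)."""
    return spark.conf.get(VORDER_CONF_KEY, "true").lower() == "true"
```

Usage in a notebook would be `set_vorder(spark, False)` before a write such as `df.write.format("delta").saveAsTable("my_table")` (`my_table` is just a placeholder). Because this is a session setting, it only affects writes made afterwards in the same session; files that were already written keep their V-Order.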

How to check whether a Parquet file is V-Ordered? (Screenshot attached)

A V-Ordered Parquet file looks no different from a normal Parquet file. The only difference is in the file's metadata.
You can read the file's metadata using code.
Alternatively, you can open the file directly in a Parquet viewer and read its metadata there.
V-Ordered files contain the highlighted key "com.microsoft.parquet.vorder.enabled"; you will NOT find this key in the metadata of a normal Parquet file.
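If you'd rather check from code than a viewer, here is a minimal, dependency-free sketch. It relies on the standard Parquet file layout ("PAR1" magic bytes, then the column data, then the Thrift-encoded footer, then a 4-byte little-endian footer length, then "PAR1" again) and simply searches the footer bytes for the key mentioned above. It's a heuristic: it only checks that the key is present, not what value it holds. The function names are my own hypothetical helpers.

```python
import struct

PARQUET_MAGIC = b"PAR1"
# Metadata key that shows up in V-Ordered files (see above).
VORDER_KEY = b"com.microsoft.parquet.vorder.enabled"


def parquet_footer(data: bytes) -> bytes:
    """Return the raw (Thrift-encoded) footer bytes of a Parquet file.

    Layout: "PAR1" | column data | footer | footer length (uint32, little-endian) | "PAR1"
    """
    if len(data) < 12 or data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file (missing PAR1 magic bytes)")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("footer length is larger than the file")
    return data[-8 - footer_len:-8]


def has_vorder_key(data: bytes) -> bool:
    """Heuristic: Thrift stores strings as raw bytes, so a substring search of the footer finds the key."""
    return VORDER_KEY in parquet_footer(data)


def is_vordered(path: str) -> bool:
    """Check a local Parquet file. Reads the whole file, so it's meant for quick checks only."""
    with open(path, "rb") as f:
        return has_vorder_key(f.read())
```

Call `is_vordered()` with the local path of a Parquet file (for example one you've downloaded). For larger files, pyarrow can read just the footer for you: `pyarrow.parquet.read_metadata(path).metadata` returns the key-value metadata as a dict with bytes keys (or None when there is none), so you can test whether `b"com.microsoft.parquet.vorder.enabled"` is in it.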

PS: These are just my own findings, so please correct me if anything is inaccurate 😊
