Efficient Data Loading Strategy in a Lakehouse
Hello everyone,
I’m working with a large dataset spanning 2000 to 2024, stored as CSV files.
The data from 2000 to 2023 is static and will not change, while the data for 2024 will be updated daily.
I’m considering whether to:
  • Create two separate tables in the Bronze layer, one for the static historical data (2000-2023) and one for the 2024 data, and then merge them in the Silver layer; or
  • Convert the historical data to Parquet/Delta once, and load only the 2024 CSV for the daily updates (rough sketch of what I have in mind below).
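Here’s roughly what I’m picturing for the second option, as PySpark in a Fabric notebook (where `spark` is predefined). The paths, the table name `bronze_sales`, and the `event_date` column are made-up placeholders, so treat this as a sketch rather than a tested implementation:

```python
from pyspark.sql import functions as F

# One-time backfill: convert the static 2000-2023 CSVs to a Delta table,
# partitioned by year so daily refreshes can target a single slice.
# Paths, table name, and the event_date column are placeholders.
hist_df = (
    spark.read.option("header", "true").option("inferSchema", "true")
    .csv("Files/raw/history_2000_2023/*.csv")
    .withColumn("year", F.year(F.col("event_date")))
)
(
    hist_df.write.format("delta")
    .mode("overwrite")
    .partitionBy("year")
    .saveAsTable("bronze_sales")
)

# Daily job: read only the 2024 CSVs and rewrite just that partition,
# leaving the 2000-2023 files untouched.
daily_df = (
    spark.read.option("header", "true").option("inferSchema", "true")
    .csv("Files/raw/2024/*.csv")
    .withColumn("year", F.year(F.col("event_date")))
)
(
    daily_df.write.format("delta")
    .mode("overwrite")
    .option("replaceWhere", "year = 2024")  # standard Delta: overwrite only the 2024 slice
    .saveAsTable("bronze_sales")
)
```

My thinking is that partitioning by year would let the daily job rewrite only the 2024 slice via `replaceWhere`, so the historical files are never re-read or rewritten.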
Additionally, I’m using a Microsoft Fabric notebook for this task and wondering about best practices for optimization. Should I rely on the automatic optimization features, or schedule OPTIMIZE commands manually?
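For reference, this is what I assume manual maintenance would look like, again using the placeholder table `bronze_sales`. `OPTIMIZE` and `VACUUM` are standard Delta Lake commands; the `VORDER` clause and the `optimizeWrite` config key are ones I’ve seen in the Fabric docs, but please correct me if I have the syntax wrong:

```python
# Compact the small files produced by daily loads into larger ones.
spark.sql("OPTIMIZE bronze_sales")

# Fabric-specific V-Order rewrite for faster downstream reads
# (assumption: VORDER clause as documented for Fabric lakehouses).
spark.sql("OPTIMIZE bronze_sales VORDER")

# Remove files no longer referenced, keeping the default 7 days of history.
spark.sql("VACUUM bronze_sales RETAIN 168 HOURS")

# Or lean on automatic behavior instead: enable optimized writes at the
# session level (assumption: Fabric-specific config key) so daily loads
# produce fewer, larger files up front.
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")
```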
Any insights or experiences with similar data loading and optimization strategies would be greatly appreciated!
Thanks!