I have attached a diagram showing the architecture I am trying to set up. We receive daily CSV files and have accumulated about two years' worth of them. We want to ingest this data into Power BI and build reports.
Ideally, I am looking for a solution that transmits the daily CSVs from our on-premises network and appends them to their respective tables in a Fabric Lakehouse. Is this possible?
What are the options for implementing the architecture below? I have numbered the key data transmission tasks.
1) On-prem to Fabric Lakehouse: What is the common pattern for loading on-prem data files into a Fabric Lakehouse? Push from on-prem, or pull from Fabric? If it is the latter, I assume the on-premises data gateway (Power BI gateway) is required.
Can the daily CSVs be appended to the existing tables in the Fabric Lakehouse? Is it possible to incrementally refresh the Fabric semantic model from CSV files? I have included rough sketches of what I have in mind just below.
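For reference, here is roughly what I picture for a push pattern in task 1: a small script on the on-prem side that writes each day's CSV into the Lakehouse Files area through OneLake's ADLS Gen2-compatible endpoint. This is only a sketch; the workspace name, lakehouse name, folder, and file names are placeholders, and it assumes a service principal (or other Entra ID identity) already has access to the workspace.

```python
# Sketch: push a daily CSV from on-prem into the Lakehouse "Files" area
# via OneLake's ADLS Gen2-compatible endpoint.
# "MyWorkspace", "MyLakehouse", the folder and the file name are assumptions.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ONELAKE_URL = "https://onelake.dfs.fabric.microsoft.com"
WORKSPACE = "MyWorkspace"                                  # Fabric workspace name (placeholder)
LAKEHOUSE_PATH = "MyLakehouse.Lakehouse/Files/daily_csv"   # target folder (placeholder)
LOCAL_FILE = r"C:\exports\sales_2024-06-01.csv"            # today's extract (placeholder)

# DefaultAzureCredential picks up a service principal, managed identity, or az login.
service = DataLakeServiceClient(account_url=ONELAKE_URL,
                                credential=DefaultAzureCredential())
fs = service.get_file_system_client(file_system=WORKSPACE)

file_client = fs.get_file_client(f"{LAKEHOUSE_PATH}/sales_2024-06-01.csv")
with open(LOCAL_FILE, "rb") as data:
    file_client.upload_data(data, overwrite=True)
```

And for the append question, something like this in a Fabric notebook (PySpark), assuming the file has already landed under Files/daily_csv and the target table is called "sales" (both names are placeholders):

```python
# Sketch: Fabric notebook (PySpark) that appends one day's CSV to a Lakehouse Delta table.
# The path "Files/daily_csv/..." and the table name "sales" are assumptions.
df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("Files/daily_csv/sales_2024-06-01.csv"))

# Append to the managed Delta table; creates it on the first run.
df.write.mode("append").format("delta").saveAsTable("sales")
```

I assume the two years of historical files could be back-loaded the same way in a loop, but I would like to know whether a gateway-based pull (e.g. a pipeline Copy activity) is the more common pattern.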
2 & 3 vs 4) Option 4 is Direct Lake. If Direct Lake is not available, then 2 & 3 are required since a semantic model is required for Power BI reporting. How do I implement Lakehouse to Semantic model daily data refreshing? What are the choices (eg Notebooks vs Data Flows Gen 2 vs ADF Pipeline Copy task or something else?) available for setting up daily data ingestion tasks 2 and 4?