
Memberships

Learn Microsoft Fabric

Public • 1.5k • Free

46 contributions to Learn Microsoft Fabric
What are your biggest pain points currently with Fabric?
Hey everyone, happy Monday! I'm currently planning out future content for the YouTube channel, and I always want to produce the content that is most relevant and helpful to you! So, some questions:
- What are your biggest pain points currently with Fabric?
- Anything you're struggling to understand?
- Things you think are important, but don't quite grasp yet?
It's this kind of engagement that led to the Power BI -> Fabric series, and then the DP-600 series, so I thought I'd open it up to you again! Look forward to hearing from you - thank you! Here are some potential topics:
- Delta file format
- Integrating AI / ML / Azure AI / OpenAI
- Copilot
- Git integration / deployment pipelines / CI/CD
- Data modelling / implementing SCDs
- Medallion implementation
- More advanced PySpark
- Data pipelines
- Metadata-driven workflows
- Dataflows (and optimising dataflows)
- Lakehouse architectures
- Real-time
- Data science projects
- Semantic link
- Migrating semantic models
- Using Python to manage semantic models
- Administration / automation
- Fabric API
- Other...?
2 likes • 19h
Now that the trial period is counting down, I think mine are:
- Understanding which SKU to recommend based on the capacity used in a trial capacity
- Cost optimisation: incremental loading, upserts/SCD using notebooks (rough sketch below), Dataflow (when available), pipelines, warehouse. There are different challenges with incremental refresh, such as data that is deleted in the source system but not marked as deleted in the API (not available in the API)
- Data pipelines: coming from the Power BI side, I assume this is the first thing to learn. Use of parameters, variables, for each, etc. Also metadata-driven workflows, as @Stéphane Michel mentioned
- There have also been some questions about moving semantic models to Fabric. Practical use of the CAT tool and Tabular Editor as options
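Since upserts/SCD with notebooks come up here, a minimal sketch of the Delta MERGE pattern in a Fabric notebook follows. This is my own illustration, not something from the thread: the table names (staging_journal, silver_journal) and the key column TransactionId are placeholders, and `spark` is the session object a Fabric notebook provides.

```python
from delta.tables import DeltaTable

# Incoming increment; "staging_journal" is a hypothetical staging table
df_incremental = spark.table("staging_journal")

# Target silver table, keyed on TransactionId (also hypothetical)
target = DeltaTable.forName(spark, "silver_journal")

(
    target.alias("t")
    .merge(df_incremental.alias("s"), "t.TransactionId = s.TransactionId")
    .whenMatchedUpdateAll()       # update rows that changed in the source
    .whenNotMatchedInsertAll()    # insert rows that are new in the source
    .execute()
)
```

Rows deleted in the source but never flagged by the API are the harder part; a MERGE like this only helps there if the extract is a full snapshot.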
Methods for appending data from different Lakehouses
I'm going to create a solution where we consolidate journal transactions from 7 clients around the world with different source systems, and produce input in a standard format for a consolidation tool. In the architecture, all data is ingested into one Lakehouse per client in different workspaces, with some transformations into a silver layer in each workspace. So far so good. The decision is then what the best method is to append all clients into one table and add some very basic logic (conditional columns) required by the consolidation tool. It is not a lot of data, so I plan a simple overwrite to start with and will modify if needed. Considerations as I see them:
- Dataflow Gen 2: Simple, and the tool I know best. The customer also knows a bit of Power Query. But not the most efficient or "elegant" solution; when adding customer, product tables, etc., there will be many queries. In future, with the planned release of incremental refresh (Q2) and upsert (Q3), it will allow a low-code experience.
- Notebooks: Personal interest to learn, and also the most efficient (faster and less capacity usage)? The logic is not very complicated, so I assume the customer will be able to maintain it once it is set up. (A rough sketch of this option is below.)
- Pipelines: Not sure how it would be possible to append and then overwrite. I also need some logic afterwards, and would like to append and add the logic in the same step.
- Data Warehouse as the gold layer: create a stored procedure with the required logic. Better to use than a Dataflow (can also use the query view)?
Due to customer knowledge (and the internal team's knowledge), I am leaning towards the Data Warehouse method. I assume this is a good "it depends". What would be best practice for such a solution?
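For the Notebooks option, here is a rough sketch of the "append all clients, add a conditional column, overwrite" idea under some assumptions of mine: the client silver tables are exposed in the gold Lakehouse (for example as shortcuts) under the hypothetical names client1_journal ... client7_journal, and the Entity logic is purely illustrative.

```python
from functools import reduce
from pyspark.sql import functions as F

client_tables = [f"client{i}_journal" for i in range(1, 8)]

# Read each client's silver table and tag it with its source
frames = [
    spark.table(name).withColumn("Client", F.lit(name))
    for name in client_tables
]

# Append (union) all clients into one dataframe, tolerating schema drift
combined = reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), frames)

# Example of a simple conditional column required by the consolidation tool
combined = combined.withColumn(
    "EntityType",
    F.when(F.col("Client") == "client1_journal", F.lit("Parent")).otherwise(F.lit("Subsidiary")),
)

# Simple full overwrite of the gold table, as planned initially
combined.write.mode("overwrite").saveAsTable("gold_journal_consolidated")
```

The full overwrite keeps the logic easy to hand over; an incremental or MERGE-based load could replace the last step later if volumes grow.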
2 likes • 4d
@Adeola Adeyemo thank you for the input. It's not a requirement to use a Warehouse; the only reason to use one would be to solve this with a stored procedure, so I don't need to move it there. Agree that Notebooks would be the preferred way. Then the consideration is the customer's competence, if they are going to maintain and scale it themselves. I guess it can result in increased capacity cost instead of consulting cost 😅 I can tell them to come here for help 😁
1 like • 4d
@Vinayak K thanks again. Looking forward to playing around with it; I aim to try different alternatives, both to learn different methods and to compare performance.
In the beginning..
There were lots and lots of complex semantic models wanting to be a part of Fabric! Lol. My organization is excited to jump head first into Fabric after attending the MS Fabric Conference in Las Vegas in March. I am currently the only developer with a strong knowledge of Power BI and SQL. I model and optimize all of my complex data in the DB and use a star model for less intricate data. We operate with large, complex semantic models for our organizational and department dashboards. Do I need to start from scratch to get our workspace aligned in Fabric? I am wondering what would be the most efficient way to give the executives what they are looking for short term, while also building a clean foundation long term. The goal is to get all of my existing semantic models into a Lakehouse and allow end users to query or build their own departmental dashboards - a managed self-service model. Fabric is awesome, but overwhelming for a team of one. Any tips on how to achieve this? (Also, I will be working through all of these awesome videos and tutorials so I get certified (DP-600)!) TL;DR: How do I import existing semantic models into a Lakehouse?
2 likes • 5d
I also have a small team, and the same challenge. This community is really helpful. Anyway: I went to a conference, Fabric February, which lowered the entry point. Just get started with what you are used to, and if there are ways to improve with other methods, either improve or scale up. From the Explicit Measures podcast, I think the advice is to learn to use Pipelines for orchestration; for myself, I think Notebooks are next on the list. It is also good advice to learn things as you need them for specific tasks, rather than trying to follow everything, and to focus on a maximum of 3 items. For your case, the first question I would ask is: do you really need Direct Lake? If not, you can use import mode as before (orchestrated with a semantic model refresh activity in a pipeline), and just change the source from SQL/warehouse to Lakehouse/Warehouse in Fabric. Where you have transformations in Power Query, you can simply copy-paste them into Dataflow Gen 2 (some renaming may be required, as you cannot use spaces). If you are going the Direct Lake way, I would recommend using Tabular Editor, so that you can connect to the semantic model via that tool and copy what you need. There could be other ways, like editing the TMDL file / source control, but for me this is a new area. https://data-mozart.com/migrate-existing-power-bi-semantic-models-to-direct-lake-a-step-by-step-guide/
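As an aside, the refresh can also be triggered from a notebook with semantic link (sempy) instead of the pipeline activity. This is only a hedged sketch under my assumptions: I believe sempy.fabric exposes a refresh_dataset call along these lines, and the "Sales" model and "Finance - PROD" workspace names are placeholders.

```python
import sempy.fabric as fabric

# Trigger a refresh of a hypothetical "Sales" semantic model in a given workspace.
# Assumes the notebook runs in Fabric, where semantic link is preinstalled.
fabric.refresh_dataset(dataset="Sales", workspace="Finance - PROD")
```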
V-Order
Hey All, we had a good discussion around V-Order in the recent monthly call. As I have now also got answers to some open questions I had, I'm sharing all my learnings here with you. [Long post ahead]

What is V-Order?
- V-Order is an optimization for parquet files, solely within Fabric.
- As a Delta table holds parquet files underneath, it is applied to those parquet files as well.

Idea of V-Order:
- It applies some additional sorting and compression on the parquet files while WRITING (consuming ~15% more time), making READS very fast (up to 50%) for the Fabric engines (Spark, SQL, Power BI).

Key points:
- Any parquet file which you "write" (not copied, not uploaded, not shortcut-ed) in Fabric will get V-Order optimization applied by default.
- For example, if you write a parquet file using a Copy data activity in a data pipeline, the resulting parquet file will be V-Ordered.
- If you write a parquet file using a Spark notebook, the resulting file will be V-Ordered here as well.
- In both of the above examples, this also holds true when the format is Delta.

How to disable it: (screenshot attached)
1. For a Spark notebook, you can use a spark.conf command and set it to false.
2. For data pipelines, you can use the file format settings and untick the V-Order option. (You will only get this option if the file format is parquet.)
3. For dataflows, which only write as Delta tables, I couldn't find any option to disable it.

How to check if a parquet file is V-Ordered or not? (screenshot attached)
- V-Ordered parquet files look no different than a normal parquet file. The only difference can be seen in the metadata of the parquet file.
- You can read the metadata of the parquet file using code (small sketch below), or you can use a parquet viewer to open and read the file directly.
- You will NOT find the highlighted key "com.microsoft.parquet.vorder.enabled" in the metadata of a normal parquet file.
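A small sketch of both points from a Fabric notebook, with my own assumptions flagged: the session config name may differ between Fabric runtime versions, and the file path is a placeholder; the metadata key is the one named in the post.

```python
import pyarrow.parquet as pq

# 1) Turn V-Order off for writes in this Spark session.
#    Config name is an assumption and may vary by runtime version.
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")

# 2) Inspect a parquet file's key-value metadata for the V-Order marker.
#    The path below is a placeholder for a file in the attached Lakehouse.
meta = pq.ParquetFile("/lakehouse/default/Files/sample.parquet").metadata.metadata or {}
print(b"com.microsoft.parquet.vorder.enabled" in meta)
```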
1 like • 5d
Thank you! I don't think I have fully understood yet in which cases it is best to disable V-Order.
1 like • 5d
@Vinayak K thank you for the valuable input. It has been in the back of my mind to understand this setting better.
Workspace Architecture
Sorry in advance for the long post! I saw Vinayak's post in this category and thought it might be the best place for this as well. The image attached is a draft workspace architecture proposal for an organisation, and I would appreciate any and all feedback. Of particular interest are potential issues or problems you might see with the design. Much of this is just an aggregation of ideas found in various forums and videos (including many of Will's), and I believe it is fairly generic. I know a definitive architecture really isn't possible without business context, so I've listed below some of my reasoning for certain design decisions. It would be accompanied by a description of each component when proposed, hence the numerical values in the image (I won't add that here). I've also attached a simplified version showing PROD only. Please throw your thoughts at me! (A small cross-workspace read sketch follows the list.)

Why a DEV capacity?
- Company policy
- Separation of workload in case of engineering errors while developing

Why a workspace per source?
- Separation in case of failure; flexible, modular and scalable
- Security, given the lack of schema definitions in Lakehouses
- Some sources may come from multiple systems where related

Why Bronze and Silver in separate Lakehouses in the same workspace?
- Security, given the lack of schema definitions in Lakehouses
- No pipeline invocation across workspaces, so this allows orchestration through to the Silver layer
- Some workspaces may also contain a Landing layer, depending on the source
- At initial ingestion, data will be landed and classified (data governance - PII, SPI) prior to moving to Silver

Why Gold in a separate workspace?
- Some systems will be landed for historisation only, and may not require a gold layer
- Our gold layer will often be a combination of multiple sources
- This will be the first touch point for "prosumers" - producers of reports (developers), consumers of data
- SPI- and PII-classified data will require obfuscation upon landing in Gold

Why Lakehouses in Bronze and Silver, and Warehouses in Gold?
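Since the Gold workspace has to combine Silver tables from several per-source workspaces, here is a hedged sketch of one way a Gold-workspace notebook can read across workspaces over OneLake. The workspace, lakehouse and table names are placeholders of mine; the ABFS pattern is the general OneLake one.

```python
# Read a Silver table that lives in a per-source workspace from a notebook
# attached to the Gold workspace, addressing it directly over OneLake.
silver_path = (
    "abfss://SalesSource-PROD@onelake.dfs.fabric.microsoft.com/"
    "Silver.Lakehouse/Tables/journal"
)

df_silver = spark.read.format("delta").load(silver_path)
df_silver.limit(5).show()
```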
0 likes • 6d
@Mark Lang I see, valuable input. I think I will go with the different workspaces to begin with. As in my case there are mostly different sources and clients, it can also be a benefit if they want to scale the solution in the future, not only supporting the main office.
1 like • 5d
@Mark Lang Thanks! I will use this setup as a starting point at least. I have not got all the data yet, but in the long term there can also be other benefits of separated workspaces, not only for security but also for capacity management. I need to get started with deployment pipelines and get more hands-on experience with the use case to see the full impact.
Eivind Haugen
@eivind-haugen-5500
BI Consultant. Insights. Visualization, Tabular Editor and Fabric enthusiast

Joined Mar 12, 2024