
Memberships

Data Innovators Exchange

621 members • Free

Learn Microsoft Fabric

14.9k members • Free

1 contribution to Learn Microsoft Fabric
Load Excel file to Files area of lakehouse
I am using a notebook to load an Excel file (downloaded from a website) into a folder in the Files area of a lakehouse. I thought this would be pretty straightforward, but I must be missing something:

from datetime import datetime
import pandas as pd

url = "https://<url_of_excel_file>"
output_path = "Files/sales_targets/" + datetime.now().strftime("%Y%m%d")

# load Excel file from URL and replace spaces in column names
df = pd.read_excel(url)
df.columns = df.columns.str.replace(' ', '')

# create directory if it doesn't exist
mssparkutils.fs.mkdirs(output_path)

df.to_excel(output_path + "/targets.xlsx")

Is df.to_excel the correct method here, or should I be using PySpark instead?
0 likes • Feb '25
You could try two things:

1. Use a pipeline with a Copy activity and a REST (GET) connection. The source is the URL of each file (you might need a ForEach activity in the pipeline to iterate over the URLs), and the destination is the folder in your lakehouse. You can then access the files via notebooks.

2. Use a Python notebook (no heavy Spark needed) and the temporary file system of the notebook session. Most standard Python packages with filesystem functions won't support abfs paths directly, so you can use notebookutils.fs.cp to copy files out of the session's filesystem into a lakehouse Files folder:

# store to the session filesystem with plain pandas
tmp_path = "/tmp/myfile.xlsx"
sales_df.to_excel(tmp_path)

# copy from the session filesystem to the lakehouse
workspace_id = notebookutils.runtime.context["currentWorkspaceId"]
lh = notebookutils.lakehouse.get("YourLakehouse", workspaceId=workspace_id)
lh_folder = f"{lh.properties['abfsPath']}/Files/My/Folder"
notebookutils.fs.mkdirs(lh_folder)

lh_path = f"{lh_folder}/myfile.xlsx"
notebookutils.fs.cp(f"file:{tmp_path}", lh_path)

While this might not be the best approach for data that could easily be saved as a delta table, as described by Anthony above, it's helpful when you need Python packages whose output has to be a file but whose filesystem methods don't support abfs paths. In my case it was geopandas and topojson, where I needed the plain JSON output.

If you want to write to a (delta) table, a lightweight Python notebook is usually a better way than spinning up the heavy Spark cannon:

%pip install pandas deltalake pyarrow

import pandas as pd
from deltalake import write_deltalake

your_df = pd.read_excel("...")
# reuses the lakehouse handle (lh) resolved in the snippet above
write_deltalake(f"{lh.properties['abfsPath']}/Tables/my_table", your_df, mode="overwrite")
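Putting the two snippets together for the original question, a minimal sketch could look like the following. It assumes a Fabric notebook session where notebookutils is available; the lakehouse name ("YourLakehouse"), the sales_targets folder, and the targets.xlsx file name are placeholders carried over from the posts above:

from datetime import datetime
import pandas as pd

url = "https://<url_of_excel_file>"   # placeholder URL from the question
tmp_path = "/tmp/targets.xlsx"        # session-local temporary file

# download the Excel file and strip spaces from the column names
df = pd.read_excel(url)
df.columns = df.columns.str.replace(' ', '')

# pandas can write to the local session filesystem without any abfs support
df.to_excel(tmp_path)

# resolve the lakehouse Files folder and copy the temp file across
workspace_id = notebookutils.runtime.context["currentWorkspaceId"]
lh = notebookutils.lakehouse.get("YourLakehouse", workspaceId=workspace_id)
target_folder = f"{lh.properties['abfsPath']}/Files/sales_targets/" + datetime.now().strftime("%Y%m%d")
notebookutils.fs.mkdirs(target_folder)
notebookutils.fs.cp(f"file:{tmp_path}", f"{target_folder}/targets.xlsx")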
Patrick-Oliver Groß
Level 1 • 5 points to level up
@patrick-oliver-gro-5855
Full Stack Data Analytics

Active 19h ago
Joined Jun 17, 2024
Germany