
Memberships

Data Innovators Exchange

621 members • Free

Learn Microsoft Fabric

14.9k members • Free

1 contribution to Learn Microsoft Fabric
Load Excel file to Files area of lakehouse
I am using a notebook to load an Excel file (downloaded from a website) into a folder in the Files area of a lakehouse. I thought this would be pretty straightforward, but I must be missing something:

from datetime import datetime
import pandas as pd

url = "https://<url_of_excel_file>"
output_path = "Files/sales_targets/" + datetime.now().strftime("%Y%m%d")

# load Excel file from URL and replace spaces in column names
df = pd.read_excel(url)
df.columns = df.columns.str.replace(' ', '')

# create directory if it doesn't exist
mssparkutils.fs.mkdirs(output_path)

df.to_excel(output_path + "/targets.xlsx")

Is df.to_excel the correct method here, or should I be using PySpark instead?
0 likes • Feb '25
You could try two things:

1. Use a pipeline with a Copy activity and a REST (GET) connection. The source is the URL of each file (you might need a ForEach activity in the pipeline to iterate over the URLs), and the destination is the folder in your lakehouse. You can then access the files via notebooks.

2. Use a Python notebook (no heavy Spark needed) and the temporary file system of the notebook session. Most standard Python packages with filesystem functions won't support abfs paths directly, so you can use notebookutils.fs.cp to copy files out of the session's filesystem into a lakehouse Files folder:

# store to the session filesystem with plain pandas
tmp_path = "/tmp/myfile.xlsx"
sales_df.to_excel(tmp_path)

# copy from the session filesystem to the lakehouse
workspace_id = notebookutils.runtime.context["currentWorkspaceId"]
lh = notebookutils.lakehouse.get("YourLakehouse", workspaceId=workspace_id)
lh_folder = f"{lh.properties['abfsPath']}/Files/My/Folder"
notebookutils.fs.mkdirs(lh_folder)

lh_path = f"{lh_folder}/myfile.xlsx"
notebookutils.fs.cp(f"file:{tmp_path}", lh_path)

While this might not be the best approach for data that could easily be saved as a delta table, as described by Anthony above, it's helpful when you need Python packages whose output has to be a file but whose filesystem methods don't support abfs paths. In my case it was geopandas and topojson, where I needed the plain JSON output.

If you want to write to a (delta) table, a lightweight Python notebook is usually a better way than spinning up the heavy Spark cannon:

%pip install pandas deltalake pyarrow

import pandas as pd
from deltalake import write_deltalake

your_df = pd.read_excel("...")
# reuses the lakehouse handle (lh) resolved in the snippet above
write_deltalake(f"{lh.properties['abfsPath']}/Tables/my_table", your_df, mode="overwrite")
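Putting the two snippets together for the original question, a minimal sketch could look like the following. It assumes a Fabric notebook session where notebookutils is available; the lakehouse name ("YourLakehouse"), the sales_targets folder, and the targets.xlsx file name are placeholders carried over from the posts above:

from datetime import datetime
import pandas as pd

url = "https://<url_of_excel_file>"   # placeholder URL from the question
tmp_path = "/tmp/targets.xlsx"        # session-local temporary file

# download the Excel file and strip spaces from the column names
df = pd.read_excel(url)
df.columns = df.columns.str.replace(' ', '')

# pandas can write to the local session filesystem without any abfs support
df.to_excel(tmp_path)

# resolve the lakehouse Files folder and copy the temp file across
workspace_id = notebookutils.runtime.context["currentWorkspaceId"]
lh = notebookutils.lakehouse.get("YourLakehouse", workspaceId=workspace_id)
target_folder = f"{lh.properties['abfsPath']}/Files/sales_targets/" + datetime.now().strftime("%Y%m%d")
notebookutils.fs.mkdirs(target_folder)
notebookutils.fs.cp(f"file:{tmp_path}", f"{target_folder}/targets.xlsx")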
Patrick-Oliver Groß
Level 1 • 5 points to level up
@patrick-oliver-gro-5855
Full Stack Data Analytics

Active 19h ago
Joined Jun 17, 2024
Germany