Please help if you can. I have been running PySpark notebooks in Fabric that load data from an Azure Synapse dedicated SQL pool into a Fabric lakehouse, using spark.read with the synapsesql connector as shown below. Code using this access method had been running successfully for over a month; I was using it to load our new Enterprise Data Warehouse and it performed flawlessly. It quit working Monday afternoon 5/20, US Central time.
from pyspark.sql.functions import col
# standard imports for the dedicated SQL pool Spark connector
import com.microsoft.spark.sqlanalytics
from com.microsoft.spark.sqlanalytics.Constants import Constants
df = (spark.read
    .option(Constants.SERVER, "myserver.sql.azuresynapse.net")    # redacted
    .option(Constants.USER, "myfabriclogin")
    .option(Constants.PASSWORD, "myfabricloginpassword")
    .option(Constants.DATABASE, "dw")
    .option(Constants.DATA_SOURCE, "my_external_data_source")     # redacted
    .option(Constants.QUERY, "select * from dw.Dealer")
    .synapsesql()
)
df.count()
I have of course changed all the values shown above for the server, login, password, and data source.
The code errors out at the df.count() statement. I believe this is because Spark uses lazy evaluation, and that is the point where it actually has to pull the data.
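To illustrate the lazy-evaluation point, here is roughly what I mean (a minimal sketch; "DealerId" is just a hypothetical column name):
# Transformations are lazy, so nothing is read from the dedicated pool yet
# and these lines succeed even though the actual read is failing.
df_small = df.select(col("DealerId")).limit(1)
# Actions force execution; this is where the connector really goes through the
# ADLS staging area, and where the NullPointerException shows up.
df_small.collect()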
The head of the exact error message is here:
Py4JJavaError: An error occurred while calling o4666.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 13.0 failed 4 times, most recent failure: Lost task 6.3 in stage 13.0 (TID 38) (vm-e1e05730 executor 1): java.lang.NullPointerException
The last part of the error message references a null pointer, so I think the data frame is empty. If I try to do a "head" on the data frame I get the same sort of error. I am able to log in directly to the dedicated SQL pool using SSMS with the user, password, and database shown above, and everything works.
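For what it's worth, the same credentials could also be tested from inside the notebook with a plain JDBC read, which bypasses the connector's ADLS staging step entirely (a sketch only, with the same placeholder values; it assumes the SQL Server JDBC driver is available on the runtime, which it normally is in Fabric):
# Plain JDBC read: no external data source, no staging in ADLS Gen2.
# If this works while synapsesql() fails, the login/database are fine and the
# problem is somewhere in the staging / external data source path.
jdbc_df = (spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.sql.azuresynapse.net:1433;database=dw")  # redacted
    .option("query", "select top 10 * from dw.Dealer")
    .option("user", "myfabriclogin")
    .option("password", "myfabricloginpassword")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load())
jdbc_df.show()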
A word about the "DATA_SOURCE": it is used to set up a temporary holding place in ADLS Gen2 for data being imported from the dedicated pool. The data source exists as an "EXTERNAL DATA SOURCE" within the dedicated SQL pool, and it uses a database scoped credential, also created on the dedicated SQL pool. The database scoped credential is designed to be a temporary access method. This failure occurred exactly two months after we initially started using this, so I thought the credential was the problem, but creating a new credential and a new data source did not fix the issue; it did, however, slightly change the error messages.
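Since the DATA_SOURCE only points the connector at that temporary ADLS Gen2 staging location, another check I can think of is whether the staging container is even reachable from the Spark side (again a sketch; the abfss path is a placeholder for our real staging container):
# mssparkutils is built into the Fabric notebook runtime.
from notebookutils import mssparkutils
# List the staging location the external data source points at. If this fails,
# the Spark side cannot read what the dedicated pool exports to staging.
mssparkutils.fs.ls("abfss://staging@mystorageaccount.dfs.core.windows.net/")
On the pool side, the sys.external_data_sources and sys.database_scoped_credentials catalog views (queryable from SSMS) show which data source and credential the pool is actually using.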
Sorry for the very long post. I would greatly appreciate any new ideas, as I am all out of them.
I originally got the idea to do this from the following YouTube video:
Microsoft Fabric: Import Azure Synapse Dedicated Pool FAST via Spark Notebook!