Hi everyone,
I just ran a test and would like to hear your opinions. I built the same pipeline two ways: one with Dataflow (Power Query) and the other with a Spark notebook. The task is quite simple, but it involves two large tables. Basically, I perform a join (merge) and expand the columns, combining the two tables into one.
The problem is that this simple process takes 30 minutes to run in Dataflow, while it takes only 3 minutes in the notebook. My question is: if an analyst can choose either Dataflow or notebooks to build this pipeline, is it correct to assume that notebooks will always perform better? What is your opinion?
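For anyone who wants to reproduce the comparison, here is a minimal sketch of what the notebook side of such a join could look like, with a small timing helper. The table names (`orders`, `customers`, `orders_enriched`), the join key (`customer_id`), and the helper names `merge_tables` and `timed` are all hypothetical placeholders, not the actual ones from this test. It also assumes a notebook session where a `spark` object is already available.

```python
import time


def merge_tables(left, right, key, how="left"):
    """Join two Spark DataFrames on `key`, keeping the columns of both sides."""
    return left.join(right, on=key, how=how)


def timed(action):
    """Run `action()` and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


# Assumption: `spark` is predefined by the notebook session.
# The guard simply skips the demo when the code runs anywhere else.
if "spark" in globals():
    orders = spark.read.table("orders")        # hypothetical table name
    customers = spark.read.table("customers")  # hypothetical table name
    merged = merge_tables(orders, customers, key="customer_id")  # hypothetical key

    # Spark is lazy: the join only runs when an action such as a write is
    # triggered, so the write is included in the timed section.
    _, seconds = timed(
        lambda: merged.write.mode("overwrite").saveAsTable("orders_enriched")
    )
    print(f"join + write took {seconds:.1f} s")
```

Timing the write rather than just the `join` call matters for a fair comparison, since the Dataflow run time also includes loading the result to its destination.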
Thank you.