Hello everyone, I am dealing with big data: I have a Delta table with about 900 billion rows (and still growing). The table is partitioned by Date and Hour, and one hour typically holds around 30-60 million rows.
My first attempt was to visualize one minute of data (around 1-2 million records) in Power BI, and it failed. When I apply a filter dynamically in DirectQuery mode against a Fabric lakehouse, Power BI connects through the SQL endpoint, and for some reason the SQL endpoint does not apply partition pruning. So my report ends up scanning the whole table, and Power BI times out without returning any visual.
My second attempt was to do the visualization in the notebook itself, to take advantage of the PySpark engine, which quickly returns the DataFrame when I filter for a specific minute, for example. I have tried Plotly, Bokeh, and HoloViews, but they all give a message that the output is too large and that I should write to a file, and writing to a file loses the interactivity I am looking for.
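One direction I have been experimenting with is pre-aggregating in the notebook before plotting, so the plotting library only receives a few dozen points instead of millions. Here is a minimal sketch of the idea in plain Python (the data and bucket size are made up for illustration; in practice the grouping would be a PySpark `groupBy` before calling `toPandas()`):

```python
from collections import defaultdict

def downsample(points, bucket_seconds=1):
    """Aggregate (timestamp_seconds, value) points into per-bucket
    (bucket_start, min, mean, max) rows, so the plotting library
    only sees one row per bucket instead of millions of raw points."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[int(ts // bucket_seconds)].append(value)
    return [
        (b * bucket_seconds, min(vs), sum(vs) / len(vs), max(vs))
        for b, vs in sorted(buckets.items())
    ]

# ~120,000 synthetic points spanning one minute (0..60 s)
points = [(i / 2000.0, float(i % 7)) for i in range(120_000)]
summary = downsample(points, bucket_seconds=1)
print(len(summary))  # 60 aggregated rows instead of 120,000 raw points
```

The min/mean/max per bucket keeps the spikes visible while the result stays small enough for an interactive Plotly chart, but it obviously loses point-level drill-down, which is why I am still hoping for a library that handles the raw volume directly.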
So what I am looking for is a visualization library I can use from the notebook that can handle this interactively. Any suggestions?