The release of Deep Research capabilities by Perplexity, OpenAI and Google (discussed last week in data pro.news) is proving to be a significant productivity enhancement for Data Professionals. To give you a sense of its power, I set Deep Research to work trawling Reddit to answer the question: What are the top 10 most discussed Data Engineering topics in online communities?
And the winner is...
1. ETL/ELT Processes and Modern Tooling
The shift from traditional ETL to ELT frameworks, enabled by tools like dbt and Apache Airflow, remains a cornerstone of data engineering discussions. Communities emphasize the importance of modular pipeline design, idempotency, and partitioning strategies to handle large-scale data transformations. The rise of dbt as a transformation layer for SQL-centric workflows has sparked debates about its limitations in complex orchestration scenarios. Meanwhile, Apache Spark continues to dominate batch and stream processing use cases, with PySpark adoption outpacing Scala.
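To make the idempotency and partitioning points concrete, here is a minimal sketch of a "delete then insert" partition load, using Python's built-in sqlite3 module as a stand-in for a warehouse. The `events` table, its columns, and the `load_partition` helper are hypothetical names chosen for illustration; real pipelines would more likely lean on a warehouse-native partition overwrite or a dbt incremental model.

```python
import sqlite3


def load_partition(conn, partition_date, rows):
    """Reload one date partition so that reruns leave the table unchanged.

    Hypothetical helper: the existing rows for the partition are deleted and
    the new rows inserted inside a single transaction.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM events WHERE event_date = ?", (partition_date,))
        conn.executemany(
            "INSERT INTO events (event_date, user_id, amount) VALUES (?, ?, ?)",
            [(partition_date, user_id, amount) for user_id, amount in rows],
        )


# In-memory database standing in for the target warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER, amount REAL)")

rows = [(1, 9.99), (2, 24.50)]
load_partition(conn, "2025-01-01", rows)
load_partition(conn, "2025-01-01", rows)  # simulated retry of the same run

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(row_count)
```

Because each run replaces only its own date partition, a retried or backfilled job does not duplicate rows, which is the property the community discussions tend to mean by an idempotent pipeline.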