What are the top 10 most discussed Data Engineering Topics Online?
The release of Deep Research capabilities from Perplexity, OpenAI and Google (discussed last week in data pro.news) is proving to be a significant productivity enhancement for Data Professionals. To give you a sense of its power, I set it to work trawling Reddit to answer: What are the top 10 most discussed Data Engineering Topics in Online Communities?

And the winner is...

1. ETL/ELT Processes and Modern Tooling

The shift from traditional ETL to ELT frameworks, enabled by tools like dbt and Apache Airflow, remains a cornerstone of data engineering discussions. Communities emphasize the importance of modular pipeline design, idempotency, and partitioning strategies for handling large-scale data transformations. The rise of dbt as a transformation layer for SQL-centric workflows has sparked debates about its limitations in complex orchestration scenarios. Meanwhile, Apache Spark continues to dominate batch and stream processing use cases, with PySpark adoption outpacing Scala.

The full list of the top 10 is published here: https://www.perplexity.ai/page/top-10-most-frequently-discuss-vwndmAbdQVC_ANozGkGP9A
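
For readers newer to the idempotency and partitioning ideas mentioned under item 1, here is a minimal sketch of what they can look like in practice. It uses only the Python standard library rather than dbt, Airflow, or Spark, and the function names (`transform`, `write_partition`), the `dt=YYYY-MM-DD` directory layout, and the sample rows are all illustrative assumptions, not something taken from the Deep Research report.

```python
import json
import tempfile
from pathlib import Path


def transform(raw_rows: list[dict]) -> list[dict]:
    """Pure transformation step: the same input always yields the same output."""
    return [
        {"user_id": r["user_id"], "amount_cents": round(r["amount"] * 100)}
        for r in raw_rows
        if r["amount"] > 0  # drop refunds / invalid rows (hypothetical rule)
    ]


def write_partition(base_dir: Path, run_date: str, rows: list[dict]) -> Path:
    """Overwrite exactly one date partition so that reruns replace, never append."""
    partition_dir = base_dir / f"dt={run_date}"  # assumed layout: one directory per day
    partition_dir.mkdir(parents=True, exist_ok=True)
    target = partition_dir / "part-0000.json"
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(rows, sort_keys=True))
    tmp.replace(target)  # rename over the old file instead of appending to it
    return target


if __name__ == "__main__":
    raw = [{"user_id": 1, "amount": 12.5}, {"user_id": 2, "amount": -3.0}]
    with tempfile.TemporaryDirectory() as d:
        base = Path(d)
        first = write_partition(base, "2025-02-17", transform(raw)).read_text()
        second = write_partition(base, "2025-02-17", transform(raw)).read_text()
        # Running the same day twice leaves the partition unchanged.
        print(first == second)
```

The design choice worth noticing is that each run owns one partition and fully replaces it, so a retry or backfill for a given date cannot produce duplicate rows. Real pipelines typically get the same effect through partition overwrites or merge statements in the warehouse.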