
Memberships

Modern Data Community

Public • 573 • Free

Software Developer Academy

Private • 17k • Free

Data Alchemy

Public • 14.7k • Free

Upwork Mastery

Private • 245 • Free

4 contributions to Modern Data Community
How do you handle CDC?
Hey, data community! I'm currently working on a project that uses Change Data Capture (CDC) for incremental loading from a MySQL database. I've come across Airbyte, which seems to offer a feature for this purpose. Does anybody have experience with CDC using Airbyte? Alternatively, do you know of other solutions for this?
4
2
New comment Mar 19
What's your Data Stack?
It's one thing to read articles or watch videos about perfectly crafted data architectures and think you're way behind. But back here in reality, things get messy & nothing is ever perfect or 100% done.

Most of us are usually working on architectures that are:
- Old & outdated
- Hacked together
- Mid-migration to new tools
- Non-existent

Or perhaps you're one of the lucky ones that recently started from scratch and things are running smoothly.

Regardless, the best way to learn what's working (and not working) is from others. I believe this could be one of the best insights this community can collectively offer each other.

So let's hear it. What does your data stack look like for the following components?
1. Database/Storage
2. Ingestion
3. Transformation
4. Version Control
5. Automation

Feel free to add other items as well outside of these 5, but we can focus on these to keep it organized.
8
65
New comment 17d ago
2 likes • Feb 20
Given all the responses, it seems there are two teams in data engineering: Team Python and Spark, doing transformation on a data lake/lakehouse, and Team SQL and dbt, with a data warehouse. However, it seems like dbt has gained the advantage in the fight. So is it still worth learning Spark?
Here is how I designed my new Data Warehouse
Hey everyone! Here is another update from my journey building analytics in a company from scratch. This time let's talk about data warehouse design and topology. Specifically, I wanna share how I designed my databases, schemas and user roles. Buckle up!

As you may know, I've chosen Snowflake as the data warehouse solution. And what I like about it is how flexible it can be in terms of architecting the desired solution. You will see why in a moment.

First, I started with databases and schemas. In my setup there will be 4 databases:
- RAW - for storing raw data from integrations and data lake.
- ANALYTICS - a database with production models.
- DBT_DEV - a database for dbt development.
- SANDBOX - a playground database for any ad-hoc tables.

## RAW database
Each schema within this database will follow a `{source}__{connector}` pattern, so that it's always clear how the data was ingested (e.g. stripe__airflow, mongo__dagster, etc).

## ANALYTICS database
For now only dbt is going to use this database, but in the future I expect other transformation tools to use it as well. There will be several schemas:
- `STAGING` -- for staging raw models
- `INTERMEDIATES` -- for int models (see this dbt guide for explanations)
- a separate schema for each business domain, e.g. `CORE`, `FINANCE`, `PRODUCT`, `MARKETING`, etc.

## DBT_DEV
Every developer will have prefixed schemas within this database, and the schemas should reflect the structure of the production database, e.g. `OLEG_STAGING`, `OLEG_FINANCE`, etc. This way every analyst/developer is going to have an isolated space for their model development.

## SANDBOX
Every developer is going to have their own schema for ad-hoc and temporary tables and views.

To me, this should be sufficient structure to start working with data and deliver actual insights to the business.
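The layout described in this post can be sketched as a small script that generates the matching DDL. This is only an illustration, not the author's actual setup: the helper names (`raw_schema`, `dev_schema`, `build_ddl`) are hypothetical, the intermediate schema is spelled `INTERMEDIATES` here, and naming each sandbox schema after the developer is an assumption the post does not spell out. The script only builds `CREATE DATABASE IF NOT EXISTS` and `CREATE SCHEMA IF NOT EXISTS` statements as strings; running them against Snowflake, and defining the user roles the post mentions, is left out.

```python
# Database and schema names come from the post; everything else is illustrative.
DATABASES = ["RAW", "ANALYTICS", "DBT_DEV", "SANDBOX"]
LAYER_SCHEMAS = ["STAGING", "INTERMEDIATES"]
DOMAIN_SCHEMAS = ["CORE", "FINANCE", "PRODUCT", "MARKETING"]


def raw_schema(source: str, connector: str) -> str:
    """Name a RAW schema using the {source}__{connector} pattern."""
    return f"{source}__{connector}"


def dev_schema(developer: str, prod_schema: str) -> str:
    """Prefix a production schema name with the developer, e.g. OLEG_STAGING."""
    return f"{developer}_{prod_schema}".upper()


def build_ddl(raw_sources, developers):
    """Return the DDL statements for the four-database layout.

    raw_sources: iterable of (source, connector) pairs, e.g. ("stripe", "airflow")
    developers:  iterable of developer names, e.g. "oleg"
    """
    prod_schemas = LAYER_SCHEMAS + DOMAIN_SCHEMAS
    statements = [f"CREATE DATABASE IF NOT EXISTS {db};" for db in DATABASES]

    # RAW: one schema per (source, connector) pair.
    for source, connector in raw_sources:
        name = raw_schema(source, connector)
        statements.append(f"CREATE SCHEMA IF NOT EXISTS RAW.{name};")

    # ANALYTICS: layer schemas plus one schema per business domain.
    for schema in prod_schemas:
        statements.append(f"CREATE SCHEMA IF NOT EXISTS ANALYTICS.{schema};")

    for dev in developers:
        # DBT_DEV: mirror the production schemas with a developer prefix.
        for schema in prod_schemas:
            name = dev_schema(dev, schema)
            statements.append(f"CREATE SCHEMA IF NOT EXISTS DBT_DEV.{name};")
        # SANDBOX: one schema per developer (naming is an assumption).
        statements.append(f"CREATE SCHEMA IF NOT EXISTS SANDBOX.{dev.upper()};")

    return statements


if __name__ == "__main__":
    for stmt in build_ddl([("stripe", "airflow"), ("mongo", "dagster")], ["oleg"]):
        print(stmt)
```

Generating the statements from lists keeps the dev schemas in step with production: adding a new domain to `DOMAIN_SCHEMAS` adds it to `ANALYTICS` and to every developer's `DBT_DEV` space at once.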
12
13
New comment Feb 19
0 likes • Feb 19
@Abdel Uq Excalidraw
Which tools do you use for streaming data pipeline?
Hey, data community! As a Data Engineer, do you frequently work with streaming data? If yes, which tools do you use to ingest streaming data? And which ones do you use to process that data?
1
6
New comment Feb 9
0 likes • Feb 6
Thanks for your response. I'm thinking of learning more about streaming data pipelines and wondering whether they're commonly used in the “real world”.
0 likes • Feb 9
@Emile Van Der Heyde Thanks for your response!!
Dorian Teffo
@dorian-teffo-1774
Data engineer

Active 1d ago
Joined Jan 28, 2024