Dorian Teffo

Modern Data Community


Memberships

Modern Data Community

Public • 573 • Free

Software Developer Academy

Private • 17k • Free

Data Alchemy

Public • 14.7k • Free

Upwork Mastery

Private • 245 • Free

4 contributions to Modern Data Community

Dorian Teffo

Mar 11

🧱 | Strategy & Design

How do you handle CDC?

Hey, data community! I'm currently working on a project that uses Change Data Capture (CDC) for incremental loading from a MySQL database. I've come across Airbyte, which seems to offer a feature for this purpose. Does anybody have experience with CDC using Airbyte? Alternatively, do you know of other solutions for this?

New comment Mar 19

Michael Kahan

Dec '23

🧱 | Strategy & Design

What's your Data Stack?

It's one thing to read articles or watch videos about perfectly crafted data architectures and think you're way behind. But back here in reality, things get messy & nothing is ever perfect or 100% done.

Most of us are usually working on architectures that are:
- Old & outdated
- Hacked together
- Mid-migration to new tools
- Non-existent

Or perhaps you're one of the lucky ones that recently started from scratch and things are running smoothly. Regardless, the best way to learn what's working (and not working) is from others. I believe this could be one of the best insights this community can collectively offer each other.

So let's hear it. What does your data stack look like for the following components?
1. Database/Storage
2. Ingestion
3. Transformation
4. Version Control
5. Automation

Feel free to add other items as well outside of these 5, but we can focus on these to keep it organized.

New comment 17d ago

Dorian Teffo

2 likes • Feb 20

Given all the responses, it seems there are two teams in data engineering: Team Python and Spark, doing transformation on a data lake/lakehouse, and Team SQL and dbt, working with a data warehouse. However, dbt seems to have gained the upper hand in that fight. So is it still worth learning Spark?

Oleg Agapov

Jan 12

🧱 | Strategy & Design

Here is how I designed my new Data Warehouse

Hey everyone! Here is another update from my journey building analytics in a company from scratch. This time let's talk about data warehouse design and topology. Specifically, I wanna share how I designed my databases, schemas and user roles. Buckle up!

As you may know, I've chosen Snowflake as the data warehouse solution. And what I like about it is how flexible it can be in terms of architecting the desired solution. You will see why in a moment.

First, I started with databases and schemas. In my setup there will be 4 databases:
- RAW - for storing raw data from integrations and the data lake.
- ANALYTICS - a database with production models.
- DBT_DEV - a database for dbt development.
- SANDBOX - a playground database for any ad-hoc tables.

## RAW database

Each schema within this database will follow a `{source}__{connector}` pattern, so that it's always clear how the data was ingested (e.g. stripe__airflow, mongo__dagster, etc).

## ANALYTICS database

For now only dbt is going to use this database, but in the future I expect other transformation tools to use it as well. There will be several schemas:
- `STAGING` -- for staging raw models
- `INTERMEDIATES` -- for int models (see this dbt guide for explanations)
- a separate schema for each business domain, e.g. `CORE`, `FINANCE`, `PRODUCT`, `MARKETING`, etc.

## DBT_DEV

Every developer will have prefixed schemas within this database, and the schemas should mirror the structure of the production database, e.g. `OLEG_STAGING`, `OLEG_FINANCE`, etc. This way every analyst/developer is going to have an isolated space for their model development.

## SANDBOX

Every developer is going to have their own schema for ad-hoc and temporary tables and views.

To me, this should be sufficient structure to start working with data and deliver actual insights to the business.
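The layout described in this post can be sketched as a small script that derives schema names and emits Snowflake DDL. This is only an illustration, not the author's actual setup: the function names (`raw_schema`, `dev_schemas`, `build_layout`, `to_ddl`) are hypothetical, the intermediate schema is spelled `INTERMEDIATES` here, and naming each SANDBOX schema after the developer is an assumption, since the post only says each developer gets their own schema.

```python
from typing import Dict, List, Tuple

# Business-domain schemas named in the post; the real list may differ.
DOMAINS = ["CORE", "FINANCE", "PRODUCT", "MARKETING"]
# Production schemas in ANALYTICS: staging, intermediate, then one per domain.
PROD_SCHEMAS = ["STAGING", "INTERMEDIATES"] + DOMAINS


def raw_schema(source: str, connector: str) -> str:
    """RAW schemas follow the {source}__{connector} pattern."""
    return f"{source}__{connector}"


def dev_schemas(developer: str) -> List[str]:
    """DBT_DEV mirrors production, prefixed with the developer's name."""
    return [f"{developer.upper()}_{schema}" for schema in PROD_SCHEMAS]


def build_layout(
    sources: List[Tuple[str, str]], developers: List[str]
) -> Dict[str, List[str]]:
    """Map each of the four databases to its list of schema names."""
    return {
        "RAW": [raw_schema(s, c) for s, c in sources],
        "ANALYTICS": list(PROD_SCHEMAS),
        "DBT_DEV": [name for d in developers for name in dev_schemas(d)],
        # Assumption: one sandbox schema per developer, named after them.
        "SANDBOX": [d.upper() for d in developers],
    }


def to_ddl(layout: Dict[str, List[str]]) -> List[str]:
    """Render idempotent CREATE statements for every database and schema."""
    statements = []
    for database, schemas in layout.items():
        statements.append(f"CREATE DATABASE IF NOT EXISTS {database};")
        for schema in schemas:
            statements.append(f"CREATE SCHEMA IF NOT EXISTS {database}.{schema};")
    return statements


if __name__ == "__main__":
    layout = build_layout([("stripe", "airflow"), ("mongo", "dagster")], ["oleg"])
    print("\n".join(to_ddl(layout)))
```

Generating the names from one list of production schemas keeps DBT_DEV in step with ANALYTICS whenever a new domain is added, which is the mirroring the post asks for.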

New comment Feb 19

Dorian Teffo

0 likes • Feb 19

@Abdel Uq Excalidraw

Dorian Teffo

Feb 5

🛠️ | Tooling

Which tools do you use for streaming data pipelines?

Hey, data community! As a Data Engineer, do you frequently work with streaming data? If yes, which tools do you use to ingest streaming data? And which ones do you use to process that data?

New comment Feb 9

Dorian Teffo

0 likes • Feb 6

Thanks for your response! I'm thinking of learning more about streaming data pipelines and wondering whether they're commonly used in the “real world”.

Dorian Teffo

0 likes • Feb 9

@Emile Van Der Heyde Thanks for your response!!


Level 1

1 point to level up

Dorian Teffo

@dorian-teffo-1774

Data engineer

Active 1d ago

Joined Jan 28, 2024
