Emile Van Der Heyde

Modern Data Community


Memberships

Modern Data Community

Public • 568 • Free

18 contributions to Modern Data Community

Emile Van Der Heyde

23d ago

🧱 | Strategy & Design

ETL Recommendations

Hey all. My company currently has a bunch of Lambda functions in AWS that extract data from APIs into S3 and then into Snowflake. The Lambda process works but has limitations: a) It has a complex setup, and making changes takes a lot of time. b) Monitoring isn't very visible. c) CDC is a challenge to manage. Since I probably won't be able to get the company to pay for a new ETL tool, I need to think of some free tools I can recommend that make the pipeline robust, easy to monitor, and quick to change or extend with new sources. I am looking at Airbyte. Any advice on this? What other alternatives are there?

New comment 4d ago

Abdel Uq

Feb 27

🛠️ | Tooling

Choosing Between BigQuery, Redshift & Snowflake

Hello everyone, I'm currently evaluating data warehouse solutions to centralize data from two transactional MySQL databases, event data from Segment, Salesforce, and Intercom. Our goal is to load this data into a warehouse, perform some dbt transformations, and then connect it to a BI tool. We're considering BigQuery, Redshift, and Snowflake as potential solutions but are having a tough time deciding which one would be the best fit for our needs. Our key considerations are cost, ease of use, performance, and speed. Any recommendations you have would be useful!

New comment Mar 17

Emile Van Der Heyde

2 likes • Mar 1

There is no one answer to this question, but I agree with the people above: Snowflake really is a joy to work with :-). So if the data volumes are not huge and you're a small team, it's certainly the way to go, as a way to balance ease of use and low maintenance time with cost.

Adam Smith

Feb 12

🧱 | Strategy & Design

What is the value of extracting data into a cheap file store?

I've often seen it recommended to extract data out of a data source into a Parquet file in cheap file storage like Azure Blob or S3 buckets... what is the value of adding this step when all I'm doing is copying it on into a SQL database?

New comment Feb 19

Emile Van Der Heyde

2 likes • Feb 12

Hey Adam, this is a great question about something we often take for granted. Here are two of the many reasons. a) You have a place to store all the data generated, in many different file types, for current and future usage. We may not need all that data in a warehouse today, or it can take time to get it in there, because it may require development work to create the new structure in the data warehouse DB. b) Do not extract from and touch the source more than you need to. If something fails in the warehouse and you need to reload, you have these S3 files to go back to and don't need to worry the source systems. You can also read more online about the "Lakehouse architecture", a combination of the two that is really popular now.
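Point (b) above can be sketched in a few lines. This is only an illustration under loud assumptions: a local temporary directory stands in for the S3 bucket, JSON lines stand in for Parquet, an in-memory SQLite database stands in for the warehouse, and the `orders` table and the function names `land_raw`, `reload_warehouse`, and `demo` are all hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def land_raw(records, landing_dir: Path, batch_id: str) -> Path:
    """Write one extracted batch to the landing zone as JSON lines, unmodified."""
    path = landing_dir / f"{batch_id}.jsonl"
    with path.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
    return path


def reload_warehouse(conn: sqlite3.Connection, landing_dir: Path) -> int:
    """Rebuild the warehouse table from landed files only; the source is never queried."""
    conn.execute("DROP TABLE IF EXISTS orders")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    for path in sorted(landing_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            rows = [json.loads(line) for line in fh if line.strip()]
        conn.executemany(
            "INSERT INTO orders (id, amount) VALUES (:id, :amount)", rows
        )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def demo() -> int:
    """Land two batches, load them, lose the table contents, then recover from files."""
    with tempfile.TemporaryDirectory() as tmp:
        landing = Path(tmp)  # stand-in for an S3 bucket or Azure Blob container
        land_raw([{"id": 1, "amount": 9.5}, {"id": 2, "amount": 20.0}], landing, "2024-02-11")
        land_raw([{"id": 3, "amount": 4.25}], landing, "2024-02-12")

        conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
        try:
            reload_warehouse(conn, landing)      # initial load
            conn.execute("DELETE FROM orders")   # simulate a failed warehouse load
            return reload_warehouse(conn, landing)  # recover without touching the source
        finally:
            conn.close()
```

Calling `demo()` returns the row count after the recovery load. The source API is only read once per batch (inside whatever produces the records passed to `land_raw`), and every later reload reads from the landed files.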

Bosete Kumar

Feb 10

🧱 | Strategy & Design

Data Vault

Data Vault modeling: I just started to learn about this topic (Hub, Link, and Satellite). I am trying to understand how it relates to Kimball, or whether it is an extension of it, and how to apply it effectively to real-world challenges.

New comment Feb 14

Emile Van Der Heyde

1 like • Feb 10

Hey, I don't know if you saw this in the group's resources. It compares different modeling architectures very nicely. Thank you again, @Michael Kahan: https://www.skool.com/modern-data-community/classroom/67558700?md=95a12bf52c9543cb81a9445fa64d5281 Another article I like is this one, specifically on Data Vault: https://www.phdata.io/blog/building-modern-data-platform-with-data-vault/ Personally, I am a fan of One Big Table :-)

Oleg Agapov

Jan 12

🧱 | Strategy & Design

Here is how I designed my new Data Warehouse

Hey everyone! Here is another update from my journey building analytics in a company from scratch. This time let's talk about data warehouse design and topology. Specifically, I wanna share how I designed my databases, schemas and user roles. Buckle up!

As you may know, I've chosen Snowflake as the data warehouse solution. What I like about it is how flexible it can be in terms of architecting the desired solution. You will see why in a moment.

First, I started with databases and schemas. In my setup there will be 4 databases:

- RAW - for storing raw data from integrations and the data lake.
- ANALYTICS - a database with production models.
- DBT_DEV - a database for dbt development.
- SANDBOX - a playground database for any ad-hoc tables.

## RAW database

Each schema within this database will follow a `{source}__{connector}` pattern, so that it's always clear how the data was ingested (e.g. stripe__airflow, mongo__dagster, etc).

## ANALYTICS database

For now only dbt is going to use this database, but in the future I expect other transformation tools to use it as well. There will be several schemas:

- `STAGING` -- for staging raw models
- `INTEREMEDIATES` -- for int models (see this dbt guide for explanations)
- a separate schema for each business domain, e.g. `CORE`, `FINANCE`, `PRODUCT`, `MARKETING`, etc.

## DBT_DEV

Every developer will have prefixed schemas within this database, and the schemas should reflect the structure of the production database, e.g. `OLEG_STAGING`, `OLEG_FINANCE`, etc. This way every analyst/developer is going to have an isolated space for their model development.

## SANDBOX

Every developer is going to have their own schema for ad-hoc and temporary tables and views.

To me, this should be sufficient structure to start working with data and deliver actual insights to the business.
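The naming conventions in this post could be captured in a small helper that generates the bootstrap SQL. This is a rough sketch rather than the author's actual setup: the function names are hypothetical, `PROD_SCHEMAS` is only a subset of the schemas listed above, and naming each SANDBOX schema after the developer is an assumption, since the post does not say how those schemas are named. The generated statements use `CREATE DATABASE IF NOT EXISTS` and `CREATE SCHEMA IF NOT EXISTS`, and the code only builds strings; it does not connect to Snowflake.

```python
from typing import Iterable, List, Tuple

# The four databases named in the post.
DATABASES = ["RAW", "ANALYTICS", "DBT_DEV", "SANDBOX"]

# Illustrative subset of the production schemas in ANALYTICS.
PROD_SCHEMAS = ["STAGING", "CORE", "FINANCE"]


def raw_schema(source: str, connector: str) -> str:
    """Schema name in RAW following the {source}__{connector} pattern."""
    return f"{source}__{connector}"


def dev_schema(developer: str, prod_schema: str) -> str:
    """Developer-prefixed schema in DBT_DEV that mirrors a production schema."""
    return f"{developer}_{prod_schema}".upper()


def bootstrap_statements(
    raw_sources: Iterable[Tuple[str, str]], developers: Iterable[str]
) -> List[str]:
    """Build the CREATE statements for the whole layout, in a stable order."""
    stmts = [f"CREATE DATABASE IF NOT EXISTS {db};" for db in DATABASES]
    stmts += [
        f"CREATE SCHEMA IF NOT EXISTS RAW.{raw_schema(src, conn)};"
        for src, conn in raw_sources
    ]
    stmts += [f"CREATE SCHEMA IF NOT EXISTS ANALYTICS.{s};" for s in PROD_SCHEMAS]
    for dev in developers:
        stmts += [
            f"CREATE SCHEMA IF NOT EXISTS DBT_DEV.{dev_schema(dev, s)};"
            for s in PROD_SCHEMAS
        ]
        # Assumption: one sandbox schema per developer, named after them.
        stmts.append(f"CREATE SCHEMA IF NOT EXISTS SANDBOX.{dev.upper()};")
    return stmts
```

For example, `bootstrap_statements([("stripe", "airflow")], ["oleg"])` yields statements for the four databases, `RAW.stripe__airflow`, the ANALYTICS schemas, `DBT_DEV.OLEG_STAGING` and its siblings, and `SANDBOX.OLEG`. Keeping the conventions in one place means adding a source or a developer is a one-line change.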

New comment Feb 19

Emile Van Der Heyde

0 likes • Feb 9

@Tomas Truchly keen to know if you had all these layers when working with dbt and Snowflake. I understand the need for a dev environment when you're trying to build onto an existing structure or add new things. But does it have to be as complicated as the above? Maybe I can just create a dev database that I hide from end users and build in there, and once I'm happy, just switch my database to the production one and delete the work I did in dev. Thoughts?


Emile Van Der Heyde

@emile-van-der-heyde-8746

Analytics Engineer in Payments. South Africa based

Active 1d ago

Joined Jan 27, 2024
