
Created by Abdel

Cohort 24 📈

Private • 1 member • Free

Memberships

Modern Data Community

Public • 573 members • Free

3 contributions to Modern Data Community
Choosing Between BigQuery, Redshift & Snowflake
Hello everyone, I'm currently evaluating data warehouse solutions to centralize data from two transactional MySQL databases, event data from Segment, and Salesforce & Intercom. Our goal is to load this data into a warehouse, run some dbt transformations, and then connect it to a BI tool. We're considering BigQuery, Redshift, and Snowflake as potential solutions but are having a tough time deciding which one would be the best fit for our needs. The main considerations we're weighing are cost, ease of use, and performance/speed. Any recommendations you have would be really useful!
1 like • 12 comments • New comment Mar 17
0 likes • Mar 17
Thank you for your responses, guys
0 likes • Mar 17
Thank you, Dragos
RDS to S3 migration
Hello everyone, this is my first post. I recently joined a company in my first official data engineering role, and I'm looking for some help here. I've been tasked with transferring 150M records from an RDS MySQL table to S3 (.parquet files). The table is so hard to query that data is dropped daily, keeping only 90 days, and that's one of the problems of the migration: querying it directly is nearly impossible. My first approach was a simple Lambda with the MySQL connector and a Python script that chunks the table (see the sketch below), but that would take me about 2 days to run. The idea is also to have this data somewhere else before thinking about a lakehouse solution.

My questions are:
- What services do you recommend to make this one-time migration as fast and smooth as possible? My first thought is Glue (used it before, but for a different purpose) or the DMS service (which I've never used).
- What ETL would you propose to run this process daily (~1.5M records)? Glue comes to mind again, if I'm successful with the first bullet point.
- Lastly, this data will be used for analytics. Initially it will stay in S3 so we can query it with Athena while the team figures out which KPIs they want to track; in the future the idea is to move it somewhere that makes it fast to query and build models on. The whole company environment is in AWS, so my first thought is Redshift, but I really like the efficiency and how Google BigQuery handles this amount of data.

Thank you so much for reading!
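For context, here is a minimal sketch of the chunked Python approach mentioned in the post, assuming `mysql-connector-python` and `pyarrow`; the connection details, table name, and bucket are hypothetical placeholders:

```python
# Sketch: chunked export from RDS MySQL to S3 as Parquet files.
# Host, credentials, table, and bucket names are placeholders.
import os
import mysql.connector
import pyarrow as pa
import pyarrow.parquet as pq

CHUNK = 500_000  # rows per Parquet file

conn = mysql.connector.connect(
    host=os.environ["DB_HOST"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASS"],
    database="mydb",
)
cur = conn.cursor(dictionary=True)

last_id, part = 0, 0
while True:
    # Keyset pagination on the primary key stays indexed,
    # unlike OFFSET, which rescans every row it skips.
    cur.execute(
        "SELECT * FROM mytable WHERE id > %s ORDER BY id LIMIT %s",
        (last_id, CHUNK),
    )
    rows = cur.fetchall()
    if not rows:
        break
    table = pa.Table.from_pylist(rows)
    # pyarrow resolves s3:// URIs with its built-in S3 filesystem
    pq.write_table(table, f"s3://my-bucket/mytable/part-{part:05d}.parquet")
    last_id, part = rows[-1]["id"], part + 1
```

Run single-threaded, this is exactly the ~2-day problem described above; DMS or a Glue job parallelizes the same chunking idea across workers, which is likely a better fit for the one-time 150M-row load.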
3 likes • 8 comments • New comment 21d ago
0 likes • Mar 17
I suggest trying Amazon's zero-ETL integration. I think you can set it up in less than 10 clicks: select your MySQL source (which should be an RDS instance), then choose Redshift as the destination. The initial ingestion will take a bit of time, but I believe it will get the job done eventually.
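For anyone who prefers infrastructure-as-code over console clicks, the same integration can apparently be created with boto3's RDS `create_integration` call; here is a sketch with hypothetical ARNs:

```python
# Sketch: create an RDS MySQL -> Redshift zero-ETL integration via boto3.
# Both ARNs below are hypothetical placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

resp = rds.create_integration(
    IntegrationName="mysql-to-redshift",
    # Source: the RDS MySQL instance that holds the table
    SourceArn="arn:aws:rds:us-east-1:123456789012:db:my-mysql-instance",
    # Target: the Redshift Serverless namespace (or provisioned cluster)
    TargetArn="arn:aws:redshift-serverless:us-east-1:123456789012:namespace/my-namespace",
)
print(resp["Status"])  # starts as "creating"; replication then runs continuously
```

Worth checking the prerequisites in the AWS docs first (supported engine versions and parameter-group settings on the source).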
Here is how I designed my new Data Warehouse
Hey everyone! Here is another update from my journey building analytics in a company from scratch. This time let's talk about data warehouse design and topology. Specifically, I want to share how I designed my databases, schemas, and user roles. Buckle up!

As you may know, I've chosen Snowflake as the data warehouse solution. What I like about it is how flexible it can be in terms of architecting the desired solution. You will see why in a moment. First, I started with databases and schemas. In my setup there will be 4 databases:
- RAW - for storing raw data from integrations and the data lake.
- ANALYTICS - a database with production models.
- DBT_DEV - a database for dbt development.
- SANDBOX - a playground database for any ad-hoc tables.

## RAW database
Each schema within this database will follow a `{source}__{connector}` pattern, so that it's always clear how the data was ingested (e.g. stripe__airflow, mongo__dagster, etc.).

## ANALYTICS database
For now only dbt is going to use this database, but in the future I expect other transformation tools to use it as well. There will be several schemas:
- `STAGING` -- for staging raw models
- `INTERMEDIATES` -- for int models (see the dbt guide for explanations)
- a separate schema for each business domain, e.g. `CORE`, `FINANCE`, `PRODUCT`, `MARKETING`, etc.

## DBT_DEV
Every developer will have prefixed schemas within this database, and the schemas should mirror the structure of the production database, e.g. `OLEG_STAGING`, `OLEG_FINANCE`, etc. This way every analyst/developer gets an isolated space for their model development.

## SANDBOX
Every developer is going to have their own schema for ad-hoc and temporary tables and views.

To me, this should be a sufficient structure to start working with data and delivering actual insights to the business. A DDL sketch of this layout is below.
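Here is a minimal sketch of how this layout could be bootstrapped with the Snowflake Python connector; this is one reading of the design above, not the author's actual script, and the credentials, domain list, and example developer "OLEG" are hypothetical:

```python
# Sketch: create the RAW / ANALYTICS / DBT_DEV / SANDBOX layout in Snowflake.
# Credentials and the example developer "OLEG" are placeholders.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SYSADMIN",
)
cur = conn.cursor()

ddl = [
    # RAW: one schema per {source}__{connector}
    "CREATE DATABASE IF NOT EXISTS RAW",
    "CREATE SCHEMA IF NOT EXISTS RAW.STRIPE__AIRFLOW",
    # ANALYTICS: staging, intermediates, one schema per business domain
    "CREATE DATABASE IF NOT EXISTS ANALYTICS",
    "CREATE SCHEMA IF NOT EXISTS ANALYTICS.STAGING",
    "CREATE SCHEMA IF NOT EXISTS ANALYTICS.INTERMEDIATES",
    "CREATE SCHEMA IF NOT EXISTS ANALYTICS.CORE",
    "CREATE SCHEMA IF NOT EXISTS ANALYTICS.FINANCE",
    # DBT_DEV: developer-prefixed mirrors of the production schemas
    "CREATE DATABASE IF NOT EXISTS DBT_DEV",
    "CREATE SCHEMA IF NOT EXISTS DBT_DEV.OLEG_STAGING",
    "CREATE SCHEMA IF NOT EXISTS DBT_DEV.OLEG_FINANCE",
    # SANDBOX: one personal schema per developer
    "CREATE DATABASE IF NOT EXISTS SANDBOX",
    "CREATE SCHEMA IF NOT EXISTS SANDBOX.OLEG",
]
for stmt in ddl:
    cur.execute(stmt)
conn.close()
```

In a dbt project, the usual way to route development builds into the prefixed `DBT_DEV` schemas is a custom `generate_schema_name` macro.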
12 likes • 13 comments • New comment Feb 19
0 likes • Feb 19
Hi Oleg - thank you for sharing. Which software do you use to draw flowcharts like these?
1 like • Feb 19
@Dorian Teffo Thank you so much
Abdel Uq
Level 1 • 3 points to level up
@abdel-uq-8870

Active 16d ago
Joined Jan 31, 2024