
Created by Abdel

Cohort 24 📈

Private • 1 member • Free

Memberships

Modern Data Community

Public • 573 members • Free

3 contributions to Modern Data Community
Choosing Between BigQuery, Redshift & Snowflake
Hello everyone, I'm currently evaluating data warehouse solutions to centralize data from two transactional MySQL databases, event data from Segment, and Salesforce & Intercom. Our goal is to load this data into a warehouse, run some dbt transformations, and then connect it to a BI tool. We're considering BigQuery, Redshift, and Snowflake as potential solutions but are having a tough time deciding which one would be the best fit for our needs. The main considerations we're weighing are cost, ease of use, and performance/speed. Any recommendations you have would be really useful!
1 like • 12 comments • New comment Mar 17
0 likes • Mar 17
Thank you for your responses, guys
0 likes • Mar 17
Thank you, Dragos
RDS to S3 migration
Hello everyone, this is my first post. I recently joined a company in my first official data engineering role, and I'm looking for some help here. I've been tasked with transferring 150M records from an RDS MySQL table to S3 (.parquet files). The table is so hard to query that data is dropped daily, keeping only 90 days, and that's one of the problems of the migration: querying it directly is nearly impossible. My first approach was a simple Lambda with the MySQL connector and a Python script that chunks the table (see the sketch below), but that would take me about 2 days to run. The idea is also to have this data somewhere else before thinking about a lakehouse solution.

My questions are:
- What services do you recommend to make this one-time migration as fast and smooth as possible? My first thought is Glue (used it before, but for a different purpose) or the DMS service (which I've never used).
- What ETL would you propose to run this process daily (~1.5M records)? Glue comes to mind again, if I'm successful with the first bullet point.
- Lastly, this data will be used for analytics. Initially it will stay in S3 so we can query it with Athena while the team figures out which KPIs they want to track; in the future the idea is to move it somewhere that makes it fast to query and build models on. The whole company environment is in AWS, so my first thought is Redshift, but I really like the efficiency and how Google BigQuery handles this amount of data.

Thank you so much for reading!
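For context, here is a minimal sketch of the chunked Python approach mentioned in the post, assuming `mysql-connector-python` and `pyarrow`; the connection details, table name, and bucket are hypothetical placeholders:

```python
# Sketch: chunked export from RDS MySQL to S3 as Parquet files.
# Host, credentials, table, and bucket names are placeholders.
import os
import mysql.connector
import pyarrow as pa
import pyarrow.parquet as pq

CHUNK = 500_000  # rows per Parquet file

conn = mysql.connector.connect(
    host=os.environ["DB_HOST"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASS"],
    database="mydb",
)
cur = conn.cursor(dictionary=True)

last_id, part = 0, 0
while True:
    # Keyset pagination on the primary key stays indexed,
    # unlike OFFSET, which rescans every row it skips.
    cur.execute(
        "SELECT * FROM mytable WHERE id > %s ORDER BY id LIMIT %s",
        (last_id, CHUNK),
    )
    rows = cur.fetchall()
    if not rows:
        break
    table = pa.Table.from_pylist(rows)
    # pyarrow resolves s3:// URIs with its built-in S3 filesystem
    pq.write_table(table, f"s3://my-bucket/mytable/part-{part:05d}.parquet")
    last_id, part = rows[-1]["id"], part + 1
```

Run single-threaded, this is exactly the ~2-day problem described above; DMS or a Glue job parallelizes the same chunking idea across workers, which is likely a better fit for the one-time 150M-row load.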
3 likes • 8 comments • New comment 21d ago
0 likes • Mar 17
I suggest trying Amazon's zero-ETL integration. I think you can set it up in less than 10 clicks: select your MySQL source (which should be an RDS instance), then choose Redshift as the destination. The initial ingestion will take a bit of time, but I believe it will get the job done eventually.
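For anyone who prefers infrastructure-as-code over console clicks, the same integration can apparently be created with boto3's RDS `create_integration` call; here is a sketch with hypothetical ARNs:

```python
# Sketch: create an RDS MySQL -> Redshift zero-ETL integration via boto3.
# Both ARNs below are hypothetical placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

resp = rds.create_integration(
    IntegrationName="mysql-to-redshift",
    # Source: the RDS MySQL instance that holds the table
    SourceArn="arn:aws:rds:us-east-1:123456789012:db:my-mysql-instance",
    # Target: the Redshift Serverless namespace (or provisioned cluster)
    TargetArn="arn:aws:redshift-serverless:us-east-1:123456789012:namespace/my-namespace",
)
print(resp["Status"])  # starts as "creating"; replication then runs continuously
```

Worth checking the prerequisites in the AWS docs first (supported engine versions and parameter-group settings on the source).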
Here is how I designed my new Data Warehouse
Hey everyone! Here is another update from my journey building analytics in a company from scratch. This time let's talk about data warehouse design and topology. Specifically, I want to share how I designed my databases, schemas, and user roles. Buckle up!

As you may know, I've chosen Snowflake as the data warehouse solution. What I like about it is how flexible it can be in terms of architecting the desired solution. You will see why in a moment. First, I started with databases and schemas. In my setup there will be 4 databases:
- RAW - for storing raw data from integrations and the data lake.
- ANALYTICS - a database with production models.
- DBT_DEV - a database for dbt development.
- SANDBOX - a playground database for any ad-hoc tables.

## RAW database
Each schema within this database will follow a `{source}__{connector}` pattern, so that it's always clear how the data was ingested (e.g. stripe__airflow, mongo__dagster, etc.).

## ANALYTICS database
For now only dbt is going to use this database, but in the future I expect other transformation tools to use it as well. There will be several schemas:
- `STAGING` -- for staging raw models
- `INTERMEDIATES` -- for int models (see the dbt guide for explanations)
- a separate schema for each business domain, e.g. `CORE`, `FINANCE`, `PRODUCT`, `MARKETING`, etc.

## DBT_DEV
Every developer will have prefixed schemas within this database, and the schemas should mirror the structure of the production database, e.g. `OLEG_STAGING`, `OLEG_FINANCE`, etc. This way every analyst/developer gets an isolated space for their model development.

## SANDBOX
Every developer is going to have their own schema for ad-hoc and temporary tables and views.

To me, this should be a sufficient structure to start working with data and delivering actual insights to the business. A DDL sketch of this layout is below.
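Here is a minimal sketch of how this layout could be bootstrapped with the Snowflake Python connector; this is one reading of the design above, not the author's actual script, and the credentials, domain list, and example developer "OLEG" are hypothetical:

```python
# Sketch: create the RAW / ANALYTICS / DBT_DEV / SANDBOX layout in Snowflake.
# Credentials and the example developer "OLEG" are placeholders.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SYSADMIN",
)
cur = conn.cursor()

ddl = [
    # RAW: one schema per {source}__{connector}
    "CREATE DATABASE IF NOT EXISTS RAW",
    "CREATE SCHEMA IF NOT EXISTS RAW.STRIPE__AIRFLOW",
    # ANALYTICS: staging, intermediates, one schema per business domain
    "CREATE DATABASE IF NOT EXISTS ANALYTICS",
    "CREATE SCHEMA IF NOT EXISTS ANALYTICS.STAGING",
    "CREATE SCHEMA IF NOT EXISTS ANALYTICS.INTERMEDIATES",
    "CREATE SCHEMA IF NOT EXISTS ANALYTICS.CORE",
    "CREATE SCHEMA IF NOT EXISTS ANALYTICS.FINANCE",
    # DBT_DEV: developer-prefixed mirrors of the production schemas
    "CREATE DATABASE IF NOT EXISTS DBT_DEV",
    "CREATE SCHEMA IF NOT EXISTS DBT_DEV.OLEG_STAGING",
    "CREATE SCHEMA IF NOT EXISTS DBT_DEV.OLEG_FINANCE",
    # SANDBOX: one personal schema per developer
    "CREATE DATABASE IF NOT EXISTS SANDBOX",
    "CREATE SCHEMA IF NOT EXISTS SANDBOX.OLEG",
]
for stmt in ddl:
    cur.execute(stmt)
conn.close()
```

In a dbt project, the usual way to route development builds into the prefixed `DBT_DEV` schemas is a custom `generate_schema_name` macro.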
12 likes • 13 comments • New comment Feb 19
0 likes • Feb 19
Hi Oleg - thank you for sharing. Which software do you use to draw flowcharts like these?
1 like • Feb 19
@Dorian Teffo Thank you so much
Abdel Uq
Level 1 • 3 points to level up
@abdel-uq-8870

Active 16d ago
Joined Jan 31, 2024