
Memberships

Modern Data Community

Public • 568 members • Free

18 contributions to Modern Data Community
ETL Recommendations
Hey all, my company currently has a bunch of Lambda functions in AWS that extract data from APIs into S3 and then into Snowflake. The Lambda process works, but it has limitations: a) it has a complex setup, and making changes takes a lot of time; b) monitoring isn't very visible; c) CDC is a challenge to manage. Since I probably won't be able to get the company to pay for a new ETL tool, I need to think of some free tools I can recommend that make the pipeline robust and easy to monitor, and let us add sources and make changes quickly. I am looking at Airbyte; any advice on this? What other alternatives are there?
2
4
New comment 4d ago
Choosing Between BigQuery, Redshift & Snowflake
Hello everyone, I'm currently in the process of evaluating data warehouse solutions to centralize data from two transactional MySQL databases, plus events data from Segment, Salesforce, and Intercom. Our goal is to load this data into a warehouse, perform some dbt transformations, and then connect it to a BI tool. We're considering BigQuery, Redshift, and Snowflake as potential solutions but are having a tough time deciding which one would be the best fit for our needs. The main considerations we're looking at are cost, ease of use, performance, and speed. Any recommendations you have would be appreciated!
1
12
New comment Mar 17
2 likes • Mar 1
There is no one answer to this question, but I agree with the above people: Snowflake really is a joy to work with :-) So if the data volumes are not huge and you're a small team, it's certainly the way to go, as a way to balance ease of use and low maintenance time with cost.
What is the value of extracting data into a cheap file store?
I've often seen it recommended to extract data out of a data source into a parquet file in cheap file storage like Azure Blob or S3 buckets... what is the value of adding this step when all I'm doing is copying the data on into a SQL database?
2
6
New comment Feb 19
2 likes • Feb 12
Hey Adam, this is a great question that we often take for granted. Here are two of the many reasons. a) You have a place to store all the data generated, in many different file types, for current and future usage. We may not need all that data in a warehouse today, or it can take time to get it in there, since it may require development work to create the new structure in the data warehouse. b) Do not extract from and touch the source more than you need to. If something fails in the warehouse and you need to reload, you have the S3 files to go back to and don't need to bother the source systems. You can also read more about the "Lakehouse architecture" online, which is a combination of the two approaches that is really popular now.
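The "reload from the staged copy, not the source" idea in point b) can be sketched in a few lines. This is a minimal, hypothetical illustration: a local directory stands in for the S3 bucket and SQLite stands in for the warehouse, and the function and table names (`stage_extract`, `load_warehouse`, `orders`) are made up for the example.

```python
import json
import pathlib
import sqlite3
import tempfile

def stage_extract(lake_dir, source, load_date, records):
    """Write the raw API extract to the 'lake' (stand-in for an S3 bucket)."""
    path = pathlib.Path(lake_dir) / source / f"{load_date}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records))
    return path

def load_warehouse(conn, staged_path):
    """(Re)load the warehouse table from a staged file -- the source is never touched."""
    records = json.loads(pathlib.Path(staged_path).read_text())
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, amount REAL)")
    conn.execute("DELETE FROM orders")  # simple full reload for the sketch
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(r["id"], r["amount"]) for r in records],
    )
    conn.commit()

lake = tempfile.mkdtemp()
staged = stage_extract(lake, "stripe", "2024-02-12", [{"id": 1, "amount": 9.9}])
conn = sqlite3.connect(":memory:")
load_warehouse(conn, staged)  # initial load
load_warehouse(conn, staged)  # replay after a warehouse failure: no call to the API
```

The second `load_warehouse` call is the whole point: after a failed or corrupted warehouse load you replay from the cheap file store instead of re-extracting from (and re-stressing) the source system.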
Data Vault
Data Vault Modeling: I just started to learn about this topic. Hub, Link, and Satellite. I am trying to understand the correlation with Kimball, or whether it is an extension of it, and how to apply it effectively to real-world challenges.
1
6
New comment Feb 14
1 like • Feb 10
Hey, I don't know if you saw this in the group's resources; it compares different modeling architectures very nicely. Thank you again, @Michael Kahan: https://www.skool.com/modern-data-community/classroom/67558700?md=95a12bf52c9543cb81a9445fa64d5281 Another article I like is this one, specifically on Data Vault: https://www.phdata.io/blog/building-modern-data-platform-with-data-vault/ Personally I am a fan of One Big Table :-)
Here is how I designed my new Data Warehouse
Hey everyone! Here is another update from my journey building analytics in a company from scratch. This time let's talk about data warehouse design and topology. Specifically, I want to share how I designed my databases, schemas, and user roles. Buckle up!

As you may know, I've chosen Snowflake as the data warehouse solution. What I like about it is how flexible it can be in terms of architecting the desired solution. You will see why in a moment.

First, I started with databases and schemas. In my setup there will be 4 databases:
- RAW - for storing raw data from integrations and the data lake.
- ANALYTICS - a database with production models.
- DBT_DEV - a database for dbt development.
- SANDBOX - a playground database for any ad-hoc tables.

## RAW database
Each schema within this database will follow a `{source}__{connector}` pattern, so that it's always clear how the data was ingested (e.g. stripe__airflow, mongo__dagster, etc.).

## ANALYTICS database
For now only dbt is going to use this database, but in the future I expect other transformation tools to use it as well. There will be several schemas:
- `STAGING` -- for staging raw models
- `INTERMEDIATES` -- for int models (see this dbt guide for explanations)
- a separate schema for each business domain, e.g. `CORE`, `FINANCE`, `PRODUCT`, `MARKETING`, etc.

## DBT_DEV
Every developer will have prefixed schemas within this database, and the schemas should reflect the structure of the production database, e.g. `OLEG_STAGING`, `OLEG_FINANCE`, etc. This way every analyst/developer gets an isolated space for their model development.

## SANDBOX
Every developer is going to have their own schema for ad-hoc and temporary tables and views.

To me, this should be a sufficient structure to start working with data and delivering actual insights to the business.
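The topology above is regular enough to generate as DDL. Here is a minimal sketch, assuming nothing beyond what the post describes: the database names, the `{source}__{connector}` pattern, the ANALYTICS schemas, and per-developer prefixes. The helper function, the example source `STRIPE__AIRFLOW`, and the developer list are hypothetical, chosen just to illustrate the naming scheme.

```python
# Generate the CREATE statements for the warehouse topology described above.
DATABASES = ["RAW", "ANALYTICS", "DBT_DEV", "SANDBOX"]
DOMAINS = ["CORE", "FINANCE", "PRODUCT", "MARKETING"]
DEVELOPERS = ["OLEG"]  # hypothetical developer list

def topology_ddl():
    stmts = [f"CREATE DATABASE IF NOT EXISTS {db};" for db in DATABASES]
    # RAW schemas follow the {source}__{connector} pattern.
    stmts.append("CREATE SCHEMA IF NOT EXISTS RAW.STRIPE__AIRFLOW;")
    # ANALYTICS: staging, intermediates, and one schema per business domain.
    for schema in ["STAGING", "INTERMEDIATES", *DOMAINS]:
        stmts.append(f"CREATE SCHEMA IF NOT EXISTS ANALYTICS.{schema};")
    # DBT_DEV mirrors production schemas with a developer prefix;
    # SANDBOX gets one schema per developer.
    for dev in DEVELOPERS:
        for schema in ["STAGING", "INTERMEDIATES", *DOMAINS]:
            stmts.append(f"CREATE SCHEMA IF NOT EXISTS DBT_DEV.{dev}_{schema};")
        stmts.append(f"CREATE SCHEMA IF NOT EXISTS SANDBOX.{dev};")
    return stmts

for stmt in topology_ddl():
    print(stmt)
```

Generating the DDL this way keeps the dev databases mechanically in sync with production: adding a new domain schema automatically adds the matching `{DEV}_{DOMAIN}` schema for every developer.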
12
13
New comment Feb 19
0 likes • Feb 9
@Tomas Truchly, keen to know if you had all these layers when working with dbt and Snowflake. I understand the need for a dev environment when you're trying to build onto existing structures or add new things. But do we have to make it as complicated as the above? Maybe I can just create a dev database that I hide from end users and build in there, and once happy, switch my database to the production one and delete the work I did in dev. Thoughts?
Emile Van Der Heyde
Level 3 • 40 points to level up
@emile-van-der-heyde-8746
Analytics Engineer in Payments. South Africa based

Active 1d ago
Joined Jan 27, 2024