5 contributions to Modern Data Community
Full-Stack Dev -> Data Eng... dbt/Data Modeling is hard. Guidance?
Hey guys, I'm a Full Stack Dev switching to Data Eng. I'm stuck on dbt + Data Modeling. Idk how to solve simple problems from a Data Modeling approach. Like... something as simple as "revenue" gives me trouble.

If my initial DB has these tables, how might I design a "star schema" for the purposes of revenue?
- Sales (refs a Product and a Customer)
- Expenses
- Products
- Customers

I'd want to support things like...
- Daily, Weekly, Monthly Revenue, etc (this is called the "Date Dimension"?)
- Revenue by Customer or Product (customer/product dimensions?)
- Advanced stuff (like Profit, Rev growth, etc... but all capable of applying the above dimensions as well)

How would such a schema be designed? Is a "star schema" even the way to go? What would normally be considered the most "granular" thing in such a case? "Sales"? Or a new "Revenue" model? What does a "Revenue" model even look like? Am I even allowed to do that? Does that even make sense? Like... although it'd be very nice to have such a model, idk how it'd get attached to any of the dimensions.

And... aside from my specific example... are there any general rules, tips, or best practices that come to mind while reading this that anyone can share to help me learn? Coming from Full-stack / backend dev, schema design was always about coming up with those initial tables, and I'd solve problems using typical coding (functions, algorithms, etc). This way of working in Data Eng feels new and different. I imagine others making the Full-stack -> Data transition might run into similar issues. Any insights for helping ppl in my position stuck on these types of problems would def be appreciated. Thanks!
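For illustration, a minimal dbt-style sketch of one way such a star schema could look, with Sales as the grain of the fact — every name here (stg_sales, unit_price, dim_date, etc.) is a placeholder assumption, not something from the post:

```sql
-- fct_sales: hypothetical fact model, one row per sale (the grain).
-- Revenue lives on the fact; date/customer/product dimensions hang off it via keys.
select
    s.sale_id,
    s.sale_date               as date_key,      -- joins to an assumed dim_date
    s.customer_id             as customer_key,  -- joins to an assumed dim_customers
    s.product_id              as product_key,   -- joins to an assumed dim_products
    s.quantity,
    s.quantity * p.unit_price as revenue        -- unit_price is an assumed column
from {{ ref('stg_sales') }}    s
join {{ ref('stg_products') }} p on p.product_id = s.product_id
```

Under that assumption, "monthly revenue by customer" is a query-time rollup over the fact, not a separate "Revenue" model:

```sql
select d.year_month, c.customer_name, sum(f.revenue) as revenue
from fct_sales f
join dim_date      d on d.date_key     = f.date_key
join dim_customers c on c.customer_key = f.customer_key
group by 1, 2
```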
0 likes • Mar 29
@Max Walzenbach hey, thanks for taking the time to respond, it was appreciated!
Managing my data stack with a monorepo
It has been a while since my last update on my journey building a Modern Data Stack from scratch. Today, I want to share how I set up my data infrastructure and how I am managing it using a single GitHub repository. Let's get started!

As explained in one of the previous articles, I have three basic components in my data infra:
- Snowflake as data warehouse
- Airbyte for data ingestion
- dbt for data transformation

## Snowflake

Setting up Snowflake is a straightforward process. Simply visit their website and start a commitment-free trial period. You can choose any edition of the tool (standard, enterprise, etc.) and use it for 30 days. During the setup, you will need to select your cloud provider and region; in my case, I chose AWS. Once your account is created, you will be given admin permission for the entire project. With this permission, you can set up databases and schemas, manage users and roles, configure warehouses, and make any other project-wide changes.

I have already explained my data warehouse design in this article, so I proceeded to recreate it in a real project. From the beginning, I was eager to define the configuration as code in order to meticulously track all the resources in my data warehouse. This may not seem very efficient initially, especially when dealing with only a few databases, schemas, and users. However, looking ahead to a future with potentially 20 roles, 40 users, and numerous schemas, I would begin to forget who has access to which resources and what provisions I have made.

To tackle this challenge, there are several tools available. The first is the Terraform adapter for Snowflake. I didn't choose this option because I wanted something simpler than Terraform and more flexible. My second candidate was SnowDDL, a tool for declarative-style management of Snowflake resources. It took me about half a day to create all the configuration in YAML files and set up the infrastructure from scratch. In most cases, SnowDDL is an awesome tool that promotes best practices and automates many tasks. However, I ultimately decided not to use it and instead created my own tool with similar functionality, tailored to my specific needs.
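For a rough sense of what that configuration-as-code ends up provisioning, here is a plain-SQL sketch of the kind of Snowflake resources such a config tracks — this is not SnowDDL's YAML format or the author's actual setup, and the database/role/warehouse names are invented for illustration:

```sql
-- Illustrative Snowflake resources a declarative config would keep under version control.
create database  if not exists raw;          -- landing zone for ingested data
create database  if not exists analytics;    -- dbt-built models
create schema    if not exists analytics.staging;
create schema    if not exists analytics.marts;

create warehouse if not exists transforming
  warehouse_size = 'XSMALL'
  auto_suspend   = 60
  auto_resume    = true;

create role if not exists transformer;
grant usage  on database raw                      to role transformer;
grant usage  on all schemas in database raw       to role transformer;
grant select on all tables  in database raw       to role transformer;
grant usage, create schema on database analytics  to role transformer;
grant usage  on warehouse transforming            to role transformer;
```

Whether this lives in SnowDDL YAML, Terraform, or a home-grown tool, the point is the same: the repo, not someone's memory, records who can touch what.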
0 likes • Mar 19
Hey, Thank you for sharing this, this was definitely nice to see so I can compare notes.
What's your Data Stack?
It's one thing to read articles or watch videos about perfectly crafted data architectures and think you're way behind. But back here in reality, things get messy & nothing is ever perfect or 100% done.

Most of us are usually working on architectures that are:
- Old & outdated
- Hacked together
- Mid-migration to new tools
- Non-existent

Or perhaps you're one of the lucky ones that recently started from scratch and things are running smoothly. Regardless, the best way to learn what's working (and not working) is from others. I believe this could be one of the best insights this community can collectively offer each other.

So let's hear it. What does your data stack look like for the following components?
1. Database/Storage
2. Ingestion
3. Transformation
4. Version Control
5. Automation

Feel free to add other items as well outside of these 5, but we can focus on these to keep it organized.
2 likes • Mar 19
Hey guys! Happy to be here. I'm originally a Full-stack dev, learning + amazed by Modern Data.

Currently:
- Snowflake
- Fivetran
- dbt

My non-data stack is:
- NextJS
- Rails
- Small Python/Flask app(s)

I have a Python microservice to manage creating Snowflake DBs for users (idk if I should eventually switch to Terraform). I manage Fivetran Groups/Destinations/Connectors in my Rails app (multi-destination, one Snowflake DB per customer). Data sources get connected from NextJS w/ Fivetran's OAuth Connect Card.

Past Data Stack:
- Rails manages OAuth w/ the data source (nightmare, Fivetran does this better)
- Rails asks for data directly from the data source (super slow, same calls over and over)
- A Python app would ask Rails for the data, do a bunch of calculations w/ pandas + numpy, then return it to Rails
- NextJS app gets "Calculations" from Rails and shows them

My Journey:
- I had no idea the modern data stack existed.
- I thought I had to build everything myself.
- Didn't even know what Fivetran did, just heard it thrown around a ton.
- Transitioned to Snowflake w/o Fivetran.
- Felt the pain of managing CDC.
- Discovered Fivetran after 2-3 weeks, found out I was their target customer.
- Tried playing with my Snowflake data w/ a Python service that makes SQL queries.
- Had a huge headache managing SQL queries and trying to figure out how to organize them.
- Soon found out dbt is a well thought out, established way of doing exactly what I was trying to do.
- Tried using dbt + Data Modeling, but got stuck and confused.
- Looked for help on YouTube + Udemy, found my way here.
How do you handle CDC?
Hey, data community! I'm currently working on a project that involves using Change Data Capture (CDC) for incremental loading from a MySQL database. I've come across Airbyte, which seems to offer a feature for this purpose. Does anybody have experience with CDC using Airbyte? Alternatively, are there other solutions you know of to deal with this?
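Not Airbyte-specific, but whichever tool lands the CDC records, a common downstream pattern is a dbt incremental model that only processes newly arrived rows. A minimal sketch — the source/table/column names (mysql, orders, _loaded_at) are placeholder assumptions; adjust to whatever your ingestion tool actually emits:

```sql
-- Hypothetical dbt incremental model over a CDC-loaded raw table.
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    status,
    updated_at,
    _loaded_at   -- load timestamp added by the ingestion tool (assumed column name)
from {{ source('mysql', 'orders') }}

{% if is_incremental() %}
  -- on incremental runs, only pick up rows that arrived since the last run
  where _loaded_at > (select max(_loaded_at) from {{ this }})
{% endif %}
```

With unique_key set, dbt merges updated versions of a row instead of appending duplicates, which is usually what you want on top of CDC.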
1 like • Mar 19
I also used Fivetran for this. At first, I tried doing the CDC thing on my own with a Python microservice. It turned out to be quite the undertaking. I wanted to move data from Quickbooks -> Snowflake. After about 2-3 weeks, I started googling/youtubing some of my pain points, and "Fivetran" came up. When I watched their videos on YouTube (esp one with the CEO of Fivetran and the CEO of Snowflake being interviewed together), I knew this was prob the way I should go. I soon watched a 30-minute demo of Fivetran on YouTube, and some other videos of theirs, and it was as if I was exactly the person whose life they were trying to make better.

I guess I mention this bc... although I'm a newb in the "Modern Data Stack", embracing the "Modern Data Stack" itself and acknowledging it as a path forward was in itself a challenge, bc I simply didn't know any better at the time. So... I mention this as someone who's made this kind of switch and found happiness. Hopefully that helps (regardless of which tech you use, idk which is best tbh, but the modern data stack def helps me).
[Start Here] Welcome to The Modern Data Community!
Hello! Welcome to The Modern Data Community. The goal of this community is to help Data Engineers on small (or solo) teams confidently build modern architectures by simplifying key concepts, clarifying common strategies & learning from others. Pumped to have you here!

==================== HOW IT WORKS ====================

By joining, you get instant access to tons of free content (see Classroom). Dive right in. But even more can be unlocked by contributing to the Community, which I encourage you to do. It works like this: Contribute (post/comment) >> Get points (likes) >> Reach new levels >> Unlock content

==================== 6 SIMPLE GUIDELINES ====================

❌ Do not post error messages looking for others to debug your code. That's why Stack Overflow and other tool-specific Slack channels exist.
❌ Do not use this community for self-promotion (unless Admin approved). We all know it when we see it.
❌ Do not create low-quality posts with poor grammar/spelling or that provide little (or no) value. These will be deleted. You can do better!
✅ Ask questions, share your experiences & overall be a good person. This not only helps everyone get better, but can help you unlock bonus content faster. Win-Win.
✅ Speaking of wins, share yours! Whether it's finally solving a complex problem, hitting a team milestone or starting a new gig - post about it. You'll get the props you deserve and it just might inspire somebody else.
✅ Take the time to craft thoughtful posts & always proof-read before hitting submit. We're all about quality here. High quality posts --> more engagement (aka you'll climb the leaderboard & unlock content) --> ensures the community stays enjoyable for everyone.

==================== QUICK LINKS ====================

Here are a few links to help you get going:
- Classroom
- What's Your Data Stack?
- Leaderboard
- Work with me (Kahan Data Solutions)
3 likes • Mar 19
Hey, I'm Jon, I'm a CTO / Co-founder at a FinTech startup (pre-seed). Data and Data Engineering are new to me. Mainly I'm a Full Stack Engineer with some cloud knowledge. But... now I'm basically building a data pipeline to help me do BI for ppl.
Jonathan Philippou
@jonathan-philippou-3699
CTO / Co-founder @ FinTech Startup (pre-seed)

Active 42d ago
Joined Mar 19, 2024