
Created by Michael

Modern Data Community

Public • 568 • Free

A community of data professionals building architectures with modern tools & strategies.

AE Academy

Private • 6 • Free

Helping motivated data professionals become high-performing Analytics Engineers, faster.

Memberships

The Skool Games

Private • 12.6k • Free

Skool Community

Public • 79.7k • Paid

Contentpreneurship.com (Free)

Public • 4.9k • Free

63 contributions to Modern Data Community
New YouTube Video - Data Warehouse Security w/ 4 Simple Roles
Although security isn't the most glamorous part of being a data engineer, it's arguably one of the most important. While each team will have their own unique strategy & naming conventions, this week I want to share what's worked well for me. It's a simple approach based around 4 high-level roles. It covers each key layer of a warehouse pipeline, from source data ingestion to end-user reporting. Enjoy. PS - Sorry about the weird audio on this one. Didn't notice until after it was recorded but still wanted to get it out to you.
New YouTube Video - The 3 Layered Data Model
Decided to turn a recent post into a YouTube video. --- A data warehouse acts as the main hub for most data teams, yet it often becomes a mess. While there are many different strategies to handle this, in this video I want to share the approach I follow. It's based around a simple 3-layered design to take raw source data into meaningful data marts ready for analytics. Enjoy!
Data Warehousing w/ dbt - A 3 Layered Approach
This is a friendly reminder that the "data grass" isn't always greener on the other side. Everyone is doing their best and every business has their unique challenges. But one thing I've recently noticed is that many teams struggle in the same area - the data warehouse.

While there's no one-size-fits-all approach, I found myself repeating my recommendation over the past few weeks so figured I'd share here.

For context, what I'm going to share is focused around dbt projects. The typical scenario is that a business starts a project on their own but quickly finds themselves with an unorganized and/or unscalable project. Which is how they end up talking to me.

At a high level, here's the simple 3 layered approach I follow:

> Layer 1: Staging
> Layer 2: Warehouse
> Layer 3: Marts

Staging:
- Create 1:1 with each source table (deploy as views to avoid duplicate storage)
- Light transformations for modularity (ex. renaming columns, simple case-whens, conversions)
- Break down into sub-folders by source system
- Deploy to a Staging schema

models/staging/[source-system]

Warehouse:
- Pull from Staging layer (simple transforms already handled)
- Facts: Keys & Metrics (numeric values)
- Dimensions: Primary Key & Context (descriptive, boolean, date values)
- Deploy to a single Warehouse schema

models/warehouse/facts
models/warehouse/dimensions

Marts:
- Pull from Warehouse (facts & dims allow for simple joins)
- Create wide tables w/ multiple use cases (vs 1:1 for each report)
- Either deploy to a single Mart schema or break up by business unit/user grouping

models/marts (or) models/marts/[business-unit]

This doesn't cover other important topics like Environments, CI/CD & Documentation. But if you're also working on your own project or considering approaches, hopefully this will help!

Other dbt users - how do you structure your project?
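To make the folder conventions above concrete, here is a minimal Python sketch that maps a model path to its layer, target schema and materialization. It is not dbt code and not part of any dbt API. The function name `describe_model`, the schema names, the `split_marts` flag and the example file names are all hypothetical. Only "staging as views" comes from the post itself; building warehouse and marts models as tables is an assumption.

```python
from pathlib import PurePosixPath

# Layer defaults. "staging -> view" follows the post; the schema names and
# the "table" materializations for warehouse/marts are assumptions.
LAYERS = {
    "staging": {"schema": "staging", "materialized": "view"},
    "warehouse": {"schema": "warehouse", "materialized": "table"},
    "marts": {"schema": "marts", "materialized": "table"},
}


def describe_model(path, split_marts=False):
    """Return layer, sub-folder, schema and materialization for a model path."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3 or parts[0] != "models" or parts[1] not in LAYERS:
        raise ValueError(f"not inside a known layer: {path}")

    layer = parts[1]
    # The sub-folder is the directory directly under the layer, if there is one.
    group = parts[2] if len(parts) > 3 else None

    if layer == "staging" and group is None:
        raise ValueError("staging models belong in a source-system sub-folder")
    if layer == "warehouse" and group not in ("facts", "dimensions"):
        raise ValueError("warehouse models belong in facts/ or dimensions/")

    info = dict(LAYERS[layer], layer=layer, group=group)
    # Optionally give each business unit its own marts schema.
    if layer == "marts" and split_marts and group:
        info["schema"] = f"marts_{group}"
    return info


if __name__ == "__main__":
    print(describe_model("models/staging/salesforce/stg_accounts.sql"))
    print(describe_model("models/marts/finance/revenue.sql", split_marts=True))
```

A check like this could run in CI to flag models saved outside the three layers, though in a real dbt project the schema and materialization defaults would normally be set in the project configuration rather than in a script.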
0 likes • 11d
@Jay Archer In your scenario, are you trying to do joins within a single view or some other complex business logic? Or do you have many columns at the source? Asking b/c most modern cloud DBs are pretty good at handling large data sets even without clustering (which is of course still a good practice).
1 like • 10d
@Jay Archer Oh yeah, definitely agree that marts tables (for reporting layer) should be tables if possible. Views only 1:1 on top of source tables for initial formatting. Also... 15K concurrent users is intense! That's great experience though
ETL Recommendations
Hey all. My company currently has a bunch of Lambda functions in AWS that extract data from APIs into S3 and then into Snowflake. The Lambda function process is working but has limitations: a) It has a complex setup and making changes takes lots of time, b) Monitoring isn't very visible, c) CDC is a challenge to manage. Since I probably won't be able to get the company to pay for a new tool to do the ETL, I need to think of some free tools I can recommend that make the pipeline robust, easy to monitor, and quick to change or add sources to. I am looking at Airbyte - any advice on this? What other alternatives are there?
1 like • 14d
I've worked w/ Airbyte quite a bit on projects in the past and it has been pretty solid overall. You can also self-host it pretty easily to get a feel for it. Fivetran is also really nice and now has a fairly generous free tier to get acquainted with it. The third option I've personally used is Stitch, which is also pretty good but to me is less user-friendly. But it's also more budget-friendly compared to the other ones. Having said all that, an important thing to consider with all of this is *the cost of your time*. You may save on licensing fees by using a free or open source tool, but it will take up much more of your time to maintain & support it. And people don't work for free, so there is still a very real cost to that. It's also probably more than the cost of licenses. Neither approach (paid vs self-host) is right or wrong, but something I always like to point out.
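A rough way to put numbers on the "cost of your time" point is to add maintenance hours to the license fee. The sketch below is only an illustration: the function name `annual_cost` and every figure in it (fees, hours, hourly rate) are made-up placeholders, not real pricing for Airbyte, Fivetran or Stitch.

```python
def annual_cost(license_fee, maintenance_hours_per_month, hourly_rate):
    """Yearly cost of a tool: license fee plus the cost of the time spent on it."""
    return license_fee + maintenance_hours_per_month * 12 * hourly_rate


# All inputs below are hypothetical placeholders. Plug in your own numbers.
self_hosted = annual_cost(license_fee=0, maintenance_hours_per_month=20, hourly_rate=75)
managed = annual_cost(license_fee=6000, maintenance_hours_per_month=2, hourly_rate=75)

print(f"self-hosted: {self_hosted}")  # 0 + 20 * 12 * 75 = 18000
print(f"managed:     {managed}")      # 6000 + 2 * 12 * 75 = 7800
```

With these placeholder inputs the free tool ends up costing more per year, but the result flips easily if the maintenance hours are low, so it's worth running with your own estimates.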
Metric Layer in dbt Core.
Hey team! One of my company's goals next quarter will be to define an aggregated metric layer for rapid insight into important business metrics. I've done something similar previously but not with dbt. I'd like to know if anyone has done this with dbt Core and what the best way to approach it would be. Thanks!
0 likes • 14d
Hey @Manuel Ponsa - Here's a link to a similar post about metrics & the dbt Semantic Layer, which has a few replies. Hope that helps! https://www.skool.com/modern-data-community/questions-about-dynamic-measures-and-experience-with-dbt-semantic-layer?p=0dd61cd8
Michael Kahan
5
180 points to level up
@michael-kahan-3539
Helping small data teams build simple, modern data architectures.

Active 18h ago
Joined Dec 12, 2023
USA