Hello! Welcome to The Modern Data Community.

The goal of this community is to help Data Engineers on small (or solo) teams confidently build modern architectures by simplifying key concepts, clarifying common strategies & learning from others. Pumped to have you here!

==================== HOW IT WORKS ====================

By joining, you get instant access to tons of free content (see Classroom). Dive right in.

But even more can be unlocked by contributing to the Community, which I encourage you to do. It works like this:

Contribute (post/comment) >> Get points (likes) >> Reach new levels >> Unlock content

==================== 6 SIMPLE GUIDELINES ====================

❌ Do not post error messages looking for others to debug your code. That's why Stack Overflow and other tool-specific Slack channels exist.

❌ Do not use this community for self-promotion (unless Admin approved). We all know it when we see it.

❌ Do not create low-quality posts with poor grammar/spelling or that provide little (or no) value. These will be deleted. You can do better!

✅ Ask questions, share your experiences & overall be a good person. This not only helps everyone get better, but can also help you unlock bonus content faster. Win-win.

✅ Speaking of wins, share yours! Whether it's finally solving a complex problem, hitting a team milestone or starting a new gig - post about it. You'll get the props you deserve, and it just might inspire somebody else.

✅ Take the time to craft thoughtful posts & always proofread before hitting submit. We're all about quality here. High-quality posts --> more engagement (aka you'll climb the leaderboard & unlock content) --> a community that stays enjoyable for everyone.

==================== QUICK LINKS ====================

Here are a few links to help you get going:
- Classroom
- What's Your Data Stack?
- Leaderboard
- Work with me (Kahan Data Solutions)
It's one thing to read articles or watch videos about perfectly crafted data architectures and think you're way behind. But back here in reality, things get messy & nothing is ever perfect or 100% done.

Most of us are usually working on architectures that are:
- Old & outdated
- Hacked together
- Mid-migration to new tools
- Non-existent

Or perhaps you're one of the lucky ones who recently started from scratch and things are running smoothly.

Regardless, the best way to learn what's working (and what's not) is from others. I believe this could be one of the best insights this community can collectively offer each other.

So let's hear it. What does your data stack look like for the following components?

1. Database/Storage
2. Ingestion
3. Transformation
4. Version Control
5. Automation

Feel free to add other items outside of these 5 as well, but let's focus on these to keep it organized.
A common myth: All independent consultants are hired directly by the client.

In reality, there can be multiple layers in between. And the more layers there are, the more other firms need to make money too - which means the less you'll earn as an individual.

Here are the 4 Tiers of getting paid as an independent consultant:

====

Tier 1: Direct with Client

The best-case scenario (IMO) is you personally find a client and sign a contract directly with them. This is by far the most difficult to land but pays the most, as there are no middlemen.

Tier 2: Sub-Contract via Consulting Company

Many clients have established relationships with big consulting companies and view them as "consultants", not contractors. You can join an existing project and invoice the consulting company (not the client).

Tier 3: Staffing Firm Hiring for Client

This is similar to Tier 2, but with one critical difference: in this scenario you're viewed more as a "contractor" than a consultant, which drives down the perceived value and the rate you can charge.

Tier 4: Staffing Firm Hiring for Consulting Company

If a consulting company needs help staffing their project, they will reach out for sourcing help. This means there are now 2 layers between you and the client - and each wants to make money.

====

Money isn't everything, and sometimes it's nice to have others find work for you. But be ready to adjust your rates accordingly.
Tabular is a well-known approach for creating a data model that can then be published and analysed in Analysis Services.

Deploying a new feature to our tabular model is very straightforward, but what if we want to review a change before we deploy the model? ALM Toolkit is a great tool for that and more! Not only does it let us deploy our model with a variety of options, it also allows us to compare datasets and see the differences between the published data model and the one we're working on locally.

We see the change in the form of metadata, as in the image below 👇

In this case, the data type of the "formatString" column has been modified.

Hope that was an interesting piece of information!
I've often seen it recommended to extract data out of a data source into a parquet file stored in cheap file storage like Azure Blob or S3 buckets... what's the value of adding this step when all I'm doing is copying it into a SQL database?
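For context, here's a minimal sketch of the "stage first, then load" pattern the question is asking about. To keep it runnable with the standard library only, JSON and sqlite3 stand in for parquet and a real SQL database (in practice you'd write parquet with something like pyarrow/pandas and load into your warehouse); the table and file names are made up for illustration. The usual argument for the extra hop is that the staged file is a cheap, immutable copy you can replay or audit without re-hitting the source system.

```python
# Sketch of the stage-then-load pattern. JSON + sqlite3 are stand-ins for
# parquet + a SQL database; names here are hypothetical.
import json
import sqlite3
import tempfile
from pathlib import Path


def extract_to_staging(rows, staging_dir: Path) -> Path:
    """Step 1: land raw source data as an immutable file in cheap storage.
    If the downstream load fails, you can replay from this file instead of
    querying the source system again."""
    path = staging_dir / "orders_2024-01-01.json"
    path.write_text(json.dumps(rows))
    return path


def load_staging_to_db(path: Path, conn: sqlite3.Connection) -> int:
    """Step 2: load the staged file into the database."""
    rows = json.loads(path.read_text())
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (:id, :amount)", rows)
    conn.commit()
    return len(rows)


source_rows = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 25.00}]
with tempfile.TemporaryDirectory() as d:
    staged = extract_to_staging(source_rows, Path(d))
    conn = sqlite3.connect(":memory:")
    n = load_staging_to_db(staged, conn)
    print(n)  # prints 2; the staged file can be re-loaded or audited later
```

The decoupling is the point: extraction and loading can fail, retry, and scale independently, and the staged files double as a cheap archive for backfills.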