
Memberships

Modern Data Community

Public • 573 • Free

5 contributions to Modern Data Community
Tabular model and ALM Toolkit
Tabular is a well-known approach to creating a data model that can then be published and analysed in Analysis Services. Deploying a new feature to our tabular model is very straightforward, but what if we want to check the changes we made before we deploy the model? ALM Toolkit is a great tool for that and more! Not only does it let us deploy our model with a variety of options, it also allows us to compare datasets and see the differences between our published data model and the data model we're working on locally. We see the changes in the form of metadata, as in the image below 👇 In this case, the data type of the "formatString" column has been modified. Hope that was an interesting piece of information!
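For anyone curious what "comparing model metadata" means in practice, here is a rough Python sketch of the same idea (ALM Toolkit does this far more thoroughly): diffing a couple of column properties between a local model.bim and the deployed model's definition. The file paths are hypothetical; model.bim follows the TMSL layout, with tables nested under the "model" node.

```python
import json

# Hypothetical paths: the local working copy and an export of the deployed model.
with open("local/model.bim", encoding="utf-8") as f:
    local = json.load(f)
with open("deployed/model.bim", encoding="utf-8") as f:
    deployed = json.load(f)

def columns(model):
    """Index every column by (table name, column name)."""
    return {
        (t["name"], c["name"]): c
        for t in model["model"]["tables"]
        for c in t.get("columns", [])
    }

local_cols, deployed_cols = columns(local), columns(deployed)

# Report properties that differ between the deployed and local versions.
for key in sorted(local_cols.keys() & deployed_cols.keys()):
    for prop in ("dataType", "formatString"):
        before = deployed_cols[key].get(prop)
        after = local_cols[key].get(prop)
        if before != after:
            print(f"{key}: {prop} changed from {before!r} to {after!r}")
```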
3
6
New comment Mar 3
2 likes • Feb 26
I've had a great experience with the tabular model, especially how easily the SSAS cube refreshes without the need for complex ETL pipelines
2 likes • Mar 2
In my case we had a single cube, the business had only a few sources (around 4 different databases), and the solution was not very complex, so it worked out well.
Feedback and advice on ETL pipeline for AWS
Hi, I'm working on my first ever ETL pipeline project from zero, and I designed this architecture for AWS. It is based on research I've done and on chats with more experienced colleagues. I'm sure it can be improved, but so far I feel comfortable with it. Brief explanation of the data lake:
- the "raw" bucket is intended as a landing zone, with data in its original state (for auditing purposes)
- the "cleansed" bucket contains data after fixing inconsistencies and invalid values, deleting unnecessary columns, deduplicating, etc.
- the "curated" bucket is intended to hold data ready to be ingested into the data warehouse, matching the fact and dimension table schemas.
First of all, I'd like to know what you think in general. Then I have a couple of questions:
1. Should I store the processed data in the cleansed bucket in CSV, or would Parquet be a better option?
2. Should I keep curated as an S3 bucket in Parquet, or would it be better as a database schema?
3. Last but not least, I'm struggling with the pipeline strategy, meaning how I would perform incremental loads (I plan to batch update daily). What techniques can I use? Any design considerations? AWS Glue jobs have a bookmarks feature to avoid reprocessing data, but if the schema is updated in the catalog, the job will fail and process all data in the source.
Thanks in advance. I couldn't think of a better place to share my concerns than this community.
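To make question 1 concrete, here is a minimal sketch of a Glue PySpark job that reads raw CSV from the landing zone and writes cleansed Parquet. The bucket, dataset, and column names are all hypothetical; treat this as an illustration of the raw → cleansed hop, not a finished job.

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw CSV exactly as it landed (the auditing copy stays untouched).
raw = spark.read.option("header", "true").csv("s3://my-raw-bucket/orders/")

# Typical cleansing steps: drop rows missing the key, deduplicate,
# remove columns that downstream consumers don't need.
cleansed = (
    raw.dropna(subset=["order_id"])
       .dropDuplicates(["order_id"])
       .drop("internal_notes")
)

# Parquet is usually the better choice past the landing zone: columnar,
# compressed, and it carries a schema, unlike CSV.
cleansed.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://my-cleansed-bucket/orders/"
)

job.commit()
```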
3
9
New comment Jan 30
0 likes • Jan 29
Thank you all for your feedback. What worries me the most is how the incremental load can be implemented. I know there are a bunch of techniques, but I can't find a good, detailed, step-by-step explanation (for example, exactly where I should add timestamps, or whether I should use hashes to identify changed data, even if it's platform agnostic).
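To illustrate the timestamp-plus-hash idea in a platform-agnostic way, here is a minimal PySpark sketch. All table, bucket, and column names are invented, the watermark is hard-coded where a real pipeline would persist it between runs (e.g., in a small control table, a DynamoDB item, or an S3 object), and the curated table is assumed to keep the row_hash from previous loads.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental-load-sketch").getOrCreate()

# 1. Load the watermark saved by the previous run (hard-coded here).
last_watermark = "2024-01-28 00:00:00"

source = spark.read.parquet("s3://my-cleansed-bucket/orders/")

# 2. Timestamp technique: keep only rows created or modified after the
#    watermark. This requires a reliable updated_at column in the source.
delta = source.filter(F.col("updated_at") > F.lit(last_watermark))

# 3. Hash technique: fingerprint the business columns so rows whose content
#    did not actually change can be skipped.
business_cols = ["order_id", "customer_id", "amount", "status"]
delta = delta.withColumn(
    "row_hash", F.sha2(F.concat_ws("||", *business_cols), 256)
)

# The target keeps the hash computed when each row was last loaded.
target = spark.read.parquet("s3://my-curated-bucket/orders/") \
              .select("order_id", "row_hash")

# 4. New rows have no match in the target; changed rows have a different hash.
changed = (
    delta.alias("d")
    .join(target.alias("t"), "order_id", "left")
    .filter(F.col("t.row_hash").isNull() | (F.col("d.row_hash") != F.col("t.row_hash")))
    .select("d.*")
)

# 5. Append or merge `changed` into the target, then persist max(updated_at)
#    as the watermark for the next daily batch.
new_watermark = changed.agg(F.max("updated_at")).first()[0]
```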
6 Things I Wish I Knew 8 Years Ago About Data Engineering
If I could go back in time, these are the 6 things I wish I knew about Data Engineering. (It would have saved me a lot of wasted time and effort.)
1. There will always be new fires to put out
Every new bug, urgent request, etc. feels like the sky is falling. And you'll stress out trying to complete it as soon as possible. ...Only to get hit with a new one the next week. Keep it in perspective.
=============
2. There's no universal "best" approach
So much time gets spent debating decisions & strategies. It's vital to have an opinion and think through options. But most approaches typically work out as long as you're on the same page. I've seen many bridges burned due to an inability to compromise.
=============
3. Beware of personal scope creep
It feels noble to fix every bug or inefficient query you come across. But this distracts from assigned work & sends you down an unnecessary rabbit hole. Instead, log these issues separately and complete the task at hand first.
=============
4. Show, vs tell, new solutions
Don't expect a change to happen by just casually explaining it. People are busy and have other things on their mind. Take time to put together a sample of your solution that you can show. Once people see it, then they pay attention.
=============
5. Curiosity & continuous learning are critical
This is a technology career and it constantly evolves. If you become stagnant, so will your career.
=============
6. Every company's data is a mess, you're not alone
The grass isn't always greener on the other side. Also, most problems are human problems vs tech/data problems.
=============
In short, Data Engineering is a journey, not a destination. And we are all on this journey together. What are some lessons you've learned in your career so far?
18
8
New comment Jan 31
1 like • Jan 24
Loved this
1 like • Jan 24
@Rich Muckey I feel you. Same for me
A sneak peek at something new I'm working on - Thoughts?
I tend to waste a lot of time overanalyzing things - especially when it comes to new content. So rather than continue to do that, this time I figured I'd share a raw behind-the-scenes look at something new I'm working on directly with you. Your feedback can make sure it's on the right track and that it will help you overcome real challenges you're facing. (Or let me know if it's off the mark.) Rather than bore you with text, I've recorded a Loom video breaking it down. Heads up - this was a completely unplanned video and I ramble a bit. But hopefully the main points still come across. Any feedback at all on this would be incredibly valuable. Feel free to leave a comment here or DM me directly if you'd prefer to remain private. Thanks as always!
8
36
New comment Mar 30
2 likes • Jan 18
I think it's a great idea, just keep it simple as you describe in the video, because that's the best way to start. Then you can add scenarios often seen in real life and how to deal with them.
1 like • Jan 24
Michael, I noticed that one of the building blocks of the project is *Planning strategy*, which has been a recurring topic on your YouTube channel. My favorite tip is "mental clarity," but I actually lack it. Have you posted anything about these topics?
What's your Data Stack?
It's one thing to read articles or watch videos about perfectly crafted data architectures and think you're way behind. But back here in reality, things get messy & nothing is ever perfect or 100% done. Most of us are usually working on architectures that are:
- Old & outdated
- Hacked together
- Mid-migration to new tools
- Non-existent
Or perhaps you're one of the lucky ones that recently started from scratch and things are running smoothly. Regardless, the best way to learn what's working (and not working) is from others. I believe this could be one of the best insights this community can collectively offer each other. So let's hear it. What does your data stack look like for the following components?
1. Database/Storage
2. Ingestion
3. Transformation
4. Version Control
5. Automation
Feel free to add other items as well outside of these 5, but we can focus on these to keep it organized.
8
65
New comment 18d ago
8 likes • Dec '23
First of all I want to congratulate you, @Michael Kahan, for the initiative, because I think it's a great idea and all of us would benefit from it. That said, here's my stack:
1. MySQL, Amazon Redshift, S3
2. AWS Glue, AWS Lambda
3. AWS Glue, AWS Glue DataBrew
4. none 🙃
5. none (yet)
I started data engineering 6 months ago, and I'm alone trying to build a data warehouse and data pipeline in AWS.
Nicolas Lope de Barrios
@nicolas-lope-de-barrios-3958
Data Analyst / Data Engineer working in the Public Sector and as a freelancer in a small financial company. Buenos Aires, AR.

Active 17d ago
Joined Dec 27, 2023