
Memberships

Data Innovators Exchange

Public • 170 • Free

20 contributions to Data Innovators Exchange
To Hash or not to Hash 🧐
Hashing is a crucial part of Data Vault implementations. Hashes help identify deltas quickly: instead of comparing every single attribute of a satellite, you compare one hashed value computed over all of these attributes. This also reduces the complexity of the queries being written, since significantly fewer columns need to be fully specified. But now imagine a fully automated Raw Data Vault implementation: would you still generate Hashkeys and Hashdiffs? Since you don't write the loading scripts of satellites yourself, what benefit do hash values bring to the Data Vault implementation? Wouldn't it be nicer to have business keys directly everywhere? You could argue that delta detection might be slower when all columns need to be compared, but does anyone have experience with whether this is really the case? On modern databases, I would imagine this delta detection has no real impact on overall performance. What's your opinion on skipping hashes? Let me know!
6
4
New comment 6d ago
2 likes • 6d
I had a discussion about hashing with Petr Beles at the last Data Dreamland. He said, for example, that DV Builder is prepared to calculate a hash diff, but among the databases they currently support there is none where computing a hash would be quicker than a column-by-column comparison. So DV Builder currently doesn't calculate or store a hash diff. But they also run a performance test with every new version of a target database, and they are prepared to switch the hash diff on for selected databases and versions. Interesting.
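To make the two delta-detection approaches from this thread concrete, here is a minimal Python sketch, not taken from DV Builder or any other tool. The attribute names, the `||` delimiter, the NULL-to-empty-string rule, and the choice of MD5 are all assumptions picked for illustration; real implementations follow their own hashing conventions and usually run this logic in SQL inside the database.

```python
import hashlib

# Hypothetical descriptive attributes of one satellite (illustrative names only).
ATTRIBUTES = ["name", "city", "segment"]


def normalize(value):
    """Assumed normalization rule: NULL becomes an empty string, text is trimmed."""
    return "" if value is None else str(value).strip()


def hashdiff(row, attributes=ATTRIBUTES, delimiter="||"):
    """Collapse all descriptive attributes into a single MD5 hex digest.

    Note: a plain delimiter join can make different rows look identical if the
    delimiter itself occurs in the data, so pick one that cannot appear there.
    """
    payload = delimiter.join(normalize(row.get(a)) for a in attributes)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def changed_by_hash(current, incoming):
    """Delta detection via one comparison of two hash values."""
    return hashdiff(current) != hashdiff(incoming)


def changed_by_columns(current, incoming, attributes=ATTRIBUTES):
    """Delta detection via attribute-by-attribute comparison, no hash involved."""
    return any(
        normalize(current.get(a)) != normalize(incoming.get(a)) for a in attributes
    )


current = {"name": "Alice", "city": "Prague", "segment": None}
unchanged = {"name": "Alice ", "city": "Prague", "segment": ""}
moved = {"name": "Alice", "city": "Brno", "segment": None}

print(changed_by_hash(current, unchanged), changed_by_columns(current, unchanged))
print(changed_by_hash(current, moved), changed_by_columns(current, moved))
```

Both functions flag the same rows as long as they share the same normalization. Which one is faster depends on the engine and on whether the hash is precomputed and stored, so this sketch says nothing about performance on a given database.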
What Was Your First Experience Working with Data? Some Fantasy Football experts here?
Hey everyone, I'm curious to know—what was your entry point into the world of data? For me, it all began with Fantasy Football. I wanted to create my own stat analytics, so I started querying player databases using an API. There’s so much to analyze in American Football, and with our fantasy draft kicking off this Sunday, I’m reminded of those early days. So, what was your first data project? How did you get started? Looking forward to hearing your stories!
8
9
New comment 6d ago
3 likes • 6d
I must say, I don't remember what data it was. But I know that it was in summer 1990, at my first part-time job: developing an educational program in dBase III to demonstrate the basic functions of this database. It was on a PC XT with a 10 MB HDD and 640 kB of memory. No. Now I remember. The very first data I worked with were the final grades for the whole secondary school where I studied. I was a second-year student, fond of computers, and the school had received a brand new computing laboratory of PCs. A friend of my class teacher prepared a database and a program in dBase III for recording final grades and printing mandatory reports and school reports. But there was no one who was able to use it, much less maintain it. So, in June 1990, in the last two weeks of school before the holidays, I was at school but not in class. I acted as team leader of a group of ten schoolmates. We sat in the computing laboratory and copied all the grades from paper into the database. Then we printed all the school reports. As I remember, it was about 1,200 students. It was a crazy time. The first years after the "velvet revolution" in former Czechoslovakia were very special.
Snowflake vs. Databricks
Just finished watching a breakdown of Snowflake and Databricks. While the comparison itself isn't new, the focus on their origins and how that shaped their current offerings was insightful.
Key takeaways:
- Foundational Philosophies: Snowflake's roots in traditional data warehousing vs. Databricks' academic, notebook-centric beginnings are evident in their core strengths even today.
- Architectural Choices: Snowflake's virtual warehouses & micro-partitions vs. Databricks' Spark clusters & Delta Lake highlight differing approaches to storage, compute & scalability.
- The paths of the two platforms, Snowpark and the data lakehouse concept, reflect their initial goals.
Share your thoughts! Which platform's philosophy resonates more with your data workflows?
Poll
4 members have voted
5
4
New comment 6d ago
4 likes • 6d
My point of view is maybe too simple. If more than 50% of your data is relational, go with Snowflake. Otherwise, go with Databricks. And don't tell me that 100% of your data is in semi-structured file extracts when all your primary data sources are relational databases.
MS Fabric
Where are you guys with Microsoft Fabric?
Poll
11 members have voted
7
5
New comment 10d ago
0 likes • 10d
And what about Databricks, @Jonas De Keuster?
IRIS Webinar recording
Hi all. I did not catch the IRIS webinar. The first session was some time after midnight and the second was during a project meeting. Is a recording of either webinar available anywhere?
4
2
New comment 17d ago
4 likes • 17d
@Tim Kirschke I was signed up for the first session, yet I don't see the recording. I looked in the "Class" section of this site. But Julien has already sent me a direct link.
Richard Sklenařík
4
83 points to level up
@richard-sklenarik-8201
DWH architect who built his first DWH in 1998 and went through different methodologies to finally find that there was one that worked all along.

Active 13h ago
Joined Jul 9, 2024
Prague