Memberships

Modern Data Community

Public • 568 • Free

AE Academy

Private • 6 • Free

32 contributions to Modern Data Community
Ratio of data engineers to analytics engineers
What is the ratio of people in your organization who are working on your data integration layer vs your data modeling/transformation layer vs your BI/visualization layer? Regardless of job title, be it data engineer, analytics engineer, data analyst, reporting analyst, whatever: how many people in your stack are working just on data integration vs modeling?

I ask because I've seen this fluctuate in the orgs I've been in based on tooling choice and the health of the system. In the best, most agile org, the ratio of data engineers to analytics engineers was 1:3 to 1:4. In the least agile, most frustrating (for the business stakeholders waiting on data and reports), the ratio was 3:1 to 4:1, totally reversed.

Thus far, this has been a tooling issue. The choice of data integration tool caused an enormous amount of manual work to acquire data and put together all the orchestration pieces, etc., before anything was available for modeling or transformation.

Help me fight my bias and tell me what you've seen in your previous roles and in your current role.
0
2
New comment 6d ago
1 like • 6d
I don't think it's that easy to mark data teams that are more occupied with data integration & transformation than with visualization as the least agile or frustrating. Sure, from a business POV the dashboards / reports are the final product they can use / sell / understand, so spending effort on DE work may seem inefficient. But if you are building new "products", most of the work is exactly that. And not all data integrations are used for visualizations; sometimes there is a need to sync data between systems / DBs. That may be a lot of "purely" DE work.
Data Warehousing w/ dbt - A 3 Layered Approach
This is a friendly reminder that the "data grass" isn't always greener on the other side. Everyone is doing their best, and every business has its unique challenges. But one thing I've recently noticed is that many teams struggle in the same area: the data warehouse. While there's no one-size-fits-all approach, I found myself repeating my recommendation over the past few weeks, so I figured I'd share here.

For context, what I'm going to share is focused on dbt projects. The typical scenario is that a business starts a project on its own but quickly finds itself with an unorganized and/or unscalable project. Which is how they end up talking to me.

At a high level, here's the simple 3-layered approach I follow:
> Layer 1: Staging
> Layer 2: Warehouse
> Layer 3: Marts

Staging:
- Create a 1:1 model for each source table (deploy as views to avoid duplicate storage)
- Light transformations for modularity (ex. renaming columns, simple case-whens, conversions)
- Break down into sub-folders by source system
- Deploy to a Staging schema: models/staging/[source-system]

Warehouse:
- Pull from the Staging layer (simple transforms already handled)
- Facts: keys & metrics (numeric values)
- Dimensions: primary key & context (descriptive, boolean, date values)
- Deploy to a single Warehouse schema: models/warehouse/facts, models/warehouse/dimensions

Marts:
- Pull from the Warehouse (facts & dims allow for simple joins)
- Create wide tables with multiple use cases (vs 1:1 for each report)
- Either deploy to a single Mart schema or break up by business unit/user grouping: models/marts (or) models/marts/[business-unit]

This doesn't cover other important topics like Environments, CI/CD & Documentation. But if you're also working on your own project or considering approaches, hopefully this will help! Other dbt users: how do you structure your project?
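To make the Staging layer concrete, here's a minimal sketch of a Layer 1 staging model in dbt-style SQL. The 'salesforce' source, 'accounts' table, and all column names are hypothetical; the point is the pattern the post describes (1:1 with the source, view materialization, only light renames and conversions).

```sql
-- models/staging/salesforce/stg_salesforce__accounts.sql
-- Layer 1 staging model: 1:1 with one source table, deployed as a view
-- to avoid duplicate storage, with only light transformations.
-- The 'salesforce' source and its columns are hypothetical examples.

{{ config(materialized='view') }}

with source as (

    select * from {{ source('salesforce', 'accounts') }}

),

renamed as (

    select
        id                         as account_id,
        name                       as account_name,
        case
            when status = 'A' then 'active'
            else 'inactive'
        end                        as account_status,
        cast(created_date as date) as created_at
    from source

)

select * from renamed
```

A Warehouse fact or dimension model would then select from stg_salesforce__accounts rather than from the raw source, keeping each layer's responsibility narrow.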
6
9
New comment 9d ago
1 like • 12d
@Bosete Kumar there will always be a lack of senior / lead IT experts, including Data Engineers; I get bombarded by job offers all the time. However, the requirements for junior/entry-level jobs are really high nowadays, so it may be really hard to land a first job if you want to become a Data Engineer / Data Analyst / Machine Learning Engineer.
ETL Recommendations
Hey all, my company currently has a bunch of Lambda functions in AWS that extract data from APIs into S3 and then into Snowflake. The Lambda process is working but has limitations:
a) It has a complex setup, and making changes takes a lot of time.
b) Monitoring isn't very visible.
c) CDC is a challenge to manage.
Since I probably won't be able to get the company to pay for a new ETL tool, I need to think of some free tools I can recommend that make the pipeline robust and easy to monitor, and that let us add sources and make changes quickly. I am looking at Airbyte; any advice on this? What other alternatives are there?
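Whatever replaces the Lambdas on the extraction side, the S3-to-Snowflake half of a pipeline like the one described above usually stays simple: an external stage plus COPY INTO. Here's a minimal sketch in Snowflake SQL, where the bucket path, stage, table, and storage integration names are all hypothetical:

```sql
-- Load JSON files that the extractor has landed in S3 into a raw table.
-- Bucket path, stage, table, and storage integration are hypothetical;
-- the storage integration (auth to S3) is assumed to exist already.
create stage if not exists raw.api_stage
    url = 's3://my-data-bucket/api-exports/'
    storage_integration = s3_int
    file_format = (type = 'json');

create table if not exists raw.api_events (payload variant);

copy into raw.api_events
    from @raw.api_stage
    pattern = '.*[.]json';
```

Tools like Airbyte mainly replace the extraction and scheduling around this step; for visibility on the load itself, Snowflake's COPY history can be checked independently of the Lambda logs.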
2
4
New comment 3d ago
1 like • 21d
We use Talend Open Studio (the free Community version), which is practically cost-free and a really mature, powerful ETL solution; however, it takes a lot of time to master. Then we use https://www.datachannel.co/ to collect data from various sources and ingest it into the DWH. It's a cheap and quite decent Airbyte/Fivetran-like alternative. Everything is orchestrated via Airflow, which is also free and open-source, so we pay only for the VM hosted on an AWS EC2 instance. Overall it's a very cheap and perfectly functional infrastructure, even though it requires certain IT skills/experience to use well.
1 like • 14d
@Michael Kahan exactly my point. E.g. using Talend Community Edition requires extra tech skills (you need to manage git integration & orchestration yourself) that aren't necessary with the paid version. The Airflow VM can also be tough to manage.
Public Data Tooling
Hey! In my company we're on our way to releasing some public data to our users. I'd like to step this up and not only release the data via a public S3 bucket but also offer our users a tool to explore and aggregate these datasets easily. Do you know if something like this exists? I've been looking but haven't found anything useful yet. Update: I'm looking for something like Dune Analytics but for general data (not scoped to blockchain as that project is). Thanks!
1
4
New comment 27d ago
1 like • Apr 10
We use Metabase to allow our clients to access their data and play with it a little (though without changing the source). If they want to work with the data directly, we create an ETL pipeline for them.
1 like • 27d
@Johann Tagle Redshift. We use AWS overall.
What part of Data Engineering are you most focused on?
All of us are at different parts of the journey. What part of Data Engineering are you most focused on this year & why?
Poll
25 members have voted
2
3
New comment Apr 3
1 like • Apr 3
The thing is, employees in small companies are usually involved in multiple areas. So even though I picked development, I am also strongly involved in deployment and orchestration, and to some degree in architecture.
Tomas Truchly
4
67 points to level up
@tonas-truchly-8843
Senior/Lead Data Engineer working remotely from Europe for a US startup located in California.

Active 6m ago
Joined Dec 29, 2023
Slovakia