Memberships

Modern Data Community

Public • 568 • Free

AE Academy

Private • 6 • Free

32 contributions to Modern Data Community
Ratio of data engineers to analytics engineers
What is the ratio of people in your organization who are working on your data integration layer vs your data modeling/transformation layer vs your BI/visualization layer? Regardless of job title, be it data engineer, analytics engineer, data analyst, reporting analyst, whatever: how many people in your stack are working just on data integration vs modeling?

I ask because I've seen this fluctuate in the orgs I've been in based on tooling choice and the health of the system. In the best, most agile org, the ratio of data engineers to analytics engineers was 1:3 to 1:4. In the least agile, most frustrating (for the business stakeholders waiting on data and reports), the ratio was 3:1 to 4:1, totally reversed.

Thus far, this has been a tooling issue. The choice of data integration tool caused an enormous amount of manual work to acquire data and put together all the orchestration pieces, etc., before anything was available for modeling or transformation.

Help me fight my bias and tell me what you've seen in your previous roles and in your current role.
0
2
New comment 6d ago
1 like • 6d
I don't think it's that easy to mark data teams that are more occupied with data integration & transformation than with visualization as the least agile or frustrating. Sure, from a business POV the dashboards / reports are the final product they can use / sell / understand, so spending effort on DE work may seem inefficient. But if you are building new "products", most of the work is exactly that. And not all data integrations are used for visualizations; sometimes there is a need to sync data between systems / DBs. That may be a lot of "purely" DE work.
Data Warehousing w/ dbt - A 3 Layered Approach
This is a friendly reminder that the "data grass" isn't always greener on the other side. Everyone is doing their best, and every business has its unique challenges. But one thing I've recently noticed is that many teams struggle in the same area: the data warehouse. While there's no one-size-fits-all approach, I found myself repeating my recommendation over the past few weeks, so I figured I'd share here.

For context, what I'm going to share is focused on dbt projects. The typical scenario is that a business starts a project on its own but quickly finds itself with an unorganized and/or unscalable project. Which is how they end up talking to me.

At a high level, here's the simple 3-layered approach I follow:
> Layer 1: Staging
> Layer 2: Warehouse
> Layer 3: Marts

Staging:
- Create a 1:1 model for each source table (deploy as views to avoid duplicate storage)
- Light transformations for modularity (ex. renaming columns, simple case-whens, conversions)
- Break down into sub-folders by source system
- Deploy to a Staging schema: models/staging/[source-system]

Warehouse:
- Pull from the Staging layer (simple transforms already handled)
- Facts: keys & metrics (numeric values)
- Dimensions: primary key & context (descriptive, boolean, date values)
- Deploy to a single Warehouse schema: models/warehouse/facts, models/warehouse/dimensions

Marts:
- Pull from the Warehouse (facts & dims allow for simple joins)
- Create wide tables with multiple use cases (vs 1:1 for each report)
- Either deploy to a single Mart schema or break up by business unit/user grouping: models/marts (or) models/marts/[business-unit]

This doesn't cover other important topics like Environments, CI/CD & Documentation. But if you're also working on your own project or considering approaches, hopefully this will help! Other dbt users: how do you structure your project?
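To make the Staging layer concrete, here's a minimal sketch of a Layer 1 staging model in dbt-style SQL. The 'salesforce' source, 'accounts' table, and all column names are hypothetical; the point is the pattern the post describes (1:1 with the source, view materialization, only light renames and conversions).

```sql
-- models/staging/salesforce/stg_salesforce__accounts.sql
-- Layer 1 staging model: 1:1 with one source table, deployed as a view
-- to avoid duplicate storage, with only light transformations.
-- The 'salesforce' source and its columns are hypothetical examples.

{{ config(materialized='view') }}

with source as (

    select * from {{ source('salesforce', 'accounts') }}

),

renamed as (

    select
        id                         as account_id,
        name                       as account_name,
        case
            when status = 'A' then 'active'
            else 'inactive'
        end                        as account_status,
        cast(created_date as date) as created_at
    from source

)

select * from renamed
```

A Warehouse fact or dimension model would then select from stg_salesforce__accounts rather than from the raw source, keeping each layer's responsibility narrow.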
6
9
New comment 9d ago
1 like • 12d
@Bosete Kumar there will always be a lack of senior / lead IT experts, including Data Engineers; I get bombarded by job offers all the time. However, the requirements for junior/entry-level jobs are really high nowadays, so it may be really hard to land a first job if you want to become a Data Engineer / Data Analyst / Machine Learning Engineer.
ETL Recommendations
Hey all, my company currently has a bunch of Lambda functions in AWS that extract data from APIs into S3 and then into Snowflake. The Lambda process is working but has limitations:
a) It has a complex setup, and making changes takes a lot of time.
b) Monitoring isn't very visible.
c) CDC is a challenge to manage.
Since I probably won't be able to get the company to pay for a new ETL tool, I need to think of some free tools I can recommend that make the pipeline robust and easy to monitor, and that let us add sources and make changes quickly. I am looking at Airbyte; any advice on this? What other alternatives are there?
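Whatever replaces the Lambdas on the extraction side, the S3-to-Snowflake half of a pipeline like the one described above usually stays simple: an external stage plus COPY INTO. Here's a minimal sketch in Snowflake SQL, where the bucket path, stage, table, and storage integration names are all hypothetical:

```sql
-- Load JSON files that the extractor has landed in S3 into a raw table.
-- Bucket path, stage, table, and storage integration are hypothetical;
-- the storage integration (auth to S3) is assumed to exist already.
create stage if not exists raw.api_stage
    url = 's3://my-data-bucket/api-exports/'
    storage_integration = s3_int
    file_format = (type = 'json');

create table if not exists raw.api_events (payload variant);

copy into raw.api_events
    from @raw.api_stage
    pattern = '.*[.]json';
```

Tools like Airbyte mainly replace the extraction and scheduling around this step; for visibility on the load itself, Snowflake's COPY history can be checked independently of the Lambda logs.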
2
4
New comment 3d ago
1 like • 21d
We use Talend Open Studio (the free Community version), which is practically cost-free and a really mature, powerful ETL solution; however, it takes a lot of time to master. Then we use https://www.datachannel.co/ to collect data from various sources and ingest it into the DWH. It's a cheap and quite decent Airbyte/Fivetran-like alternative. Everything is orchestrated via Airflow, which is also free and open-source, so we pay only for the VM hosted on an AWS EC2 instance. Overall it's a very cheap and perfectly functional infrastructure, even though it requires certain IT skills/experience to use well.
1 like • 14d
@Michael Kahan exactly my point. E.g. using Talend Community Edition requires extra tech skills (you need to manage git integration & orchestration yourself) that aren't necessary with the paid version. The Airflow VM can also be tough to manage.
Public Data Tooling
Hey! In my company we're on our way to releasing some public data to our users. I'd like to step this up and not only release the data via a public S3 bucket but also offer our users a tool to explore and aggregate these datasets easily. Do you know if something like this exists? I've been looking but haven't found anything useful yet. Update: I'm looking for something like Dune Analytics but for general data (not scoped to blockchain as that project is). Thanks!
1
4
New comment 27d ago
1 like • Apr 10
We use Metabase to allow our clients to access their data and play with it a little (though without changing the source). If they want to work with the data directly, we create an ETL pipeline for them.
1 like • 27d
@Johann Tagle Redshift. We use AWS overall.
What part of Data Engineering are you most focused on?
All of us are at different parts of the journey. What part of Data Engineering are you most focused on this year & why?
Poll
25 members have voted
2
3
New comment Apr 3
1 like • Apr 3
The thing is, employees in small companies are usually involved in multiple areas. So even though I picked development, I am also strongly involved in deployment and orchestration, and to some degree in architecture.
Tomas Truchly
4
67 points to level up
@tonas-truchly-8843
Senior/Lead Data Engineer working remotely from Europe for a US startup located in California.

Active 6m ago
Joined Dec 29, 2023
Slovakia