Data Engineering Vs Machine Learning…

Apr 11, 2023

What's the difference?

5 Comments

Axel Schwanke

Dec 28, 2023

One Remark:

On the subject of data cleansing for ML Pipelines, the article states: "While data engineers do an initial cleaning, data scientists take it further to ensure accuracy and usefulness."

In my experience as a data engineer in the real estate sector, we often perform extensive cleansing and enrichment so that the data can be used not only by data scientists but also by other teams, such as market research (e.g., to analyze price trends over recent years).

Because this cleansing is independent of any specific use case, the data is cleansed and enriched uniformly, which keeps the results consistent across the different application areas.

Sarah Floris

Dec 28, 2023

Some definitely do, and some do not. It depends on the team.

Kyle Stratis

Jun 11

I think it's important to remember (or acknowledge) that before the ML engineer title (and now AI engineer) became 'sexy', the people doing this work were...data engineers. My first job doing what became ML engineering (and my 3rd job overall) was a data engineering role. This likely accounts for the similarities, even though the titles have grown apart over the last 5-6 years. In my opinion, ML engineering and data engineering are siblings with the same core and ancestry, but each now has its own flavor and culture. DE seems even more focused on serving the BI world, while MLEs usually serve internal R&D teams and bridge the gap between R&D and product engineering.

I also strongly believe that, in practice, the differences between the two roles are much smaller than described here, or even non-existent (pipelines used in ML workflows can certainly be streaming! MLEs build data pipelines as a matter of course for the ML development lifecycle!), and examining the similarities could be even more instructive than examining the differences.

A Kogi

Apr 12, 2023

Hey, I think there's a typo in “[...] accuracy above 80% is a good benchmark to aim for, but it's also important to monitor other metrics such as accuracy if working with regression models or precision [...]”

“Accuracy” appears twice here.

Otherwise, interesting article containing many great sources!

Alperen

Apr 12, 2023

Awesome content. Thank you for sharing!!

SeattleDataGuy’s Newsletter