Discussion about this post

Axel Schwanke

One Remark:

On the subject of data cleansing for ML Pipelines, the article states: "While data engineers do an initial cleaning, data scientists take it further to ensure accuracy and usefulness."

From my own experience as a data engineer in the real estate sector, I can say that we often perform extensive cleansing and enrichment of the data to ensure that it can be used not only by data scientists, but also by other areas such as market research, e.g. to analyze price trends in recent years.

This cleansing is independent of any particular application. Its purpose is to ensure that the data is cleansed and enriched uniformly, so that the results across the various application areas are consistent.

Kyle Stratis

I think it's important to remember (or acknowledge) that before the ML Engineer title (and now AI engineer) became 'sexy', the people doing this work were...data engineers. My first job doing what became ML engineering (and my 3rd overall) was a data engineering role. This likely accounts for the similarities, even though the titles have grown apart in the last 5-6 years. In my opinion, ML engineering and data engineering are siblings with the same core and ancestry, but have their own flavors and cultures now. DE seems even more focused on serving the BI world, while MLEs' customers are usually internal R&D teams, with MLEs bridging the gap between R&D and product engineering.

I also strongly believe that the differences between the two roles in practice are much smaller than presented here, or even non-existent (pipelines used in ML workflows can certainly be streaming! MLEs build data pipelines as a matter of course for the ML development lifecycle!), and that examining the similarities could be even more instructive than examining the differences.
