4 Comments
Dec 28, 2023 · Liked by Sarah Floris

One remark:

On the subject of data cleansing for ML Pipelines, the article states: "While data engineers do an initial cleaning, data scientists take it further to ensure accuracy and usefulness."

In my own experience as a data engineer in the real estate sector, we often perform extensive cleansing and enrichment so that the data can be used not only by data scientists but also by other teams such as market research, e.g. to analyze price trends over recent years.

This cleansing is independent of any particular use case. Its purpose is to make sure the data is cleaned and enriched uniformly, so that results across the various application areas are consistent.

author

Some definitely do, and some do not. It depends on the team.


Hey, I think there is a typo in “[...] accuracy above 80% is a good benchmark to aim for, but it's also important to monitor other metrics such as accuracy if working with regression models or precision [...]”

Accuracy appears twice here.

Otherwise, interesting article containing many great sources!


Awesome content. Thank you for sharing!!
