Data Horror Stories - What Could Possibly Go Wrong?
Working on data teams inevitably creates horror stories that burn themselves into the memories of those who lived through them.
Dropping production tables.
Terrible bosses who don’t understand anything about data.
Fragile infrastructure that is just asking for something to break.
Even people I talked to who have only been in the space for a few years have run into their fair share of challenging situations. Over the last few weeks I asked people to share their horror stories: times when the data went wrong, or when just working in the data industry was difficult. Maybe it even made them want to quit. I have left most of them anonymous, as this article isn’t meant to call anyone out.
In this article you will read several of these data horror stories, and hopefully they don’t bring back too many flashbacks.
Migration Led To Different Data Sets And The New Data Set Was Right?
Migrating data systems and infrastructure is a standard project many data engineers face. As teams switch from one system to another it is also standard to run comparisons to make sure the data from both systems match.
Usually, this turns up some discrepancies in the new data set, and the logic gets tweaked until it matches the old data set. After all, the old data set should be right... right?
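To make that comparison step concrete, here is a minimal sketch of what such a check might look like. It assumes both systems can export rows as dictionaries that share a key column; the function name `compare_datasets` and the `id` and `amount` fields are made up for illustration and do not come from the story.

```python
def compare_datasets(old_rows, new_rows, key):
    """Report keys missing from either side and keys whose rows differ."""
    # Index both exports by the shared key column (assumed unique).
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}

    only_in_old = sorted(old_by_key.keys() - new_by_key.keys())
    only_in_new = sorted(new_by_key.keys() - old_by_key.keys())
    # Keys present in both systems but with different values.
    mismatched = sorted(
        k for k in old_by_key.keys() & new_by_key.keys()
        if old_by_key[k] != new_by_key[k]
    )
    return {
        "only_in_old": only_in_old,
        "only_in_new": only_in_new,
        "mismatched": mismatched,
    }


# Hypothetical exports from the old and new systems.
old = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
new = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 40}]
report = compare_datasets(old, new, key="id")
```

Note what a report like this can and cannot say: it shows where the two systems disagree, not which one is correct. Settling that means going back to the original source.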
In this case, this valiant data engineer rebuilt the backend of a government system where the data was provided by a third party. So far so good.
As they put it:
Turned it around quickly (Dremio was immense), and months down the line data validity was questioned after it was compared to the old system and the original text. - Anonymous Linkediner
The data didn’t match.
But upon deeper analysis, they discovered that the old data was never accurate to begin with (and the new data was). Now that’s scary. They didn’t tell me what the data was, but I do hope no major decisions were made based on it.
Their final point:
If ever a project demonstrated a need for data validation inline, using SodaSQL for example, this was it. - Anonymous Linkediner
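For readers who have not seen inline validation before, the idea is to check each record against explicit rules as it flows through the pipeline, rather than discovering problems months later. The sketch below is plain Python and is not Soda SQL's API; the `validate_rows` helper, the rule names, and the `id` and `amount` fields are all hypothetical.

```python
def validate_rows(rows, checks):
    """Split rows into (valid, rejected); rejected rows carry the failed check names."""
    valid, rejected = [], []
    for row in rows:
        failed = [name for name, check in checks.items() if not check(row)]
        if failed:
            rejected.append((row, failed))
        else:
            valid.append(row)
    return valid, rejected


# Hypothetical rules; a real pipeline would encode what the source documents say.
CHECKS = {
    "id_present": lambda r: r.get("id") is not None,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}

incoming = [
    {"id": 1, "amount": 5},
    {"id": None, "amount": 3},
    {"id": 2, "amount": -1},
]
valid, rejected = validate_rows(incoming, CHECKS)
```

Rejected rows are kept alongside the reason they failed, so someone can investigate them instead of the bad records quietly landing in a table.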
Data Works Even When It’s Wrong