The concepts and skills data engineers need to know have been around for decades.
However, the role itself has only existed for a little over a decade, with companies like Facebook, Netflix, and Google leading the charge.
Throughout those years, significant breakthroughs and tools went through cycles of hype and then general acceptance, becoming standard practice (even if only for a few years). Each of these tools and practices shifted how we as data engineers operate.
On top of that, new regulations and legislation reignited the need for better security and governance policies, which have always been crucial in data management.
Each of these turning points pushed data engineering as a discipline to the next level. In this article, we'll go over the 10+ years of data engineering history that got us to where we are today.
2011-2013: Why Data Engineering Became Popular - The Rise of Big Data and Hadoop
The term "data engineering" became popular around the same time as "big data" and "data science," but it was a lagging indicator. Companies invested heavily in data science teams and in setting up Hadoop clusters.
They then started expecting their data scientists to write MapReduce jobs on top of building algorithms and neural networks. This quickly forced some teams and individuals to specialize in querying, parsing, and processing data sets of various shapes and sizes, and suddenly there was the inkling of a new role.
Thus, the rise of the data engineer started even before the title itself became popular. It was an undefined role before it was a role; the goal was that, by specializing, data scientists could focus on driving value with data while data engineers could focus on owning said data.
Of course, at this time, there was also the rise of Hadoop. However, original Hadoop development started nearly 10 years prior, spurred by the launch of Google's GFS paper in 2003, and several other large tech companies quickly picked it up. Generally, there is a 5-10 year (anecdotally speaking) adoption lag between big tech and large organizations.
Now, many of these companies relied on Hadoop because they needed to process massive amounts of data; their business models depended on it.
Thus, they weren’t trying to develop the technology because they read about it in an article, but rather because they had both the scale and need to implement improved methods of data processing and storage.
But by late 2009, big tech companies weren't the only ones looking to utilize Hadoop. As with many other technology waves, employees from Yahoo, Google, Facebook, Oracle, and other big tech companies decided to create managed versions of Hadoop, both to make it easier to operate and...well...to make a profit.
We’ll get to 2014 later, but that’s when I first started running into these solutions.
Prior to that, many of these solutions were booming, at least in funding. Cloudera eventually received a $900 million round at a $4.1 billion valuation.
And if you broke into the data engineering world in 2011, you'd have felt all the excitement around Hadoop. It felt like everyone was learning MapReduce and Java, not to mention picking up tools like Flume, Sqoop, and Pig.
This was one of the significant developments: instead of needing expensive specialized hardware to run queries over massive data sets, you could now use cheaper commodity hardware managed by better software.
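To make that concrete, here's a minimal sketch of the kind of MapReduce job a newly minted data engineer would have been writing back then: the classic word count against Hadoop's Java API. It's illustrative rather than production code, and the input/output paths are assumed to arrive as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in every input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

You'd compile this into a jar and submit it to the cluster with something like `hadoop jar wordcount.jar WordCount /input /output`, then stitch dozens of jobs like it together by hand. Tools like Pig existed largely so you didn't have to write this much Java for every aggregation.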
But as referenced above, these solutions eventually required a schism between data scientists and data engineers to make sure companies had reliable data sets.
And that leads to the next era: the adoption of the data engineer.
2014-2016: The Crystallization of the Data Engineer