The role of the data engineer has changed drastically in the last decade. Ten years ago, employers seemed to expect their data scientists both to calculate eigenvectors and to write MapReduce jobs for Hadoop.
Eventually, this work became more specialized, and the term data engineer started to appear more and more. Originally it referred to those faithful data practitioners who interacted with Hadoop in its purest form, and who were later assisted by tools such as Flume, Sqoop, and Pig.
As time went on, the role of data engineer started to bifurcate, at least at larger organizations. Perhaps data engineers merely started to fit into older roles that once occupied the same place, such as ETL developer. Regardless, individuals and teams began to specialize.
Generally speaking, the breakdown at companies of varying sizes might look something like the diagram below.

There is no perfect set-up when it comes to teams. However, you will want to make sure you balance the work being done, data quality, output, security, and usability.
I have worked with companies of all sizes and seen some of the combinations above. But what does each of these teams do?
Although it’s not always cut and dried, let’s break down what many of these teams do.
Software Engineer, Data Infra Teams
Some software engineers enjoy working heavily on data infrastructure projects. For example, these engineers like designing and building query optimizers, data catalogs, and/or data pipeline solutions.
These are solutions focused solely on storing, processing, and managing all of the data we have been collecting over the past few decades. The work can get very meta as it shifts toward solutions like data catalogs or security.
One person whose career I believe has mostly focused on this type of work is Neelesh Salian. Throughout his career, he has worked on projects focused on building tools for other data practitioners. He has contributed to Apache projects such as Spark, Hive, Iceberg, and Hadoop, helping ensure that the rest of us (data engineers, analysts, and data scientists) can work with the ever-growing pile of data.
These engineers build the components that will be managed by the next grouping of data-focused engineers.