InterviewQuery surveyed 10,000 interviews that occurred in 2020. They analyzed interviews for data scientists, data engineers, data analysts, and several other data specialties.
Their goal was to see what questions were being asked in those interviews, which roles were being interviewed for most heavily, and what impact Covid had on the interview process.
Their article offers a lot of interesting insights, breaking down which question topics come up in data scientist and data engineer interviews and which skills seem to be in demand.
There was one statement, or at least one sentiment, I disagreed with: the article seems to suggest that data scientists are being replaced by data engineers. Perhaps this was just click-bait.
However, the truth of the matter is that data engineers and data scientists perform very different functions, and replacing one with the other wouldn’t produce the same output. Also, according to indeed.com, demand for data engineers has always been higher than demand for data scientists.
In this newsletter, I want to outline why I believe data engineering interviews continue to grow in number while data science interviews may be plateauing for now.
Data Engineering Work Always Comes First
As a data consultant, I am usually brought in to build a company’s data system from the ground up, migrate it, or fix it.
Regardless of the type of project, it always starts with data engineering and data management. This could mean tracking down and organizing all the data sources, rebuilding patched-together data pipelines, or developing a data warehouse.
Every project is slightly different, but regardless, it all starts with data.
The general goal is to develop data infrastructure that is robust, reusable, and automated.
Data Infrastructure Is Rarely Done Well
I have worked with all forms of data infrastructure, and I think people would be surprised at how patched together some companies’ systems are.
Many companies still have large parts of their data infrastructure pieced together by cron jobs and JSON config files.
Even companies that have moved away from this approach often have a complex array of tools that perform similar work and lack a central structure.
This is where many large tech companies differ. Companies like Amazon and Google have developed a centralized set of tools so that all their teams use similar processes to get data pipelines into production.
It seems like a small difference.
But it makes a massive difference when it comes to maintenance, skill transfer, and a company’s ability to scale its data quickly.
Picking the right data infrastructure has huge implications. We spend the first part of any project assessing a company’s goals and current technology stack to see what will work best for them.
Picking the wrong infrastructure could lead to costly future migrations and maintenance.
Data Engineers Are Still A Bottleneck
I have talked to data scientists and analysts at companies that range from telecommunications to finance.
They often all have the same problem.
Data engineers aren’t getting them data fast enough.
Now, I don’t think this is because data engineers are incompetent.
This problem seems to be prevalent everywhere.
Data isn’t easy to wrangle, and tons of companies and start-ups are trying to make data processing much easier.
Of course, data keeps growing.
We aren’t just talking about ERP and CRM data here. We are also talking about click streams, IoT devices, and other sources that are going to become more challenging to manage over time.
This means the data engineers that companies already have will become more and more overwhelmed.
I believe this is pushing companies away from hiring more data scientists, since there would be little for them to do until the data engineers finish their pipelines.
So What Will Happen To Data Scientists?
I still see a lot of work ahead for data scientists; I just foresee companies needing to spend time figuring out their base layers of data infrastructure first.
Also, with all the new tools coming into the data science space to help make it more operational, I think the number of data scientists will grow.
Over time, as companies improve their data strategy and maturity, I am sure we will see a mass hiring of data scientists.
Ask A Data Consultant - Office Hours
With every newsletter, I open up a day or two with a few office-hour slots where readers can sign up and ask me questions. I have answered a lot of great questions so far, and hopefully the sessions provided useful insights for those who signed up.
Sign Up Below:
Next Open Office Hours
Articles Worth Reading
There are tons of great articles on data science and engineering. The section below combines articles we have read and articles we have written that cover current topics in the data and tech space.
7 Real-Time Data Streaming Databases
In the modern era, everyone expects their data the second it’s updated (if not somehow magically before the data occurs).
Large corporations and Fortune 500 companies depend on this data to predict consumer tastes or estimate where supply and demand are moving the market.
In turn, many companies are working to convert their batch-style data pipelines into real-time data streams. Real-time data streams let analysts, machine learning researchers, and data scientists develop metrics and models that run as soon as new data is created.
A Model-driven Culture is Crucial for Data Science Success
Software has a robust “Software Development Lifecycle” that’s been well-matured over the last two decades. Data science needs its own “Data Science Lifecycle”, which is why we launched the Data Science Lifecycle Assessment last year. Now, data science leaders can gauge their maturity across the six stages of the data science lifecycle – ideation, data acquisition & exploration, R&D, validation, delivery, and monitoring. We’re finding that even the most mature companies have areas they can improve upon to help make data science more scalable and valuable to their business.
That’s why we partnered with DataIQ to survey their membership of data and analytics professionals to understand more about their approaches to data science. Of businesses that can measure the business benefit of data science, about one in four organizations expect data science to impact topline revenue by more than 11%. For these companies, their investment in data science and their emphasis on making it a first-class corporate function are already paying significant benefits.
Visualizing Data Timeliness at Airbnb
Imagine you are a business leader ready to start your day, but you wake up to find that your daily business report is empty — the data is late, so now you are blind.
Over the last year, multiple teams came together to build SLA Tracker, a visual analytics tool to facilitate a culture of data timeliness at Airbnb. This data product enabled us to address and systematize the following challenges of data timeliness:
When should a dataset be considered late?
How frequently are datasets late?
Why is a dataset late?
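To make the first two questions concrete, here is a minimal Python sketch. It assumes a dataset is published as daily partitions and that "late" means landing after a fixed SLA deadline (for example, 06:00 on the day after the data’s date). The names `PartitionLanding`, `deadline_for`, `is_late`, and `late_rate` are hypothetical; this is an illustration of the idea, not how Airbnb’s SLA Tracker is implemented.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta
from typing import List, Optional


@dataclass
class PartitionLanding:
    partition_date: date            # the day of data this partition covers
    landed_at: Optional[datetime]   # when it became available; None if still missing


def deadline_for(partition_date: date, sla_offset: timedelta) -> datetime:
    # Hypothetical SLA rule: a partition is due `sla_offset` after midnight of its date.
    return datetime.combine(partition_date, time.min) + sla_offset


def is_late(p: PartitionLanding, sla_offset: timedelta, now: datetime) -> bool:
    """Answer 'when is a dataset late?' for a single partition."""
    deadline = deadline_for(p.partition_date, sla_offset)
    if p.landed_at is None:
        # Not landed yet: it only counts as late once the deadline has passed.
        return now > deadline
    return p.landed_at > deadline


def late_rate(history: List[PartitionLanding], sla_offset: timedelta, now: datetime) -> float:
    """Answer 'how frequently is a dataset late?' as a fraction of partitions."""
    if not history:
        return 0.0
    late = sum(is_late(p, sla_offset, now) for p in history)
    return late / len(history)


# Example: an SLA of 06:00 the following day, checked on Jan 4 at 09:00.
SLA = timedelta(days=1, hours=6)
NOW = datetime(2024, 1, 4, 9, 0)
history = [
    PartitionLanding(date(2024, 1, 1), datetime(2024, 1, 2, 5, 30)),  # on time
    PartitionLanding(date(2024, 1, 2), datetime(2024, 1, 3, 7, 15)),  # landed late
    PartitionLanding(date(2024, 1, 3), None),                         # missing past deadline
]
print(f"{late_rate(history, SLA, NOW):.2f}")  # prints 0.67
```

Treating a missing partition as late only after its deadline passes keeps in-progress pipelines from being flagged too early. The third question, why a dataset is late, needs lineage and upstream timing data that this sketch does not model.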
The End Of Day Four
With that, we will wrap up our 4th newsletter. We hope you learned something about what is going on in the data space. It can be hard to keep up with all the new terms, start-ups, and general hype.
If you have any questions, please comment below or sign-up for our office hours!
I am also looking into adding a video section to this newsletter in upcoming issues.