Last year, I put out a survey asking my followers what their data stacks look like and what problems they are facing.
I ran the same survey this year to gain a better understanding of what is going on in the data world.
For this newsletter, we are only going to scratch the surface of this data because there are many different ways we can slice and dice it.
For part 1 of our initial breakdown, we will be focusing on the following points:
Backgrounds on Who Filled Out the Survey
Vendor Selection (Data Analytics Platforms)
Problems Data Teams Face
I do plan to dig deeper into this data; for example, I skipped over orchestrators in this part 1 (unlike the 2023 part 1) only because I didn’t want this to be a direct copy and paste.
So make sure you’re signed up for future newsletters if you’re curious about what tools and practices data teams are implementing.
Limitations Of This Survey
Surveys are hard to do well and, when done in this manner, unavoidably have bias and blind spots. That’s why there are individuals who have PhDs just in running and gathering surveys.
So, although I’d love to call this a “True” State of Data, it would require an asterisk that references the high likelihood of selection bias amongst other forms.
Nevertheless, I do think the survey is worth running, especially over a longer period of time as there will likely be interesting changes and fluctuations that can be tied to some real-world events or trends.
Who Filled Out the Survey?
Let’s start with a broad overview of who filled out the survey.
We had a little over 600 people complete this survey (with about 20% of those respondents returning from last year).
One of the first questions asked was what industry respondents work in. Compared to last year, the top industries were about the same but better distributed. Last year, financial services and software were the top two industries (I also just rolled up consulting into one category), which was the same this year.
The difference this year is that we had far more respondents from other industries fill out the survey.
Company Size
In terms of the size of companies that these respondents worked for, we had a wide variety ranging from SMB to enterprise.
The three largest demographics were companies with 101-500, 501-1,000, and 1,001-5,000 employees.
I also removed any respondents whose company had only 1 employee.
You can see the rest of the breakdown below:
Job Types
Finally, let’s take a look at job types. Compared to last year, I actually had a broader range of respondents, with only about 32% being data engineers, down from last year.
What Solutions Are Data Teams Using?
Now, let's take a moment to break away from the demographics and discuss solutions. In particular, let's talk about data analytics platforms.
We are entering an interesting era where when someone is referencing a data analytics system, it doesn't mean a data warehouse every time. It could mean S3 + Trino or Databricks + Snowflake on top of S3, among several other combinations.
I bring that up because one respondent decided to comment in their answer that Databricks doesn't fit into the data analytics storage category (which I'd have to disagree with). Sure, it's not a database. But it does fit into this “cloud data platform” category that provides a lot of similar functionality, including the ability to store data somewhere.
In fact, that's the whole reason I am using the term data analytics storage platform. It used to be that when we as data people said "Data Warehouse," the definition was clearer, because there was a clear data warehouse vendor like Vertica.
Now, we live in a world where you could be using three different solutions to interact with your stored data.
But let's get back to the survey.
One of the shifts from last year was that the number of respondents who referenced more than one data analytics platform in this survey dropped. So last year about 49% of respondents referenced using multiple analytics platforms, whereas this year it was only around 45%.
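As a rough illustration of how a figure like this can be derived, here is a minimal Python sketch. The `responses` rows are made up for the example (they are not the survey data), and the function name is hypothetical; it assumes each respondent's answer has already been split into a list of platform names.

```python
# Hypothetical survey rows: each respondent's platform answer, already
# split into a list. These values are invented for illustration only.
responses = [
    ["Snowflake"],
    ["Databricks", "Snowflake"],
    ["BigQuery"],
    ["Redshift", "S3 + Trino"],
]


def multi_platform_share(rows):
    """Return the fraction of respondents who referenced more than one platform."""
    # Skip blank answers and de-duplicate repeated mentions per respondent.
    answered = [set(r) for r in rows if r]
    if not answered:
        return 0.0
    multi = sum(1 for platforms in answered if len(platforms) > 1)
    return multi / len(answered)


share = multi_platform_share(responses)
print(f"{share:.0%} of respondents referenced multiple platforms")
```

Running the same function on each year's responses would give the two percentages being compared. How free-text answers get split into platform names is the part most likely to affect the result.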
Going beyond that, the breakdown across all companies, both those that only have one data analytics storage platform and those that might use multiple, is shown below.

Now, with Snowflake being so prevalent in this specific survey, I wondered if the fact that many respondents in the survey worked for companies ranging from 51 to 5,000 employees played a role.
So, I also broke down another chart illustrating what percentage of respondents in varying-sized companies actually referenced Snowflake.
You can see that in the chart below; most of the respondents that referenced Snowflake worked for companies in a similar range (101-25,000).
This, in turn, made me wonder: if I were to look only at companies with more than 5,000 employees, what is the most commonly referenced data analytics platform?
That’s shown in the chart below, now with Databricks on top.
Of course, even if this is an accurate depiction, it’s still hard to judge the actual impact since we live in a world where the fight is for workloads not just for contracts. Just because a company says they are using Databricks or Snowflake doesn’t mean you know the weight of their spending. In fact, I know of several companies I have talked to where they say they have certain solutions, but no one is using them.
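For readers who want to run a similar cut on their own survey data, the company-size breakdowns described above could be sketched like this in Python. Everything here is an assumption for illustration: the `rows` are invented, the size bucket labels are guesses at how the buckets might be named, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical (company_size_bucket, platforms) pairs, invented for illustration.
rows = [
    ("101-500", ["Snowflake"]),
    ("101-500", ["BigQuery"]),
    ("1,001-5,000", ["Snowflake", "Databricks"]),
    ("5,001-25,000", ["Databricks"]),
    ("25,000+", ["Databricks", "Snowflake"]),
    ("25,000+", ["Redshift"]),
]

# Assumed labels for buckets covering companies with more than 5,000 employees.
LARGE_BUCKETS = {"5,001-25,000", "25,000+"}


def share_by_size(rows, platform):
    """Fraction of respondents in each size bucket who referenced `platform`."""
    totals, hits = Counter(), Counter()
    for size, platforms in rows:
        totals[size] += 1
        if platform in platforms:
            hits[size] += 1
    return {size: hits[size] / totals[size] for size in totals}


def platform_counts(rows, buckets):
    """Count respondents in the given size buckets who referenced each platform."""
    counts = Counter()
    for size, platforms in rows:
        if size in buckets:
            counts.update(set(platforms))  # count each platform once per respondent
    return counts


snowflake_share = share_by_size(rows, "Snowflake")
top_large = platform_counts(rows, LARGE_BUCKETS).most_common(1)
print(snowflake_share)
print(top_large)
```

Note that this counts mentions, not spend or workloads, so it carries the same caveat as the charts: a platform being referenced says nothing about how heavily it is used.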
Data Team Challenges
Now I don’t want this article to be an exact copy of the prior one, so I’ll hold off on talking about orchestrators. However, I think you can guess that Airflow remains a popular solution despite all the negative campaigns I have seen run on it.
So let’s dive into data teams, what they are working on, and the challenges they face.
One topic I didn’t ask about in the last survey, but thought would be interesting to learn more about, was cost savings. A lot of data teams say they care about costs, but I was curious: have they taken on any projects aimed at saving costs?
That’s why I asked the question:
Has your team taken on a data infrastructure cost reduction project in 2023?
Of the respondents, about 46% said yes, their team had taken on such a project. I had actually assumed it’d be higher, in the 60% area, because almost everyone I talk to says costs matter.
Another question that I do leave more open-ended, perhaps to my detriment, is asking about the problems your data team is facing (next year I’ll give you a limited set of options). But, I am always curious to know the pains that data engineers and practitioners face.
Last year, hiring and talent was the biggest issue referenced.
In the most recent survey, this fell dramatically. It barely made the top 10 issues. Now, perhaps this is because of layoffs or because data teams are shrinking (that last comment is more anecdotal).
Instead, the largest issue referenced this year was data quality, followed by data infrastructure costs. I did find this interesting because many of the other results stayed pretty similar overall. However, the issues data teams faced changed.
Now again, as referenced earlier, only about 20% of respondents were the same as last year, so perhaps there was a major shift.
But despite the shift, some of the other spreads remained similar.
The End Of Part 1
With that, I want to say thank you to everyone who responded to the survey! It’s always interesting to see what tools and solutions the Seattle Data Guy community is using.
Don’t worry, this was just part one! So if you have more questions, please feel free to reach out.
Thanks for reading!
Also, if your organization needs help setting up its data infrastructure or improving its data strategy, then feel free to reach out for a free consult.
Join My Data Engineering And Data Science Discord
If you’re looking to talk more about data engineering, data science, or breaking into your first job, and to find other like-minded data specialists, then you should join the Seattle Data Guy Discord! We are close to passing 6,000 members!
Join My Data Consultants Community
If you’re a data consultant or considering becoming one, then you should join the Technical Freelancer Community! I recently opened up a few sections to non-paying members so you can learn more about how to land clients, different types of projects you can run, and more!
Articles Worth Reading
There are 20,000 new articles posted on Medium daily, and that’s just Medium! I have spent a lot of time sifting through some of these articles, as well as TechCrunch and company tech blogs, and wanted to share some of my favorites!
Synchronizing Organizational Decisions
By
One of the most insidious problems in most data driven organizations has nothing to do with data technologies. It’s also not financial. It’s purely organizational.
Imagine two organizations (let’s call them A and B) selling a B2B SaaS product that helps companies manage projects. Both organizations conduct an annual planning meeting where they decide that their objective for the year is to reduce churn/increase retention.
The impact of churn reduction is huge as you can read here.
End Of Day 123
Thanks for checking out our community. We put out 3-4 Newsletters a week discussing data, tech, and start-ups.