Why You Should Upgrade Your Data Infrastructure
Going Through The Reasons Companies Have To Change Their Data Stack
Companies that have successfully used data don't generally try to big bang their data infrastructure.
That is to say, they don't try to create the most awe-inspiring technical diagram.
Instead, they often go through a few phases. Each phase helps them grow on both the technical and business side of using data.
Over the past few years, I have seen several very common paths for companies, especially start-ups, SMBs, and even a few mid-market companies. Of course, there are plenty of paths that data teams can take. But I wanted to go through one of the common ways companies set up their data infrastructure.
This came up again recently when I talked to a prospect who was still using a replica database for their reporting, several years after I first talked to them, when they were using…a replica database (and it still works).
So despite the tech stacks and diagrams you may see me or others put out, there are plenty of companies out there running data stacks that you might not consider modern or whatever you want to call it.
In this article, I wanted to talk about some of these different phases and why data teams decide to migrate to the next phase.
Spreadsheets - Phase 1
If you’ve worked in data, you’ve used Excel. Maybe you even got more advanced and started to mess around with Power Query and macros.
Overall, Excel is not exactly what people consider a modern tool. Yet it is probably still one of the most relied-upon solutions when it comes to data.
And with good reason.
It’s got a low floor and high ceiling.
Using spreadsheets tends to be the first phase for many companies as they are developing their analytical capabilities.
In this phase, an analyst or operations lead is likely asking a software engineer every week for a data dump to look at. They’ll then slice and dice it with a few pivot tables and answer some baseline questions.
No fancy machine learning model here (I guess now we can probably just ask GPT-4 to make our spreadsheet for us, though).
The data is small enough and the requests infrequent enough that there is no need to automate any of this. It’d be expensive and overkill.
Thus, at this stage it’s more about answering basic questions and setting up processes.
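To make that weekly slice-and-dice concrete, here is a minimal sketch of what a basic pivot table does, written in plain Python. The data dump, its columns (region, product, revenue), and the pivot_sum helper are all made up for illustration; in practice this step usually just happens inside Excel.

```python
import csv
import io
from collections import defaultdict

# Hypothetical weekly data dump. The columns (region, product, revenue)
# are assumptions for illustration, not taken from any real system.
DATA_DUMP = """region,product,revenue
West,Widget,120.0
West,Gadget,80.0
East,Widget,200.0
East,Widget,50.0
East,Gadget,30.0
"""


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair, like a basic pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += float(row[value_key])
    # Convert to plain dicts so the result is easy to print and compare.
    return {r: dict(cols) for r, cols in table.items()}


rows = list(csv.DictReader(io.StringIO(DATA_DUMP)))
revenue_by_region = pivot_sum(rows, "region", "product", "revenue")

for region, by_product in sorted(revenue_by_region.items()):
    print(region, by_product)
```

At this size, a script like this buys nothing over a spreadsheet, which is the point: the questions are simple and the data fits comfortably in a single sheet.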
Why do teams go beyond spreadsheets?
Between each phase of your team's technical capabilities, you’ll run into some form of limitation.
For example, here are several key driving factors that push teams from spreadsheets to replica databases.
The need for more timely data becomes apparent; either an executive needs to make decisions more frequently, or operational teams need the information.
Excel can no longer manage the amount of data that needs to be processed, so the data needs to be moved to another tool.