5 Key Predictions for the Data Industry in 2026
Hype Cycles, Rebrands, and the Messy Reality of Data
Hi, fellow future and current Data Leaders; Ben here 👋
Today I am taking a pause from my data pipeline series (it’ll start again next issue) to share some thoughts about the data world and where it’s headed over the next year or so.
But before we jump in, I wanted to share a bit about Estuary, a platform I’ve used to help make clients’ data workflows easier and am an adviser for. Estuary helps teams easily move data in real-time or on a schedule, from databases and SaaS apps to data lakes and warehouses, empowering data leaders to focus on strategy and impact rather than getting bogged down by infrastructure challenges. If you want to simplify your data workflows, check them out today.
Now let’s jump into the article!
One twelfth of the year is over, at least by months, and somehow it feels like a year’s worth of events has occurred.
By the end of 2025, dozens of companies were swallowed up. Everyone wanted to buy everyone, and here we are in 2026, seeing more of that as well as some pretty slick new AI model releases.
But let’s turn towards the future, and specifically, data.
Here is what I believe we’ll see happen in the next year or two.
1) Microsoft Fabric Will Rebrand... Again
If you scroll around LinkedIn enough, you’ll likely find a few posts about Microsoft Fabric not being the “it” tool. As some people have put it:
Now, that doesn’t mean it’s not growing. According to their recent earnings, Azure is growing at 39% Y/Y.
But, if the sentiment around Fabric continues to grow in the wrong way, I’d predict Microsoft rebrands their data stack…again.
Especially with all the AI hype. They’d be able to push the narrative that the new solution is AI-first and come up with a great new name.
Looking back over the past decade, Microsoft’s done this several times. To the point where it was a little confusing in terms of which tool is which.
So maybe they’ll try yet again.
2) 1% Of Companies Will Continue To Cry For AI While The Other 99% Are Still Trying To Export ERP Outputs To Excel
There is a lot of demand for AI. It needs to be integrated everywhere, right?
On the flip side, many companies are still sharing data via SFTP or pulling it from an API. Sure, maybe they used AI to help write the code faster.
But then they went to scroll on Instagram afterwards.
You know what AI tool I want to see (and maybe I should take a crack at it)? I want to see an AI solution that lets me take an Excel spreadsheet, drop it in, and have it automatically build out a data pipeline that can replace it perfectly. Or as perfectly as it can…
It’d pull out all the formulas, turn them into SQL or Python logic, and put them into a larger system of data pipelines.
Because we aren’t going to replace Excel anytime soon. Every company has it no matter what data decade they are in. Excel is capturing business logic.
So why fight it?
It’s just too easy to build a quick spreadsheet that quickly turns into a core component in a business workflow.
No one wants to fill out a restrictive form, and sure, coding is nice, but it’s also heavy.
So why not just open a spreadsheet, build it, and have an easy button to turn it into a pipeline?
It’s just that easy, right?
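To make the spreadsheet-to-pipeline idea a little more concrete, here is a toy Python sketch of the very first step: turning a row-wise Excel formula into a SQL expression. Everything in it is made up for illustration (the `COLUMN_NAMES` mapping, the `orders` table, and the `formula_to_sql` and `build_select` helpers), and it only handles plain arithmetic on same-row cell references. A real tool would need a proper formula parser for functions, ranges, and cross-sheet references.

```python
import re

# Hypothetical mapping from spreadsheet column letters to warehouse column names.
COLUMN_NAMES = {"A": "quantity", "B": "unit_price", "C": "discount"}

# Matches simple cell references such as A2 or BC17 (no ranges, no sheet names).
CELL_REF = re.compile(r"\b([A-Z]+)(\d+)\b")


def formula_to_sql(formula, column_names):
    """Turn a row-wise arithmetic formula like '=A2*B2' into a SQL expression."""
    if not formula.startswith("="):
        raise ValueError(f"not a formula: {formula!r}")
    body = formula[1:]

    # A row-wise formula should only reference cells on a single row.
    rows = {row for _, row in CELL_REF.findall(body)}
    if len(rows) > 1:
        raise ValueError(f"formula spans multiple rows: {formula!r}")

    def to_column(match):
        # Swap the cell reference (e.g. A2) for its mapped column name.
        letter = match.group(1)
        if letter not in column_names:
            raise KeyError(f"no column mapped for {letter!r}")
        return column_names[letter]

    return CELL_REF.sub(to_column, body)


def build_select(table, formulas, column_names):
    """Build one SELECT that computes every spreadsheet formula as a named column."""
    expressions = [
        f"{formula_to_sql(formula, column_names)} AS {alias}"
        for alias, formula in formulas.items()
    ]
    return f"SELECT {', '.join(expressions)} FROM {table}"


if __name__ == "__main__":
    # A "net revenue" column that lived in a spreadsheet as =A2*B2*(1-C2).
    print(build_select("orders", {"net_revenue": "=A2*B2*(1-C2)"}, COLUMN_NAMES))
```

The hard part, of course, is everything this skips: lookups, pivot-style ranges, hidden sheets, and the business context that only lives in someone’s head. That is where I could see an LLM helping.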
3) Modern Data Stacks Will Be Shaken
With all the start-ups that have been bought up recently and others raising prices, I foresee a shake-up in the default approaches that companies use to build their data stacks.
On top of that, I wouldn’t be surprised if we see a wave of companies needing their data stacks rebuilt from the ground up.
Between fragile data pipelines, changing pricing models, sunsetting solutions, and Just-in-Time data models, people will take another look at what they built and see if it actually meets their needs.
By the way, great segue: if your data team needs help revamping your data infrastructure, whether you’re on Databricks, Snowflake, BigQuery, or all of the above, then reach out for a consultation!
This opens the door for new solutions as well as, I hope, building more solid data foundations. Need a good way to pitch that to data leadership?
Just call it “AI-foundations”.
4) AI POCs Will Start To Build Actual Foundations
Over the last few years, we’ve been fed endless new terms and patterns for how to make LLMs useful. I believe we’ll start seeing more crystallized patterns for how companies are actually planning to use LLMs (for more than just writing code).
Because for every fifty projects that were driven by hype, there are one or two where the engineers focused on delivering a reliable solution. When they came up against problems, they didn’t rush through them or ignore them.
They actually spent time trying to figure out how to work with what they were getting. They tried to figure out the actual value of the LLM beyond the surface-level use cases.
It takes time to develop design patterns and processes, and soon we’ll have enough iterations where some teams will be able to reliably execute ideas. Over the past few tech hype cycles the general process I see is:
New capability appears - A breakthrough hits (LLMs, streaming, blockchain, “big data”, etc.). Early demos look magical, but the real constraints aren’t understood yet.
Everyone builds the obvious thing first - For LLMs, this was:
Chatbots everywhere
“Ask your data anything” demos
Code generation and copilots
Reality sets in - Teams start to run into problems. Think hallucinations, cost blowups, security and governance concerns and, in some cases, things that just don’t work as expected. You need to start integrating safeguards and best practices (that no one has created yet; you... you are the one creating them!).
Patterns start to crystallize - Every new capability comes with limitations. But you won’t know them until you implement it. Until it hits scale or just has to handle an edge case you hadn’t considered.
Becomes a standard - Eventually new capabilities are viewed as a standard piece of infrastructure. Integrated in such a way where you notice it less because it smoothly fits into the rest of your flow.
The hype fades and we figure out where the new capabilities fit best - We go from “this new thing can solve all problems” to “here is where this new capability is really good at solving our real problems.”
5) Snowflake Will Rediscover Themselves
Although I don’t like thinking in terms of Snowflake vs Databricks… I do have a hard time not comparing them to each other…
As an outsider looking in, Snowflake’s vibes are off (as the kids would say).
At their core, they offer a solid data warehouse solution, but their overall strategy, to me, is unclear.
They have heavily relied on partners for a long time, but now, you get the sense that they want to start pushing into other functionalities, while still being friendly with their partners.
To me, this has led to some of their recent feature add-ons lacking commitment. They could make their dbt integration good, but personally, I find it just fine. I want it to feel less like tacked-on functionality and more like a well-integrated part of Snowflake.
I believe Snowflake could implement it in such a way that it could make needing to use dbt Cloud unnecessary. But maybe they don’t want to. They want to straddle both being a partner driven business and an all-in-one data solution.
Databricks, on the other hand, is pushing to solidify its hold on data engineers and analysts. They’ve had a foothold over a good portion of the data engineering identity, and now they are partnering with Alex the Analyst for analytics. And that’s just the marketing.
When it comes to their product, you know Databricks wants to be an all-in-one tool. Sure, they have partners, but they’ve stuck to their core identity; they are all-in-one.
I know I should bring data on this, and I am trying to think what would be a good alternative data source to prove what my gut is saying. But Snowflake gives the vibe that it’s lost its way. I really enjoyed reading the book Playing to Win: How Strategy Really Works.
Most of the book discusses the diagram below and how to use it to build a strategy.
When I overlay that thinking on Databricks vs Snowflake, I can see that Databricks has committed to its choices and where it is playing, whereas Snowflake hasn’t.
I do think Snowflake will find its way, one way or another, this year.
Final Thoughts
The data world is still suffering from many of the same challenges it has faced for decades. Yes, we’ve added new tools and solutions, but businesses are still trying to find value from their data without getting too distracted by hype.
There are a lot of high-level posts and articles on driving value via data, but I think there is a gap when it comes to speaking on patterns of value that most businesses could find easily.
That’s one of the future series I want to put out after the data pipeline article.
So keep an eye out!
As always, thanks for reading!
Video Of The Week - Common Data Pipeline Patterns You’ll See in the Real World - Types Of Data Pipelines You’ll Build
Articles Worth Reading
There are thousands of new articles posted daily all over the web! I have spent a lot of time sifting through some of these articles as well as TechCrunch and company tech blogs and wanted to share some of my favorites!
What It Actually Takes to Build a Data Pipeline System
When I first started in the data world, it was common for data teams to build their own data pipeline solutions. There were still dozens of off-the-shelf tools, of course. Nevertheless, you’d see custom pipelines developed everywhere.
In 2025, I saw less of this.
In fact, in many cases data teams would go straight to picking tools or solutions.
But let’s say you do want to go down this route. You want to build your own data pipeline solution?
How would you do it?
How Uber Scaled Data Replication to Move Petabytes Every Day
Uber prioritizes a reliable data lake, which is distributed across on-premise and cloud environments. This multi-region setup presents challenges for ensuring reliable and timely data access due to limited network bandwidth and the need for seamless data availability, particularly for disaster recovery. Uber uses the Hive Sync service, which uses Apache Hadoop® Distcp (Distributed Copy) for data replication. However, with Uber’s Data Lake exceeding 350 PB, Distcp’s limitations became apparent. This blog explores the optimizations made to Distcp to enhance its performance and meet Uber’s growing data replication and disaster recovery needs across its distributed infrastructure.
End Of Day 209
Thanks for checking out our community. We put out 4-5 Newsletters a month discussing data, tech, and start-ups.
If you enjoyed it, consider liking, sharing and helping this newsletter grow.

“Microsoft Fabric is Databricks from Temu” 😂 LOL. Love it (the quote, not Microsoft Fabric)
I'm starting up an applied AI "lab" for practical data use cases, and the excel transpiler idea is a good one to add to the list.
I actually had success porting a gsheet to a flask app last year using only screenshots in chatgpt, so I can certainly see a path to a pipeline builder.