It’s 2025, and the data industry is already off to a whirlwind start. At the end of last year, Databricks secured one of the largest funding rounds in history, Rivery was acquired by Boomi, and the massive buyout of Tabular (Edit: I originally wrote Iceberg was purchased by Databricks which was incorrect) has everyone talking.
And that’s without even mentioning AI.
With so much happening, it’s natural to wonder: where is the data world headed in 2025? Over the past few years, we’ve witnessed an explosion of technologies designed to help organizations collect, process, and analyze data. But while these tools are undeniably powerful, they often cater to the technologist more than the business leader.
The real question for 2025 isn’t just which tools will dominate the market, but what outcomes they’ll deliver.
As organizations face increasing pressure to maximize the return on their data investments, they’ll need to make strategic technology choices and embrace tailored solutions that align with their unique needs. For some, this will mean pushing the boundaries of innovation with cutting-edge technologies. For others, simplicity will take center stage.
Here’s what you can expect to see shaping the data world this year.
1) Iceberg Will Cement Its Stronghold—Sort Of
If you’re in the data space, you’ve probably heard plenty about Iceberg by now, especially since Databricks acquired Tabular, which was founded by the original creators of Iceberg. So, does that mean we’re all standardizing on Iceberg moving forward?
In a perfect world, having a universal standard for data storage would simplify a lot of things. It would reduce complexity and make interoperability across tools far easier. But here’s the thing—most businesses don’t really care about the underlying tech; they care about results.
Sure, for many organizations, Iceberg will be a game-changer. But over the past few years, I’ve worked with several clients where it just wouldn’t have made sense. One client, in particular, wasn’t even comfortable using three or four tools in their data stack. Plenty of companies don’t want to spend their days stitching together five or more tools just to generate reports. They want a solution that manages their data, maybe some form of orchestrator/pipeline, a BI tool, and that’s it. (And, of course, some people still just want Excel).
Another director of engineering put it pretty plainly, “I like skiing on weekends.” Their point? They didn’t want to babysit a complex system designed by someone else that no one understands.
The truth is that people working in tech have different priorities. Some love tinkering with the latest tools, others just want to solve business problems and log off, while some want a bit of both. Layer that with varying company sizes and data budgets, and it's clear that Iceberg won’t be adopted everywhere. More likely, it’ll join the ranks of other powerful tools—widely adopted in some enterprises but not the default everywhere.
2) SQL Isn’t Going Anywhere
When I landed my first data job, the industry was all about building data lakes. The dream? A massive, unstructured data store where you could query anything and instantly gain insights. OK, that’s a massive oversimplification of the actual structure and goal of a data lake. However…
In theory, there would be some order, yet many of these became known as data swamps.
At the same time, alternate query languages seemed to pop up everywhere, and you couldn’t go a week without seeing another “SQL is dead” hot take. Yet here we are, and SQL is still standing strong.
Why? Because data matters. Eventually, people rediscovered that applying some order to data made sense. Sure, data lakes still exist, but the whole “schema on read” approach quietly faded.
(Side note: This is why I’m optimistic about RoeAI’s approach to unstructured data—it balances flexibility with the need for structure.)
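To make the schema-on-read versus schema-on-write contrast a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The event data, table name, and column names are all made up for illustration; the point is only that a declared schema rejects bad records once at load time, while schema on read leaves every consumer to handle them at query time.

```python
import json
import sqlite3

# Hypothetical raw events, as they might land in a data lake as JSON lines.
RAW_EVENTS = [
    '{"order_id": 1, "amount": 19.99}',
    '{"order_id": 2, "amount": "n/a"}',  # wrong type
    '{"order_id": 3}',                    # missing field
]


def total_on_read(raw_events):
    """Schema on read: each consumer repeats the cleanup at query time."""
    total = 0.0
    for line in raw_events:
        amount = json.loads(line).get("amount")
        if isinstance(amount, (int, float)):
            total += amount
    return total


def load_with_schema(raw_events):
    """Schema on write: the table definition rejects bad records at load time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " amount REAL NOT NULL CHECK (typeof(amount) = 'real')"
        ")"
    )
    rejected = []
    for line in raw_events:
        record = json.loads(line)
        try:
            conn.execute(
                "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
                (record.get("order_id"), record.get("amount")),
            )
        except sqlite3.IntegrityError:
            # NOT NULL and CHECK violations both surface as IntegrityError.
            rejected.append(record)
    return conn, rejected


conn, rejected = load_with_schema(RAW_EVENTS)
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(f"loaded={row_count} rejected={len(rejected)} total={total}")
```

Once the bad rows are turned away at the door, the downstream query is a plain SUM with no defensive logic, which is a big part of why SQL over structured tables keeps winning.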
Overall, SQL remains the lingua franca of data, and I don’t see that changing anytime soon. If anything, it’s becoming even more embedded in how we work with data, as I’m seeing far too many massive 5,000-line queries that could use some serious refactoring—but that’s a topic for another day.
3) AI—From Press Release Driven Initiatives To Real Life
The data world continues to push for AI. I’ve had companies ask me directly where they can implement it in their workflows. What they’re usually referring to, though, are large language models (LLMs) or AI agents—a very narrow slice of what AI actually encompasses.
Still, the AI push isn’t slowing down anytime soon. Plenty of companies stand to profit, so expect the hype to keep rolling. But where is it actually headed?
Those Who Built a Solid Foundation Will Benefit - Plenty of companies have talked a big game about their AI adoption. But the question is—are they really implementing it? I’d bet that more than a few companies that have successfully implemented AI aren’t broadcasting it for attention—they’re just making it work behind the scenes.
I’ve talked with vendors, consultants, and enterprises applying AI for much more than the typical chatbot use case (which I keep running into as well). These teams are using it to process unstructured data and extract insights faster across industries like ecommerce, insurance, healthcare, and finance. On the flip side, I’d also add that one main blocker for many other companies I’ve talked to is the same problem we’ve had for at least a decade. Once someone has a model, no one knows how to deploy it.
AI Use Cases Start Expanding - Building on that, I believe we’ll see even more creative AI use cases emerge in 2025 (and likely really hit in 2027), many requiring stronger product sense rather than pure engineering skill. The real challenge isn’t what AI can do but figuring out how it fits into practical, real-world business needs.
If you’re considering how to integrate AI into your own workflows, start small. Review your current processes to identify repetitive tasks that could be automated. Then, speak with both leadership and individual contributors to surface the real pain points they face daily. AI works best when it’s solving meaningful problems—not just serving as a buzzword.
4) We’ll Continue Chasing the Same Holy Grails
Thanks to marketing, rebranding of terms, and a constant influx of new talent, the data world will keep chasing the same holy grails—often while ignoring the recurring issues that continue to plague it.
Lack of data governance
Lack of data modeling
Poor data quality
No matter how sophisticated the tools become, the same challenges persist. Companies continue to struggle with conflicting definitions for key metrics, poor data quality making its way into both operational and analytical systems, and an ongoing disconnect between business needs and technical implementations.
Honestly, this is where I could see AI playing a role in the future. Imagine a tool capable of bridging the communication gap between departments—translating technical data concepts into language everyone can understand, like a kind of Babel Fish for internal communication. But considering the nuances involved, both in language processing and contextual understanding, that kind of solution still feels pretty far off.
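For a sense of how basic the recurring data quality issues in this section often are, here is a small sketch of two checks (duplicate keys and missing required values) in plain Python. The rows, column names, and the check_quality helper are hypothetical and only meant as an illustration, not a substitute for a real data quality tool.

```python
from collections import Counter

# Hypothetical rows from a customer table; column names are illustrative only.
ROWS = [
    {"customer_id": 1, "email": "a@example.com", "signup_date": "2024-01-05"},
    {"customer_id": 2, "email": None, "signup_date": "2024-02-11"},
    {"customer_id": 2, "email": "b@example.com", "signup_date": "2024-02-11"},
]


def check_quality(rows, key, required):
    """Return simple findings: duplicated key values and null counts per column."""
    key_counts = Counter(row[key] for row in rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)
    null_counts = {
        col: sum(1 for row in rows if row.get(col) is None) for col in required
    }
    return {"duplicate_keys": duplicate_keys, "null_counts": null_counts}


report = check_quality(ROWS, key="customer_id", required=["email", "signup_date"])
print(report)
```

Checks like these are trivial to write. The hard part, as usual, is getting people to agree on what the key is and which fields are actually required.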
5) Vertical-Specific Data Solutions
Over the last decade, we’ve seen an explosion of new data tools and platforms. Yet one gap remains: solving business-specific problems.
In 2025, I expect a rise in vertical-specific data platforms and solutions designed for industries like healthcare, financial services, retail, and manufacturing. This shift will likely be driven by the fact that the traditional data tooling space feels saturated. How many more ELT tools do we really need?
Instead of offering a blank canvas for companies to build upon, these new platforms will come equipped with pre-built models, pipelines, and analytics tailored to the unique challenges of a given vertical. Projects like Tuva Health are already working on healthcare-specific tooling that bundles both metrics and data infrastructure, and a few options are emerging in sales and marketing as well. But there are so many other options.
I’ve had conversations about similar concepts in the context of million-dollar projects and products. However, many of those offerings fall short when it comes to customization. What sets the newer wave of vertical platforms apart is their focus on giving end-users the ability to modify code and tailor the tools to their specific needs.
Closing Thoughts
One key takeaway about how change happens—not just in data but across the greater tech landscape—is that it rarely occurs all at once. Don’t get me wrong, there are moments of sudden innovation that change how we operate, and it’s hard to argue that LLMs haven’t been one of those.
But more often, trends evolve gradually. Many of the shifts we’re seeing now—like the rise of open table formats—have been in motion for a while. The Databricks acquisition of Tabular? That feels like a turning point, not the start of a trend. The same goes for the growing demand for simplified data solutions that don’t require six different tools just to manage a pipeline.
Trends come and go—and often take years to fully play out. So, as we look ahead, the next 5-10 years will be telling.
With that, thanks for reading!
Video Of The Week - What Is A Data Platform And Why You Should Build One
Join My Data Engineering And Data Science Discord
If you’re looking to talk more about data engineering, data science, or breaking into your first job, or to find other like-minded data specialists, then you should join the Seattle Data Guy Discord!
We are now over 8000 members!
Join My Technical Consultants Community
If you’re a data consultant or considering becoming one, then you should join the Technical Freelancer Community! We have over 1500 members!
You’ll find plenty of free resources to expedite your journey as a technical consultant, and you’ll be able to talk to other consultants about any questions you may have!
Articles Worth Reading
There are thousands of new articles posted daily all over the web! I have spent a lot of time sifting through some of these articles, as well as TechCrunch and company tech blogs, and wanted to share some of my favorites!
What’s The Difference Between A Data & AI Product Manager & A Digital Technical Product Manager?
We need a new breed of product manager to bridge the gap between data, analytics, machine learning, AI, business strategy, operations, customer needs, and a completely different type of product. TPMs understand software development and digital user experience. Data and AI products are a different beast.
TPMs focus on making sure things work as they’re coded. AI PMs must grapple with the inherent uncertainty of AI, where a product's reliability and ability to adapt are key.
Users interact with AI differently. They're not just clicking buttons and entering text; they're engaging in a conversation and collaborating with the AI to achieve a goal. That requires a whole new way of thinking about design.
Challenges You Will Face When Parsing PDFs With Python
Scraping data from PDFs is a rite of passage if you work in data. Someone somewhere always needs help getting invoices parsed, contracts read through, or dozens of other use cases. Most of us will turn to Python and our trusty list of Python libraries and start plugging away.
Of course, there are many challenges you’ll face when scraping PDFs using any programming language. You’ll run into various data types and formats. There will be tables, images, text, and numbers. Not to mention, some PDFs are well constructed and easy to parse, while others are simply scans of actual contracts, which can be difficult to parse accurately, even with OCR. Now, in a prior article, we discussed how you can parse PDFs, and in this article, I wanted to discuss some of the challenges you’ll face when parsing PDFs. This ranges from the issues when developing custom pipelines to other challenges.
So, let’s talk about the challenges you’ll face when parsing PDFs with Python.
End Of Day 158
Thanks for checking out our community. We put out 3-4 Newsletters a week discussing data, tech, and start-ups.
Looking to get more insights on vertical-specific data platforms and solutions from you!
Nice overview!
You wrote "Once someone has a model, no one knows how to deploy it." Are you talking here about ML? Or LLMs?
I thought this was a solved problem by now....