For decades, data teams have been chasing multiple ever-elusive Holy Grails. These end goals get talked about heavily, but it feels as if very few data teams have actually reached them.
And if they have, it was always at some other job.
The data world has many of them. Some are real, some aren't, and many are used by marketing and sales teams to sell the dream. In fact, when I first came into the data world, Tableau was selling self-service analytics hard. Being new to the field, the pitch made sense to me: if end-users can access their data, they'll ask you fewer questions, right?
Well…sort of.
But let's not get too deep into the meat of this article before we've even started. In this article, I outline some of the key holy grails we are constantly chasing in the data world, what's often held them back, how some companies have succeeded, and where AI will likely play a role.
1) Enabling Self-Service Analytics
Self-service analytics was the first holy grail term I was exposed to. Many who started their data journey around 2013-2015 also likely came up against Tableau marketing, which heavily touted the self-service moniker.
In theory, it sounds really nice. Just create some dashboards, make sure they are easy to filter, and boom, "Self-Service Analytics."
Now, this brings us to our first problem. You might have noticed that I haven't even defined self-service analytics. It started as a marketing term and, like many such terms, became opaque. It meant so much and yet so little.
There's even a Reddit thread titled "What The Hell Is Self-Service Analytics?"
Generally, the notion is that an end-user won't have to reach out to the BI team. That's the goal. The reality, particularly before LLMs, was that:
Some users would be okay with digging into dashboards, but others would get frustrated because of one slightly technical issue and give up.
Many users just wanted the data in Excel so they could analyze the data themselves.
Other users would realize they wanted to double-check the data's accuracy, which meant going to the data warehouse or source system, territory that was out of their depth.
Still others would now have dozens more questions, and in the end, you'd have more work, not less.
In many ways, all the marketing for the prior form of self-service analytics has built the foundation for the current wave of new tooling.
“Oh, did you not get self-service analytics with Tableau or PowerBI? Well, we are different!”
Even so, I still see non-data teams struggling to use the new solutions.
With LLMs, there have been some improvements. If you've got your data well organized and your question isn’t complicated, there is a good chance an LLM will work.
But I am also seeing plenty of limitations. For example, when I ran some tests with Josue Bogran, he hit an issue using Databricks Genie, and had he not been a technical user, he wouldn't have been able to adjust his request to get the right answer.
I'll add some caveats: since this was more of an experiment, the agent could have been set up better, or some configuration might have avoided the issue. It was also five months ago, so Databricks may have made further improvements. But if I'm being honest, that's what it's always felt like.
Self-service analytics is always almost there but never there for most companies.
Give it another few years, and I know we’ll be shocked at how far many of these tools will go. Solutions like Databricks, Snowflake, and BigQuery are all well-positioned to be the ones that build these self-service tools because of the sheer amount of query data, amongst other things.
I can see a world where many of the problems we are running into today will be solved by a diligent few who won’t be dismayed by the current limitations. But even there, I don’t see a world without a need for a technical layer of specialists who can dig deeper (but I’ll write more about that in a future article). Otherwise, it’s hard not to imagine our world ending up like that one scene from Idiocracy, in which no one really knows how all their fancy tooling works.
For now, to avoid some of the current issues with self-service analytics, your data team will likely need to:
Clearly define what you consider self-service analytics.
Set up your semantics well; if not in a semantic layer, then ensure your company uses a shared set of definitions for key business terms.
Set up training with your business users to fill the technical gaps in the tools you want them to use.
Still expect some level of ad hoc requests to come your way.
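To make the "shared definitions" point above concrete, here is a minimal sketch of what pinning down key business terms in code might look like. All names, tables, and SQL here are hypothetical, and this is not modeled on any specific semantic-layer product; real tools (dbt metrics, Looker's LookML, etc.) have their own formats.

```python
# Hypothetical sketch: key business terms are defined once, so every
# dashboard and ad hoc query resolves "active_customers" the same way.

METRIC_DEFINITIONS = {
    "active_customers": {
        "description": "Distinct customers with at least one order in the last 90 days",
        "sql": (
            "SELECT COUNT(DISTINCT customer_id) "
            "FROM orders "
            "WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'"
        ),
        "owner": "data-team",
    },
    "revenue": {
        "description": "Gross revenue before refunds, in USD",
        "sql": "SELECT SUM(amount_usd) FROM orders",
        "owner": "finance",
    },
}

def get_metric_sql(name: str) -> str:
    """Return the agreed-upon SQL for a metric, or raise if it is undefined."""
    try:
        return METRIC_DEFINITIONS[name]["sql"]
    except KeyError:
        raise KeyError(
            f"No shared definition for metric '{name}'; add one before reporting on it."
        )
```

Even this toy version forces the questions self-service efforts often skip: who owns each definition, and what exactly does the term mean?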
I have seen companies achieve some form of self-service analytics, but that was usually when their business teams were also well versed in the technologies they used to interact with the data.
2) Becoming Data-Driven
Another common holy grail is the quest for a company to become data-driven. This is likely the most attainable one, provided, again, that you clearly define what it means.
So many companies want to be data-driven, and so few really are. There are multiple reasons for this.
They don't have control of their data or only have access to small pieces of data.
Even if they do have a decent amount of data that is easy to use, the company just prefers going with its gut.
Maybe they have data in an accessible form, but the business or the data teams fail to communicate.
Last but not least, you have business people who are so data-savvy that they can make the data say whatever they want.
The truth is, one of the main issues is often people. People get in the way of being data-driven, whether through their own ideologies or a lack of trust in the data.
In all fairness, the mistrust might not be unfounded.
Jeff Bezos shares a story about Amazon's metrics showing that customers waited less than 60 seconds for phone customer service when they called the company's 1-800 number. However, when Bezos tested this in a meeting, the room waited for over ten minutes. This led him to coin the saying:
when the data and the anecdotes disagree, the anecdotes are usually right. And it doesn’t mean you just slavishly go follow the anecdotes then. It means you go examine the data because it’s usually not that the data is being miscollected, it’s usually that you’re not measuring the right thing. - Jeff Bezos
So, even when you’re using data to drive a decision, it can still lead you astray. There is a balance to strike between trusting the data completely and working toward a more human understanding of a situation. Again, either the business side needs to be data literate, or the data team needs to make the data and insights easy for business teams to understand; in practice, that burden generally falls on the data team.
In many ways, you could argue that the business becoming data literate is another holy grail. But let's jump to the next one.