Building A Million Dollar Data Product
A Continuation Of Building A Million Dollar Data Analytics Service
My belief about monetizing data is that selling raw data to other companies is the lowest-value approach. Think credit card companies selling our purchase histories to everyone from hospitals to advertisers. I say that knowing, of course, that companies like Mastercard are likely doing just fine.
Truthfully, some of this is bias from my background working at companies that had some form of data product, but I do believe the companies that figure out how to take that data and put a layer between it and those who want to access it can create something even more valuable.
However, it’s considerably harder to do (which is also why it is generally more valuable), because instead of simply throwing the data over the wall and hoping the other company finds value in it, you need to build a product.
This means you must actually understand how the data can be useful to your external partners, whether they are advertisers paying for better-targeted marketing or healthcare providers wanting to ensure they are exceeding benchmarks for patient experience.
So in this article, I wanted to discuss some of the lessons I have learned on building a data product customers want to pay for.
Data Products Vary
Before digging into some tips on building a million dollar data product, I do want to clear up at least one point.
The term data product means a lot of different things to a lot of different people.
For example:
A data product is a logical unit that contains all components to process domain data and provide data sets via output ports for analytical use. - Data Mesh Manager
or
A data product is a data-driven, end-to-end, human-in-the-loop decision support or idea generation solution that’s so valuable that customers would pay for it or exchange something of value to use it. - Designing For Analytics
I think of a data product more like the latter quote, at least for this article. Below, I have also outlined five traits that the data products I have either worked on or used had, which I believe made them rock solid (I am sure there are more).
So how do you make a data product that end-users are willing to pay for?
Stop Thinking Dashboard As The Only End Result
Dashboards are always an easy first step when it comes to building a service layer you can monetize. It makes sense.
First off, as data people, we are taught dashboards are the end state of all data (despite all the articles announcing their death). But if you’ve worked long enough in data, you also know that most dashboards quickly become ignored.
You can create far more lucrative data monetization platforms. For example, it's well understood at this point that Facebook is essentially a marketing apparatus.
Yes, it provides the ability to connect its users, but the way it makes money is by taking the data it has built on its users and helping advertisers better target and segment. So much so that I recall the NYT listing Facebook, Google, and Amazon in the risk factors of its 2019 10-K.
Large digital platforms, such as Facebook, Google, and Amazon, which have greater audience reach, audience data, and targeting capabilities than we do, command a large share of the digital display advertising market, and we anticipate this will continue. - NYT 2019 10-K - ITEM 1A. RISK FACTORS
But that’s just one example where the benefit of data is not a dashboard but instead an actual product.
You could say similar things about Google, but instead of community, they aggregated data on the web and put a simple search bar in front of it. It’s, of course, easier said than done, and that’s why dashboards continue to be a popular choice.
However, you could also centralize data, whether structured or unstructured, into an application (or dashboard) that provides value by giving end-users all their information in one place or by creating industry benchmarks.
Become The Center And Industry Standard
One thing many companies that have created successful data products share is that they help centralize data.
This shows up in a few different ways.
Centralize scattered data - I have worked with a client who focused on scraping multiple government sites and centralizing the data into their product. Instead of having to visit multiple sites and scroll through thousands of pages, customers get all of the content in one place, with better organization, searching, and tracking. This is more of a product, unlike the next example, which fits the traditional dashboard and insights mentality of data.
Centralize industry data - This could mean offering a SaaS product that lets you provide industry-wide insights (assuming your customers agree to their data being used for that purpose), or becoming the industry's central reporting system. Several companies, such as HDMS and ReelMetrics, focus on doing just that.
In these cases, the value isn’t just the insights, but the fact that you’ve curated the data itself. For the second point, you can tell a customer how well they are doing compared to their competitors (based on the metrics you track). That information can be invaluable; a rough sketch of the benchmarking idea follows below.
Going back to the first point, having only the data you need in one place and being able to trace changes saves time and lets customers make more informed decisions without being overwhelmed.
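To make the benchmarking idea concrete, here is a minimal sketch in Python (the table, column names, and numbers are all hypothetical, not from any real client): the core calculation is often just placing one customer's metric against the distribution across every customer you track.

```python
import pandas as pd

# Hypothetical industry-wide metrics table: one row per customer.
industry = pd.DataFrame({
    "customer_id": ["a", "b", "c", "d", "e"],
    "patient_satisfaction": [71.0, 84.5, 78.2, 90.1, 66.3],
})

def benchmark(df: pd.DataFrame, customer_id: str, metric: str) -> dict:
    """Compare one customer's metric against the rest of the industry."""
    customer_value = df.loc[df["customer_id"] == customer_id, metric].iloc[0]
    percentile = (df[metric] < customer_value).mean() * 100
    return {
        "customer_value": customer_value,
        "industry_median": df[metric].median(),
        "industry_percentile": round(percentile, 1),
    }

print(benchmark(industry, "c", "patient_satisfaction"))
# {'customer_value': 78.2, 'industry_median': 78.2, 'industry_percentile': 40.0}
```

The math is trivial; the hard (and valuable) part is getting every customer's data standardized enough that the comparison is apples to apples.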
Understanding Real Life - Define What Action Can Be Taken
If you plan to monetize your data, it's not enough to just put up interesting data or metrics.
You’re building a product and, as such, you need to understand how the user will use your product to take real-life actions.
For example, in the past, I worked on a project where we developed a system to detect fraud among healthcare providers. Think of a doctor upcoding a normal visit to an emergency visit, or perhaps the questionable use of unnecessary screenings and X-rays to pad the bottom line.
The question here becomes, “What is useful for the customer to know and what will they do with that information?”
They’ll likely want to know which types of procedures are commonly misused, as well as other patterns and trends. At the same time, you’ll likely only want to surface the most critical issues or worst offenders so that a human can then go and double-check the results (which is also why traceability is important).
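As a hedged illustration of that triage mindset, here is a minimal sketch (all provider IDs, visit types, and thresholds are made up for this example; the real project was far more involved): compute each provider's rate of emergency-coded visits and only surface the ones that sit well above their peers for a human to review.

```python
import pandas as pd

# Hypothetical claims data: one row per visit, with the billed visit type.
claims = pd.DataFrame({
    "provider_id": ["p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3", "p3"],
    "visit_type":  ["emergency", "emergency", "routine",
                    "routine", "routine",
                    "emergency", "routine", "routine", "routine"],
})

# Rate of emergency-coded visits per provider.
rates = (
    claims.assign(is_emergency=claims["visit_type"].eq("emergency"))
    .groupby("provider_id")["is_emergency"]
    .agg(total_visits="size", emergency_rate="mean")
    .reset_index()
)

# Only surface providers far above the mean provider rate, so a human
# reviewer sees a short, traceable list instead of every borderline case.
mean_rate = rates["emergency_rate"].mean()
flagged = rates[rates["emergency_rate"] > mean_rate * 1.5]

print(flagged.sort_values("emergency_rate", ascending=False))
```

The point isn't the specific threshold; it's that the product's output is a short, reviewable list tied to a concrete action, not a wall of scores.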
I emphasize human review because these systems can be implemented poorly and lead to negative outcomes, as with United Healthcare. Many models won’t understand nuance or will be too inflexible, which is why I believe we should always keep a human in the loop.
At the end of the day, this goes beyond data; there are human lives on the other end of many of these products.
Beyond Data Quality
In many of these cases, whether you’re building a dashboard or more of a full-blown product, building in data quality is a must.
But I don’t just mean your standard data quality tests; those are generally good at detecting anomalies, missing data, and so on.
There are other tests worth writing that feel more like the unit tests you’d write for traditional functions, except they run against a set of data. These tests matter once you start doing complex transforms, because it's so easy to do something in SQL or another transform layer that quietly makes the data inaccurate.
The usual example I provide most companies is aggregate checks. These can ensure that if you start with a million dollars in your data set, you end with it (or if you filtered it, the filtered-out data set still matches expectations). There are a few ways you can implement this.
Have another analyst or engineer try to write the same query, and see if you get the same output.
Write small unit tests that check specific columns to ensure the output is as expected.
This will ensure that a simple mistake, like rounding too early or filtering out data you shouldn’t, is caught and fixed early on.
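As a rough example of what that can look like (the table and column names below are hypothetical), an aggregate reconciliation check plus a couple of column-level unit tests might be as simple as:

```python
import pandas as pd

# Hypothetical raw data and the output of a downstream filter step.
raw = pd.DataFrame({
    "claim_id": [1, 2, 3, 4],
    "amount":   [250_000.00, 400_000.00, 150_000.00, 200_000.00],
    "state":    ["WA", "WA", "OR", "CA"],
})
kept = raw[raw["state"] == "WA"]
filtered_out = raw[raw["state"] != "WA"]

def test_totals_reconcile():
    # If the raw data starts with $1,000,000, the kept and filtered-out
    # rows must still sum back to that same total.
    assert kept["amount"].sum() + filtered_out["amount"].sum() == raw["amount"].sum()

def test_amount_column():
    # Column-level checks: no nulls, no negative amounts, no premature rounding.
    assert kept["amount"].notna().all()
    assert (kept["amount"] >= 0).all()
    assert (kept["amount"] == kept["amount"].round(2)).all()

test_totals_reconcile()
test_amount_column()
print("aggregate and column checks passed")
```

In practice you'd run these with pytest or bake equivalent assertions into your pipeline (dbt tests, Great Expectations, and similar tools cover a lot of this ground), but the underlying idea is the same.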
Because if you are a data company, being known for bad data is like being known for using cheap materials. It can also quickly turn paying customers into churned customers.
Now The Hard Part
Building a data product that companies want to pay for is far from easy. You need to build a solution that ingests and standardizes data while maintaining high levels of data quality. And all of that is just a prelude to the next step: building a product that people want to pay for and that is profitable.
This is likely why plenty of early conversations I have with companies wishing to monetize their data start with them wondering how they could share data.
Perhaps this is also why Snowflake pays for the top sponsored placement on Google for both “data sharing” and “data monetization.”
They might know the natural progression starts with wanting to monetize data and ends at data sharing (especially when you can’t figure out how to build a data product).
If you’d like to learn more about monetizing data, I wrote an article last year on a similar topic, although it focused more on dashboards. I’d also love to chat!
Thanks for reading.
RTA Summit Admission Giveaway
Real-Time Analytics Summit in San Jose, May 7-9
This year DeltaStream is excited to be sponsoring the 2024 Real-Time Analytics Summit.
To celebrate this year’s event, we are giving away a chance to win one general admission ticket to the summit in San Jose. Join real-time data pros to connect and learn what’s new in the world of real-time data. Already attending? Stop by our booth and discover the power of our complete real-time stream processing platform powered by Apache Flink.
Join My Data Engineering And Data Science Discord
If you’re looking to talk more about data engineering, data science, breaking into your first job, or finding other like-minded data specialists, then you should join the Seattle Data Guy Discord! We are close to passing 6,000 members!
Join My Data Consultants Community
If you’re a data consultant or considering becoming one, then you should join the Technical Freelancer Community! I recently opened up a few sections to non-paying members so you can learn more about how to land clients, the different types of projects you can run, and more!
Articles Worth Reading
There are 20,000 new articles posted on Medium daily, and that’s just Medium! I have spent a lot of time sifting through these articles, as well as TechCrunch and company tech blogs, and wanted to share some of my favorites!
DragonCrawl: Generative AI for High-Quality Mobile Testing
The Developer Platform team at Uber is consistently developing new and innovative ideas to enhance the developer’s experience and strengthen the quality of our apps. Quality and testing go hand in hand, and in 2023 we took on a new and exciting challenge to change how we test our mobile applications, with a focus on machine learning (ML). Specifically, we are training models to test our applications just like real humans would.
Mobile testing remains an unresolved challenge, especially at our scale, encompassing thousands of developers and over 3,000 simultaneous experiments. Manual testing is usually carried out, but with high overhead, it cannot be done extensively for every minor code alteration. While test scripts can offer better scalability, they are also not immune to frequent disruptions caused by minor updates, such as new pop-ups and changes in buttons. All of these changes, no matter how minor, require recurring manual updates to the test scripts. Consequently, engineers working on this invest 30–40% of their time on maintenance. Furthermore, the substantial maintenance costs of these tests significantly hinder their adaptability and reusability across diverse cities and languages (imagine having to hire manual testers or mobile engineers for the 50+ languages that we operate in!), which makes it really difficult for us to efficiently scale testing and ensure Uber operates with high quality globally.
Your Favorite BI Tool - Issue 198
A few years ago, I published Ditch Tableau For God’s Sake. It’s 2021. Afterward, I received many questions about Tableau alternatives - if Tableau is so bad, then what BI tool is a good alternative?
I didn’t have an answer ready, as my initially favorite BI tool was sold to Atlassian and subsequently shut down, and then my next favorite tool was sold to Sisense and transformed into something else.
I used Tableau at MyFitnessPal and hadn't explored other dashboards until last week, when I began testing other options that the BI market has to offer.
Honestly, I was blown away.
End Of Day 124
Thanks for checking out our community. We put out 3-4 Newsletters a week discussing data, tech, and start-ups.
Out of grad school I was working for a startup focused on genetic testing (https://www.concertgenetics.com/). We built custom data pipelines to scrape hundreds of thousands of genetic tests from laboratory websites (and PDFs!) and did the legwork to standardize everything so that tests and their prices could be compared across companies. Then that data was used to help insurance companies fight fraudulent claims and the AMA to define better codes for said insurance companies. That clean, curated, and accurate data set powered a *ton* of applications, and it's something I'm still incredibly proud of. At the time I didn't have the "Data Product" framing, but looking back it absolutely was one.