There are plenty of cliches about data and its likeness to oil or companies being data-driven. Is there truth to all this hype about data strategy, predictive modeling, data visualization, and machine learning?
In our experience, these cliches are true. In the past few years, we have helped several small and medium-sized businesses take their data and develop new products, gain invaluable insights, and create opportunities they didn’t have before.
Many small and medium-sized businesses are starting to take advantage of easy access to cloud computing technologies such as AWS, which let your teams analyze data more easily and from anywhere, using the same technology billion-dollar corporations use at a fraction of the cost.
So what are you doing to improve your business data strategy today?
To help answer this question, our team has put together a data strategy assessment that highlights where your team is doing well and where it can improve its data strategy.
Goals Of This Data Strategy Questionnaire:
The goal of this questionnaire is to outline what your company is doing well in data strategy and data analysis, and to point out some places where it could improve.
So don’t worry if you don’t know every answer. If you want us to walk you through this, we would be happy to set up a free consultation to talk through some of these questions or your answers.
Define What You Actually Want To Do With Data
Before diving deep into technologies, infrastructure and reporting, it’s better to define what your company is trying to do.
Is your CMO looking to improve your RevOps?
Is a director trying to gain a better understanding of your unit economics?
More importantly, what will they do with that data?
1. What are the current business problems you are trying to address?
Data strategies need to be aligned with business strategies. Otherwise, what often ends up happening is that you discover some interesting findings.
You present those findings to leadership and, although intrigued, they have no available resources to follow up. Thus, your findings, your insights, and your hard work die right there.
They will be forgotten and wasted.
So, make sure you are aligned with the business and that you have allocated business partners who will act on insights.
2. What is the core business strategy for the next 2-3 years?
Once you are aligned with the business, you should write down clear goals for your data strategy. This could be:
Reduce costs in key areas, such as cloud costs
Increase customer conversion in specific key segments
Improve LTV of customers
Research which features to develop or what markets to go into
Etc
Having a few target goals such as the ones listed above can help guide your data analytics team toward clear decisions on which areas they should be researching or creating reports around.
Data Sources And Systems
3. Do you have a general understanding of how to access your company's data?
Sometimes, you might not even realize that you can access data from the third parties you use. Many modern third-party systems have APIs or data extracts that can be automated. For example, Salesforce, Workday, and QuickBooks all have APIs that allow you to pull data automatically, which in turn lets you automate reporting in the future.
4. Do you have multiple systems that contain your data, like QuickBooks, Salesforce, Workday, or similar third parties, that you can access (e.g., through APIs, FTP, or databases)?
There are thousands of third-party systems and it is hard to list them all, but you are probably most interested in the systems where your company's customer transactions and interactions are stored. This doesn’t just mean purchase history. It could also be pages your clients visited on your site, site usage in general, customer usage of your physical goods, emails, etc.
Data Processing, Storage, And Analytics
5. Do you currently analyze any of your data? If yes, what tools do you use and how often do you look at it?
Many companies already use tools like Excel to analyze data. This is a great tool and might even be enough for a small or medium-sized company. We would love to know how you are looking into your data to make decisions.
Also, we break data down into three major categories: financial, operational, and customer data. So our next question is:
6. Do you mostly analyze financial data or do you also look into operational and customer data?
Many companies look into financial data because it is usually the easiest to access and understand. It is also the core of most decision making.
However, it usually only shows the output of actions and not the actual actions that caused them. So perhaps you have an increase in sales, but in which product line, in what area and what demographic, and why?
Answering that would take a combination of financial and customer data. Being able to mesh this data together means you need to be able to access both data sets together. This is often a pain point for many companies, as these data sets are siloed away from each other.
That’s where developing a data strategy can help centralize your data so you can access it all together.
7. Do you store and/or process your data from all your various systems in a central location so you can easily meld it all together?
As we brought up in the previous explanation, having a centralized data storage system like a data warehouse can be crucial to your company's data strategy. A data warehouse is a data storage system that allows you to easily meld data together and quickly perform ad-hoc analysis without needing to pull ten Excel extracts together every time.
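To make this concrete, here is a minimal sketch of what melding financial and customer data in one place can look like. It uses Python's built-in sqlite3 module as a stand-in for a data warehouse. The table names, columns, and sample rows are made up for illustration and do not come from any real system.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    # Customer data (hypothetical schema): who the customer is.
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)"
    )
    # Financial data (hypothetical schema): what was sold and for how much.
    conn.execute(
        "CREATE TABLE sales (customer_id INTEGER, product_line TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "West"), (2, "East")],
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            (1, "Widgets", 100.0),
            (1, "Widgets", 50.0),
            (2, "Widgets", 30.0),
            (2, "Gadgets", 200.0),
        ],
    )
    return conn


def revenue_by_segment(conn):
    """Join sales to customers and total revenue per product line and region."""
    query = """
        SELECT s.product_line, c.region, SUM(s.amount) AS revenue
        FROM sales AS s
        JOIN customers AS c ON c.customer_id = s.customer_id
        GROUP BY s.product_line, c.region
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for product_line, region, revenue in revenue_by_segment(build_demo_warehouse()):
        print(product_line, region, revenue)
```

Because both data sets live in the same place, the question "which product line, in what area?" becomes a single query instead of several spreadsheet extracts stitched together by hand.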
8. Who is responsible for your data quality? Or do you have anyone responsible for data quality?
This is kind of a trick question. Even large billion-dollar corporations struggle with data quality. Many times it’s because there aren't any data quality guidelines in place. This is a key part of a data strategy, as it ensures that data-driven decision making is accurate. Otherwise, you might as well be guessing.
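Data quality guidelines can start small. The sketch below shows two basic checks, missing required fields and duplicate keys, in plain Python. The function name, field names, and sample rows are all assumptions for illustration. The checks your team actually needs will depend on your data.

```python
from collections import Counter


def data_quality_report(rows, key, required):
    """Count missing required fields and find duplicated key values.

    rows: list of dicts, one per record (hypothetical structure).
    key: field that should uniquely identify a record.
    required: fields that should never be empty.
    """
    missing = {field: 0 for field in required}
    for row in rows:
        for field in required:
            # Treat None and empty strings as missing; 0 is a valid value.
            if row.get(field) in (None, ""):
                missing[field] += 1

    key_counts = Counter(row.get(key) for row in rows)
    duplicate_keys = sorted(
        value for value, count in key_counts.items()
        if count > 1 and value is not None
    )
    return {
        "row_count": len(rows),
        "missing": missing,
        "duplicate_keys": duplicate_keys,
    }


if __name__ == "__main__":
    # Hypothetical order records with one blank email, one missing amount,
    # and one order_id that appears twice.
    sample = [
        {"order_id": 1, "email": "a@example.com", "amount": 10.0},
        {"order_id": 2, "email": "", "amount": 5.0},
        {"order_id": 2, "email": "b@example.com", "amount": None},
    ]
    print(data_quality_report(sample, key="order_id", required=["email", "amount"]))
```

Running a report like this on a schedule, and assigning someone to act on it, is one simple way to give data quality an owner.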
9. Do you look at your data at a regular cadence or is it sporadic?
Here, the answer is probably best if it is a little of both. Having a regular cadence of looking at specific metrics or sales numbers is a good plan because oftentimes you might need to adjust to the ever-fluctuating market. However, you also probably want to answer ad-hoc questions in the moment. But it is much harder to answer sporadic questions if your data is in 5 different locations (going back to the data warehouse discussion).
10. Has your team created any standard metrics or key performance indicators (KPIs)?
Creating metrics and KPIs is a skill all on its own. It requires an understanding of your business's driving factors as well as distilling them down to a few key points. If a KPI or metric is too complicated, then it is hard for you as a business owner to know what actions to take.
For example, if your metric is highly abstract and requires several layers of math to get to some random percentage or customer score, and you don't understand what that number means in the end, how can you act?
Instead, a clean metric is generally far easier to understand than some complex, abstract data science output. For example, you might just divide population A by population A + B, or use a simple weighted average of some kind. (There are places to put data science outputs, just probably not in a KPI.)
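As a rough illustration of how simple a clean metric can be, here is a sketch of the two calculations mentioned above: a share of A out of A + B, and a weighted average. The function names and the example numbers are hypothetical.

```python
def share(a, b):
    """Return A / (A + B), or None when there is nothing to divide by."""
    total = a + b
    if total == 0:
        return None
    return a / total


def weighted_average(values, weights):
    """Return the weighted average of values, or None if weights sum to zero."""
    total_weight = sum(weights)
    if total_weight == 0:
        return None
    return sum(v * w for v, w in zip(values, weights)) / total_weight


if __name__ == "__main__":
    # Hypothetical: 30 trial users converted, 70 did not.
    print(share(30, 70))  # 0.3
    # Hypothetical: satisfaction scores of 4.0 and 5.0 from segments
    # containing 1 and 3 customers respectively.
    print(weighted_average([4.0, 5.0], [1, 3]))  # 4.75
```

Anyone on the team can read a number like 0.3 as "30% converted" and know whether it moved in the right direction, which is the point of a KPI.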
11. Are there questions your team currently can’t answer about your business that it wishes it could?
Many times we have found that businesses struggle to answer even simple questions like, which customers visit what features most often on a site or what days do we utilize our equipment the most. Being able to answer these questions can help lead to strategic decisions that can help increase revenue or reduce costs.
12. Most importantly, do you have a data analytics process?
When it comes to data analytics, having some sort of process or strategy as far as how you approach answering questions is key. We like to say that data analytics can become an infinite task. One question leads to another and another and another and before you know it, you're not even sure what you are trying to answer.
This is why having some light structure to your analysis can help keep you on track.
Data Visualization And Communication
13. What communication methods do you have to share your data?
At the end of the day, your data analytics strategy is not complete if you can't communicate your findings to the rest of your team. Being able to concisely state your insights is key. This can be done through a few key metrics, charts, and graphs.
14. Does your team use any form of data visualization?
There are lots of great fancy tools out there like Tableau and Microsoft Power BI that can help you show off your data. They aren't always necessary but they can be great tools that have beautiful UIs that can help captivate your audience. Captivating your audience is key in this design-driven culture we live in.
People no longer accept charts and graphs that aren't thoughtfully designed and impactful. This means you can't just spend money on a data visualization tool and hope it covers up your ill-thought-out design. You need to take a moment to think about how your team will translate the results.
Data Analytical Talent
15. Do you have a data analyst, data scientist, or data engineer on the payroll? If yes, do you have an onboarding document?
This section only has one question. This is because having a data person on staff already says a lot.
It says a lot about your company's goal to be data-oriented. Having a full-time data anything means you must be asking for reporting, analytics, and metrics regularly, regardless of the skill set of the individual.
The follow-up question is important because it plays more into your company's overall data strategy, especially for smaller companies where there might only be one data analyst.
Having an onboarding document ensures that you are ready when a data analyst leaves. This onboarding document will have information about data sources, data pipelines, metrics, etc. This should act as the bible for the data team.
Data Science And Machine Learning
16. Does your team utilize modern predictive modeling, analytics, or machine learning?
There are lots of ways your team can utilize machine learning, data science, and predictive modeling these days. For example, AWS offers various APIs for its vision deep learning models, as well as SageMaker. Now, these aren't necessarily what you will need, but they are great examples of tools you can use.
17. Do you think your team could benefit from it?
An important question to ask in your data strategy when it comes to machine learning and all the other fancy bells and whistles is: will utilizing machine learning have a better ROI than some simpler data analytics technique? Although there are plenty of data techniques and practices that are not as sexy as machine learning, they can still be much more impactful and considerably cheaper.
We ask this question because often companies still haven't gotten to a point where they even have easy access to their data and yet they want to create machine learning models. There is a process in your data strategy and you do need to crawl before you can run.
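The ROI comparison in question 17 comes down to simple arithmetic. Here is a minimal sketch, assuming ROI is defined as (benefit - cost) / cost. The dollar figures in the example are invented purely to show the calculation and are not benchmarks.

```python
def roi(benefit, cost):
    """Return ROI as a ratio: (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


if __name__ == "__main__":
    # Hypothetical project estimates, for illustration only.
    simple_report = roi(benefit=30_000, cost=10_000)
    ml_model = roi(benefit=120_000, cost=80_000)
    print(f"Simple reporting ROI: {simple_report:.0%}")  # 200%
    print(f"Machine learning ROI: {ml_model:.0%}")       # 50%
```

In this made-up case the machine learning project brings in more total benefit, but the simpler project returns more per dollar spent. Writing both estimates down side by side is what makes that trade-off visible.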
Cloud Computing
18. Do you utilize any form of cloud technology to help keep your team mobile?
Another key point in your data strategy is the cloud. The cloud is a major reason why many small and medium businesses can consider a data strategy. So our first question is usually: do you use a cloud provider like AWS for anything? It allows your team to be on the move and remote while still being able to perform the work required.
19. Do you utilize cloud technology to take advantage of scalable compute?
The reason the cloud has played a key role in companies' data strategy is because of scalable computing. You no longer need to buy a new server when you want to store all your data. You now can buy a small chunk of space on AWS or Azure and store your data and access it anywhere with your Tableau or Power BI workbook.
This flexibility has reduced costs without reducing the power that servers and corporate technologies provide.
Follow-Up
20. What’s your plan to determine if your data strategy is successful and provides value?
You can implement all of the above. You can implement a data storage layer.
You can put together some pretty dashboards.
You can have amazing machine learning models.
However, if you don’t have a plan for how to determine which parts of your process were successful, then how can you actually claim your project provided value? If all your team is doing is building infrastructure for infrastructure's sake, then you might need to reevaluate the entire project.
Tips And Recommendations For Your Data Strategy
Data Sources And Systems
If you felt uncertain about your answers here, then we recommend looking into what systems you use every day that could store customer, financial, or operational data. If you have a system that tracks information in those categories, then there may be an API that allows you to extract that data, giving you access to it.
Data Processing, Storing, And Analytics (Data Lifecycle)
Once you know what your data sources are, then you can take on your data's life cycle. If you have never thought about your data's life cycle, then take a moment to map it out. Where does it come from? Does it end up in Excel sheets? Do you use version control, or is everything kind of a mess? Also, feel free to reach out and we would be happy to spend 30 minutes helping you map it out.
Data Visualization and Communication
With a solid understanding of your data's life cycle, you can now look into communicating what that data says. This is where data visualization comes into play. If you want to take this on yourself, then we recommend you read the article "10 Rules For Better Dashboard Design." Even if you aren't building a dashboard, it will help you pick better charts and take a more design-driven approach.
Machine Learning And Data Science
Alright, now you have a solid base data strategy. Now it is time to ramp up your data usage. If you feel shaky on data science and machine learning then we do recommend hiring a professional like our team or someone else. There are just so many pitfalls and challenges you will face along the way and we feel a business owner's time won't be best spent here.
Cloud Computing
If you aren't utilizing the cloud and have lots of data you want to start storing somewhere to answer questions and/or create dashboards to help drive decisions, then we recommend you look into Azure, AWS, and GCP. These cloud providers allow you to reduce your data storage costs while utilizing the same servers and technologies that billion-dollar businesses do at a scaled-down price.
How To Plan Your Next Steps?
So you have finished the questionnaire and now you are wondering where to go next? We would love to help you along your data journey. Our team is made up of data experts who have experience building data storage systems, APIs, dashboards, and so much more. We can help design, develop, and maintain your data systems without the need to hire a full-time data engineer and data scientist.
Do reach out if you have any questions.
The Big Data Game
One 8-bit Super Mario look-alike data engineer is out on a mission.
A mission to get the DATA!
Be prepared for more pipelines.
More clouds.
More lakes.
And much more data.
A very special thank you to Firebolt for sponsoring the newsletter this week!
Join My Data Engineering And Data Science Discord
Recently my YouTube channel went from 1.8k to 27k and my email newsletter has grown from 2k to well over 7k.
Hopefully we can see even more growth this year. But, until then, I have finally put together a Discord server. Currently, this is mostly a soft opening.
I want to see what people end up using this server for. How it is used will in turn play a role in what channels, categories, and support are created in the future.
Articles Worth Reading
There are 20,000 new articles posted on Medium daily and that’s just Medium! I have spent a lot of time sifting through some of these articles, as well as TechCrunch and company tech blogs, and wanted to share some of my favorites!
Optimizing Pinterest’s Data Ingestion Stack: Findings and Learnings
At Pinterest, the Logging Platform team maintains the backbone of data ingestion infrastructure that ingests terabytes of data per day. When building the services powering these pipelines, it is extremely important that we build efficient systems considering how widespread and deep in the stack the systems are. Along our journey of continuous improvement, we’ve figured out basic but useful patterns and learnings that could be applied in general — and hopefully for you as well.
Presto® on Apache Kafka® At Uber Scale
Uber’s goal is to ignite opportunity by setting the world in motion, and big data is a very important part of that. Presto® and Apache Kafka® play critical roles in Uber’s big data stack. Presto is the de facto standard for query federation that has been used for interactive queries, near-real-time data analysis, and large-scale data analysis. Kafka is the backbone for data streaming that supports many use cases such as pub/sub, streaming processing, etc. In the following article we will discuss how we have connected these two important services together to enable a lightweight, interactive SQL query directly over Kafka via Presto at Uber scale.
Real-time analytics with Amazon Redshift streaming ingestion
Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL. Amazon Redshift offers up to three times better price performance than any other cloud data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of data per day and power analytics workloads such as high-performance business intelligence (BI) reporting, dashboarding applications, data exploration, and real-time analytics.
We’re excited to launch Amazon Redshift streaming ingestion for Amazon Kinesis Data Streams, which enables you to ingest data directly from the Kinesis data stream without having to stage the data in Amazon Simple Storage Service (Amazon S3). Streaming ingestion allows you to achieve low latency in the order of seconds while ingesting hundreds of megabytes of data into your Amazon Redshift cluster.
End Of Day 42
Thanks for checking out our community. We put out 3-4 newsletters a week discussing data, the modern data stack, tech, and start-ups.
If you want to learn more, then sign up today. It costs nothing to keep getting these newsletters.