How To Improve Your Data Analytics Strategy For 2022 - Reducing Costs And Improving Insights
Reviewing Our Consulting Project Trends From 2021 - Community Update #26
2022 is around the corner and it is time to start looking towards improving your data strategy.
Our team has seen several trends in 2021 in terms of methods that can help improve your data analytics strategy.
Whether it is optimizing your Snowflake table structures to save $20,000 a year or optimizing your pipelines to reduce dashboard load times by 30-50x, our team has had the opportunity to improve the data analytics strategies and infrastructure of companies of all sizes.
Data analytics and insights are more than a buzzword.
Data analytics is driving companies.
Start-ups, billion-dollar Fortune 500 companies, and single-owner businesses are all using data to drive their business.
In turn, their data infrastructure and data analytics strategy need to be re-examined and improved constantly as they mature. There is a clear process companies follow along their data analytics journey, from relying on software engineers to build data pipelines to setting up dedicated data engineering teams. Companies are constantly growing their data analytics expertise.
In this article, we will review some of the cases where our team has helped companies improve their data infrastructure and strategy, leading to decreased costs, better business alignment, and new revenue sources.
No More Custom Code For Data Pipelines (Unless It's 100% Necessary)
Using custom code for data pipelines used to be standard at many companies. Using tools like cron, SQL, and Python, developers would automate pipelines and manage their complex networks of tasks either by simply guessing the timing of each task or by building a meta-database.
Truthfully, these were often just recreations of Airflow. In turn, they required heavy maintenance and could fail in the long term when teams were understaffed or key players left.
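To make the maintenance burden concrete, here is a minimal sketch of the kind of home-grown task runner we are describing, with a SQLite meta-database standing in for the tracking infrastructure these teams would build themselves. Every name here is illustrative, not taken from any specific client system:

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical meta-database tracking each task run -- the kind of
# bookkeeping Airflow provides out of the box.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_runs (task TEXT, status TEXT, finished_at TEXT)")

def run_task(name, fn):
    """Run one pipeline step and record its outcome in the meta-database."""
    try:
        fn()
        status = "success"
    except Exception:
        status = "failed"
    conn.execute(
        "INSERT INTO task_runs VALUES (?, ?, ?)",
        (name, status, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return status

# Toy tasks standing in for real extract/load steps.
run_task("extract_salesforce", lambda: None)
run_task("load_warehouse", lambda: None)

statuses = [row[0] for row in conn.execute("SELECT status FROM task_runs")]
print(statuses)  # -> ['success', 'success']
```

Everything beyond this sketch, such as retries, scheduling, dependencies between tasks, alerting, and a web UI, has to be written and maintained by hand, which is exactly what managed Airflow offerings take off your plate.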
Now, in 2021, data engineers and Directors of Technology have a lot of options when it comes to data pipelines. We no longer need to develop custom code solutions and manage everything from DevOps to web UIs. Instead, we can take advantage of data pipeline solutions like Astronomer.io, AWS Managed Workflows for Apache Airflow (MWAA), and Fivetran to improve our data strategy.
In 2021, our team migrated several large organizations away from their unmaintainable data pipelines onto Managed Workflows for Apache Airflow, Fivetran, Astronomer.io, and Stitch. All of these have helped reduce the time it takes to deliver new data pipelines, not to mention reduce maintenance costs.
We have helped optimize several companies' data analytics strategies by removing one of the big choke points: getting access to the data. Instead of creating custom data connectors, we moved companies to managed services.
This helped reduce the overall cost of pulling data from all kinds of sources, whether Salesforce, BambooHR, Shopify, or Jira.
In the end, switching to some form of managed data pipeline service can help improve your data strategy.
Lower Costs And Improve Performance With Better Data Design
Snowflake and other data platform solutions are proving to be game-changers. Our team has seen many companies, even without data engineers, spin up data warehouses and start pulling in sources using Fivetran and other low-code solutions.
However, when using low-code solutions, there are a lot of decisions along the way that can become costly.
These include setting up only a single layer of data that is billions of rows deep and slow to query, especially when it is connected directly to a dashboard. Honestly, we have seen this happen with Tableau, Looker, Power BI, and every other dashboard in between. Despite what many dashboard salespeople might say, you will often run into issues when trying to process too much data in their software.
Billions of rows of data are slow to query (not to mention expensive to process). We have seen 5-minute and even 15-minute load times, all of which we have turned into 3-20 second load times simply by changing how the data is structured.
Instead of querying billions of raw rows, it is better to create analytical layers tailored to specific use cases, ensuring that data can be sent to the right users quickly. Not only can this make your queries faster, but it can also reduce costs.
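A minimal sketch of the analytical-layer idea, using SQLite and made-up table names in place of a real warehouse: rather than pointing the dashboard at the raw event table, a pre-aggregated layer is built once at the grain the dashboard actually needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw layer: one row per order event
# (billions of rows in a real warehouse).
conn.execute("CREATE TABLE raw_orders (order_day TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        ("2021-11-01", "US", 120.0),
        ("2021-11-01", "US", 80.0),
        ("2021-11-01", "EU", 50.0),
        ("2021-11-02", "US", 200.0),
    ],
)

# Analytical layer: pre-aggregated to day/region, so the BI tool
# scans a handful of rows instead of the full raw table.
conn.execute(
    """
    CREATE TABLE daily_sales AS
    SELECT order_day, region, SUM(amount) AS total_amount, COUNT(*) AS order_count
    FROM raw_orders
    GROUP BY order_day, region
    """
)

rows = list(
    conn.execute(
        "SELECT order_day, region, total_amount FROM daily_sales "
        "ORDER BY order_day, region"
    )
)
print(rows)  # -> [('2021-11-01', 'EU', 50.0), ('2021-11-01', 'US', 200.0), ('2021-11-02', 'US', 200.0)]
```

In a real warehouse the same pattern applies, with the aggregation materialized on a schedule by your pipeline tool so the dashboard only ever touches the small table.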
This year alone, our team has been part of 3-4 projects that reduced companies' SaaS costs by $5,000-$20,000 a year through changes to their data warehouses and data pipelines, on top of the improved speed.
Overall, using low-code solutions is great for your data analytics strategy. However, you need to consider how you set up your data.
Setting Up Data Quality Checks To Avoid Sending Bad Data
Data quality remains a major problem across companies. This is because data quality means a lot more than just correct data. It also refers to timely data, as well as data coming from a source of truth. For example, one of the constant issues we hear is that the data is right but not synced at the right time, meaning that managers see operational data that differs from their dashboard data (sometimes this is unavoidable, but if it's not explainable, it will always come off as incorrect).
But let's talk about one of the worst problems we saw caused by a poor data strategy in 2021. We have come into situations where companies were sending reports with 100% inaccurate data to clients. As you can imagine, this came off very poorly when the clients noticed immediately.
Truthfully, fixing this problem is not as hard as it sounds.
Traditionally, fixing this means creating data checks. That often meant developing a lot of infrastructure that would take months to build and several engineers' time. However, nowadays we work with partners like BigEye.
These tools allow developers to quickly implement checks over tables that go well beyond standard data checks. They will often track data quality trends, auto-metrics, and auto-thresholds.
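The underlying idea is straightforward. Here is a minimal sketch of the kind of pre-send checks we mean (this is an illustration with hypothetical field names and thresholds, not BigEye's actual API):

```python
from datetime import datetime, timedelta, timezone

def check_report(rows, min_rows=1, max_null_rate=0.1, max_age=timedelta(hours=24)):
    """Run basic quality checks before a report ships.

    Returns a list of failed check names; an empty list means the
    report is safe to send."""
    failures = []
    if len(rows) < min_rows:
        # No point checking nulls or freshness on an empty report.
        failures.append("row_count")
        return failures
    null_count = sum(1 for r in rows if r["amount"] is None)
    if null_count / len(rows) > max_null_rate:
        failures.append("null_rate")
    newest = max(r["loaded_at"] for r in rows)
    if datetime.now(timezone.utc) - newest > max_age:
        failures.append("freshness")
    return failures

now = datetime.now(timezone.utc)
good = [{"amount": 10.0, "loaded_at": now}, {"amount": 12.5, "loaded_at": now}]
stale = [{"amount": None, "loaded_at": now - timedelta(days=3)}]

print(check_report(good))   # -> []
print(check_report(stale))  # -> ['null_rate', 'freshness']
```

Managed data observability tools essentially automate this pattern across every table, learning the thresholds from history instead of requiring you to hand-pick them.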
All of this can help companies improve their data analytics strategy without expensive engineering, while at the same time ensuring that the reports and metrics they send out to cross-functional partners and external clients are accurate. That way, you build trust and ensure quality output.
This is the goal. As someone who has worked at companies that live and die by their analytics, I know that having quality data is key.
You can't be wrong, or people stop paying. That is why we drive companies towards integrating data quality checks throughout their data process. That's how you improve trust.
In the end, data analytics is so much more than fancy algorithms and dashboards. It's about quality data.
Stop Using Software Engineering Teams To Do Data Work
Another common trend we ran into in 2021 was software engineering teams being forced to manage data engineering work. The problem here is not that software engineers don't have the skills; rather, they often don't have the time.
The constant context switching into data work often slowed down other projects or stopped the data work altogether. In many of these cases, our team was called in to help alleviate or restructure the data workflows so they no longer required software engineering intervention.
When companies are just starting out, it makes sense to rely on software engineers. However, as your company matures, your data engineering strategy needs to mature as well. That's where our team came in. We helped re-design and revamp data systems to reduce the amount of engineering attention required. In fact, in many cases, we reduced costs and maintenance by picking much simpler solutions for data infrastructure.
This is because there are a lot of managed services that can help data teams focus more of their attention on complex business logic, instead of redundant data connectors. Of course, this does depend on how large your data team is, but overall, data engineers aren't a cheap investment, and having large teams of them isn't always feasible.
Thus, your company needs to make the right choice about which tools you utilize in your data engineering strategy. It can sound cool to use the newest open-source technology, but it can be just as expensive once you add in the FTE costs.
Implement A Clear Data Strategy
One of the biggest issues we saw this year was a lack of a clear goal or data analytics strategy.
Our team can come in and implement the latest tools.
We can develop dashboards and automate data pipelines. However, if your future data teams don't have buy-in from stakeholders or alignment with the business, there will be problems. This doesn't mean everyone at your company needs to become data-driven. It does mean that the data work being done by data engineers, data analysts, and data scientists needs to be driving part of a strategy.
Otherwise, why do it?
Utilizing data purely to create vanity metrics or interesting insights that aren't acted on doesn't provide value and distracts the company.
So one of the key goals we had at the start of many of our projects was to make sure the business and its data analytics strategy were aligned. That way, the changes we made would stick.
Stakeholder buy-in will always be key, regardless of the technology project you are taking on.
So how do you drive your data analytics strategy?
Your Data Strategy Next Steps
Data is continuing to prove to be a valuable asset for businesses of all sizes.
Similarly, we have been able to consult for several clients and help them find new revenue sources as well as cost reduction opportunities.
There is one catch.
You will need to develop some form of data infrastructure or update your current one to make sure you can fully harness all the benefits that the modern data world has to offer.
But you need to set up your next steps for your data strategy, and we want to help.
Do You Need To Modernize Your Data Analytics Architecture?
I will spend more time diving into some of the other questions in the future, as well as in the data analytics strategy guide I will be putting out.
But, if you need to ask some questions about your data infrastructure today, or you want to know how you could use your data to help increase your revenue or reduce your costs, then feel free to set up some time with me.
Announcing The O'Reilly Data Quality Fundamentals Book. Now Available: Exclusive access to the first two chapters!
Thrilled to announce the release of O'Reilly's first-ever book on data quality, Data Quality Fundamentals: A Practitioner's Guide to Building More Trustworthy Data Pipelines! In this book, the creators of the Data Observability category make the business case for data trust and explain how data leaders can tackle data quality at scale by leveraging best practices and technologies used by some of the world's most innovative companies.
Special thanks this week to Monte Carlo! Monte Carlo’s Data Observability platform helps your team increase trust in data by eliminating data downtime.
How? Monte Carlo uses machine learning to infer and learn what your data looks like, proactively identify data downtime, assess its impact, and notify those who need to know.
Video Of The Week - 5 In Demand Data Jobs And How Much They Make - From Data Analyst To Machine Learning Engineer
How much do machine learning engineers make?
How about data engineers?
More importantly, what do data analysts and data scientists actually do?
Articles Worth Reading
There are 20,000 new articles posted on Medium daily, and that’s just Medium! I have spent a lot of time sifting through some of these articles, as well as TechCrunch and company tech blogs, and wanted to share some of my favorites!
Building Uber’s Fulfillment Platform for Planet-Scale using Google Cloud Spanner
The Fulfillment Platform is a foundational Uber domain that enables the rapid scaling of new verticals. The platform handles billions of database transactions each day, ranging from user actions (e.g., a driver starting a trip) and system actions (e.g., creating an offer to match a trip with a driver) to periodic location updates (e.g., recalculating eligible products for a driver when their location changes). It serves millions of concurrent users and billions of trips per month across over ten thousand cities.
In the previous article, we introduced the Fulfillment domain, highlighted challenges in the previous architecture, and outlined the new architecture.
Democratizing the Data Stack—Airflow for Business Workflows
As data has become more relevant to all parts of a business, SQL has risen as a universal access layer that allows anyone to get the data they need to do their job. This is a democratizing force as data is no longer stuck in the hands of an engineer — it can be used by anyone in the company to answer questions about the business. Today, using Reverse ETL tools like Hightouch and modern orchestrators like Apache Airflow takes the power of SQL to the next level.
Netflix Video Quality at Scale with Cosmos Microservices
Measuring video quality at scale is an essential component of the Netflix streaming pipeline. Perceptual quality measurements are used to drive video encoding optimizations, perform video codec comparisons, carry out A/B testing and optimize streaming QoE decisions to mention a few. In particular, the VMAF metric lies at the core of improving the Netflix member’s streaming video quality. It has become a de facto standard for perceptual quality measurements within Netflix and, thanks to its open-source nature, throughout the video industry.
As VMAF evolves and is integrated with more encoding and streaming workflows within Netflix, we need scalable ways of fostering video quality innovations. For example, when we design a new version of VMAF, we need to effectively roll it out throughout the entire Netflix catalog of movies and TV shows. This article explains how we designed microservices and workflows on top of the Cosmos platform to bolster such video quality innovations.
End Of Day 26
Thanks for checking out our community. We put out 4 Newsletters a week discussing data, tech, and start-ups.
If you want to learn more, then sign up today. You can subscribe at no cost to keep getting these newsletters.