How Are Companies Taking Advantage Of Their Data Without Increasing Their Data Team's Size
Looking Over Tools Like BigQuery, Airbyte and Census
The modern data stack (MDS) is a new approach to data integration capable of saving your engineers time while allowing both engineers and analysts to focus on high-value pursuits. With a suite of tools to support data integration, the modern data stack will free your teams of monotony while empowering them with insights, automation, and advanced technology. For all of those reasons, now is the time to migrate to MDS. Here's how to do it.
What Is the Modern Data Stack?
To put things simply, the modern data stack (MDS) is a set of tools that power data integration. In order, these tools include a fully managed data pipeline for “extract, load, and transform” (ELT) processes along with a cloud-based data lake or columnar warehouse for the data's destination. On top of that, MDS requires a data transformation tool and a visualization or business intelligence (BI) platform so your company can make use of all the numbers.
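To make the order of operations concrete, here is a minimal sketch of an ELT flow in Python. It uses the standard library's in-memory SQLite database as a stand-in for a cloud warehouse, and the source rows, table names, and function names are all hypothetical rather than tied to any vendor's tooling. The point is simply that raw data lands first and gets reshaped afterward, inside the warehouse.

```python
import sqlite3

# Hypothetical raw records, standing in for rows pulled from a SaaS tool's API.
SOURCE_ROWS = [
    {"order_id": 1, "customer": "acme", "amount_cents": 1250},
    {"order_id": 2, "customer": "acme", "amount_cents": 800},
    {"order_id": 3, "customer": "globex", "amount_cents": 4300},
]


def extract():
    """Extract: read records from the source system (here, a static list)."""
    return list(SOURCE_ROWS)


def load(conn, rows):
    """Load: land the raw rows in the warehouse without reshaping them."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :customer, :amount_cents)",
        rows,
    )


def transform(conn):
    """Transform: build an analysis-ready table with SQL inside the warehouse."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(amount_cents) / 100.0 AS revenue
        FROM raw_orders
        GROUP BY customer
        """
    )


def run_elt():
    """Run the three steps in ELT order and return the modeled rows."""
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse
    load(conn, extract())
    transform(conn)
    return conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    print(run_elt())
```

Because the raw table is kept as loaded, you could add a second transformation later without going back to the source system.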
The primary difference between a modern data stack and a legacy data stack lies in the host. The modern data stack is hosted in the cloud and, as a result, requires very little technical intervention from its users. This makes the stack far more accessible to end users, while the company at large gains scalability that keeps pace with growth, without the expensive downtime associated with scaling the server room that supports a legacy data stack.
With a modern data stack, the system is built with business users in mind, removing the technical barriers that have long barred analysts and other key stakeholders from accessing and fully utilizing the data a company holds. The modern data stack also ensures that integration and analysis tools are simple to use, with little technical knowledge necessary. By decreasing technical complications alone, the modern data stack offers countless benefits to companies who adopt it.
Benefits of the Modern Data Stack
The move to the modern data stack can only come after internal recognition that data is a real asset, followed by the realization that your company isn't yet using data to its fullest potential. This is a reality that many brands, big and small, are now facing, and it makes migrating to the modern data stack an obvious, logical, and exciting next step. The trick is to understand the benefits you can expect to reap, which—in turn—will help your company come up with a reasonable and timely approach to ensure the successful adoption of MDS. Once you do, you can begin making use of the many perks.
Scalable Framework
Countless data tools have cropped up over the past decade, primarily in response to the new push from organizations to get as much value out of their data as they can. As a result, each data tool has carved out a place in the market, with most offering some highly specialized solution for one part of the data life cycle. Together, these tools form a highly effective data stack that's scalable with little technical barrier to entry. When creating an MDS for your company, you may seek out tools in some or all of the following categories:
Behavioral data ingestion for streaming behavioral event data that originates from connected devices, like smart TVs and wearables.
Transactional data ingestion for batch or streaming transactional data originating from SaaS tools, reporting, and internal databases.
Storage in the form of cloud data lakes and warehouses, making for low-cost, persistent, and scalable storage that enables low latency access to data.
Processing for batch and streaming data transformations, helping to aggregate, filter, and alter raw datasets to get them ready for analysis.
Operations, in the form of reverse ETL, which pushes rich user data from the warehouse into tools like CRMs for further use and analysis. Ideal for self-serve data applications, such as those used by the marketing department.
Analysis processes, often composed of business intelligence (BI) and product analytics tools that further promote a self-serve culture for marketing and development teams.
Intelligence tools, such as artificial intelligence (AI) and machine learning (ML) to empower data science professionals to identify historic trends and predict future behaviors.
Management tools to build data pipelines, improve observability, and solve organizational problems.
Data Engineering Overhead
With a modern data stack, your organization can reduce its data engineering costs by a staggering 90% or more. This cost reduction comes primarily from eliminating the need to create data pipelines and maintain them. With fully managed data connectors that launch in minutes and automatically integrate with your company's destination (i.e., your cloud-based data lake or warehouse), the modern data stack will save substantial time and money.
Ignition Group, a media and telecommunications company, is a case study that shows first-hand the efficiency of implementing an MDS. They said, "We had initially planned to bring our data sources into the existing SQL Server warehouse. This would have taken the efforts of three people across two years, and would have cost an estimated 6 million rand [~$400,000] just to get us to where we got with [MDS] in two months."
Ability to Execute Quickly
With more time and more data on hand, a modern data stack means your company's teams can shift their full focus to upcoming analytics projects. For example, like many companies, fitness app Strava once relied on an attribution partner to handle its customer data before it adopted a modern data stack. Prior to MDS, Strava used local machines to model data using Python and R. After implementing its modern data stack, Strava can digest its own data from across marketing channels and run analysis using Snowflake.
The move to MDS enabled Strava to build its own attribution model and gain better insights into the customer journey. The company stated, "We can see if our paid users are interacting with our social or SEO channels and determine if there are any cross-effects. Using our metrics, we can determine if SEO is better or worse than our paid acquisition or our partner marketing channels. These things weren't possible when we didn’t have the data in-house."
Performance Metrics
The modern data stack doesn't just unlock additional sources of data. It also gives your company an easy-to-use BI tool that will reveal a long list of new metrics you've yet to tap into. You'll begin seeing these benefits for at least two reasons. The first is that your data in an MDS is richer, enabling new cross-analysis. The second is that an MDS improves access to data across teams, giving more employees the chance to find and propose metrics based on their unique competencies.
Zoopla, a real estate website, is an excellent example of how a modern data stack can improve strategic data use through cross-analysis. Using MDS, Zoopla was able to replicate its new NetSuite and Salesforce data into its cloud warehouse and then use a BI tool to continuously update a dashboard with over 40 key performance indicators (KPIs) that informed the leadership team. The company said, "It was always in our mind that we didn't want to build a point solution. We wanted to ensure that all of the data we were landing could be leveraged for other purposes and we wanted to make this data available in a self-service capacity."
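As a rough illustration of what cross-analysis looks like once two sources sit in the same warehouse, here is a small Python sketch. It is not Zoopla's actual setup: the tables `crm_accounts` and `finance_invoices`, their columns, and the sample values are all made up, and an in-memory SQLite database stands in for the cloud warehouse. The idea is that a KPI such as revenue per customer segment only becomes a single query once CRM and finance data live side by side.

```python
import sqlite3


def revenue_by_segment(conn):
    """Join CRM accounts to finance invoices and total revenue per segment."""
    return conn.execute(
        """
        SELECT a.segment, SUM(i.amount) AS revenue
        FROM crm_accounts AS a
        JOIN finance_invoices AS i ON i.account_id = a.account_id
        GROUP BY a.segment
        ORDER BY a.segment
        """
    ).fetchall()


def demo_replicated_warehouse():
    """Build an in-memory database mimicking two replicated sources."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crm_accounts (account_id INTEGER, segment TEXT)")
    conn.execute("CREATE TABLE finance_invoices (account_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm_accounts VALUES (?, ?)",
        [(1, "agent"), (2, "developer"), (3, "agent")],
    )
    conn.executemany(
        "INSERT INTO finance_invoices VALUES (?, ?)",
        [(1, 100.0), (2, 250.0), (3, 50.0), (1, 25.0)],
    )
    return conn


if __name__ == "__main__":
    print(revenue_by_segment(demo_replicated_warehouse()))
```

A BI tool would typically run a query like this on a schedule and chart the result, which is how a dashboard stays continuously updated.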
Examples of Modern Data Stack Tools
As mentioned, there are countless data tools out there—and it would be impossible to cover them all, much less use them all, as part of your company's modern data stack. Ultimately, the tools you choose will come down to your company's unique use case, along with its size, budget, and resources. Still, it's worth reviewing the most popular tools out there because they are well-known for a reason.
Given that there are so many types of tools your company may implement into its modern data stack, let's break down some of the most popular by category.
Data Ingestion
As StitchData explains, "Data ingestion is the transportation of data from assorted sources to a storage medium where it can be accessed, used, and analyzed by an organization."
Fivetran: With Fivetran, your company can seamlessly stream data into your warehouse, managing the data delivery process from any source all the way to your chosen destination, ensuring the most accurate and up-to-date information in the process.
Rivery.io: With universal support for any type or source of data your company needs to process, Rivery is a reliable data ingestion tool that will process data as often as you need, giving you better control over all of your data sources.
Airbyte: An open-source solution for data ingestion, Airbyte promises to get your pipelines up and running in mere minutes. Choose from pre-built connectors or create custom ones to fit your needs.
Data Storage
In the context of a modern data stack, data storage refers to a cloud-based solution, like a data warehouse or data lake, where your data ingestion tool will send your data.
Snowflake: One of the most popular data storage platforms for an MDS, Snowflake boasts that it is the only data platform that's cloud-native, offering the best experience for your data and your users.
BigQuery: From Google, BigQuery is an extremely cost-effective, completely serverless, multi-cloud solution that will help your company manage its big data with ease.
Reverse ETL
As Hightouch puts it, "Reverse ETL is the process of copying data from a data warehouse to operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales and support."
Hightouch: One of the most trusted names in the business, Hightouch ensures effortless and accurate data syncing among the tools that matter most to your teams.
Census: Touting itself as "the easy way to sync customer data," Census keeps your teams on the same page by ensuring every tool and team member is powered by up-to-the-minute insights.
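To show the direction of the data flow, here is a minimal reverse ETL sketch in Python. It is an illustration under assumptions, not how any of the tools above work internally: the `customer_summary` table, the payload shape, and the `send` callback are hypothetical, and SQLite stands in for the warehouse. The sketch reads modeled rows out of the warehouse and hands each one to a function that, in practice, would wrap an operational tool's API.

```python
import sqlite3


def build_crm_payloads(conn):
    """Read modeled rows from the warehouse and shape them as CRM updates."""
    rows = conn.execute(
        "SELECT email, lifetime_value, plan FROM customer_summary ORDER BY email"
    ).fetchall()
    return [
        {"email": email, "properties": {"lifetime_value": ltv, "plan": plan}}
        for email, ltv, plan in rows
    ]


def sync_to_crm(conn, send):
    """Push each payload through `send`, a caller-supplied function.

    In a real sync, `send` would wrap the operational tool's API client;
    here it is injected so the sketch runs without any network access.
    """
    payloads = build_crm_payloads(conn)
    for payload in payloads:
        send(payload)
    return len(payloads)


def demo_warehouse():
    """Create an in-memory table that mimics a modeled warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_summary "
        "(email TEXT, lifetime_value REAL, plan TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_summary VALUES (?, ?, ?)",
        [("ana@example.com", 120.0, "pro"), ("bo@example.com", 0.0, "free")],
    )
    return conn


if __name__ == "__main__":
    sent = []
    count = sync_to_crm(demo_warehouse(), sent.append)
    print(count, sent)
```

Passing `send` in as a parameter keeps the warehouse query separate from the delivery step, so the same payloads could be pointed at a different destination.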
Data Visualization
The term may be self-explanatory, but the tools continue to get increasingly complex. When using a powerful data visualization tool, your company will explore fresh insights, connections, and never-before-seen metrics—all it takes is seeing your data in a new, connected light.
Mode: Dubbed "the collaborative data science platform," Mode introduces modern BI to interactive data science, making for extremely powerful and beautiful insights that you can effortlessly share with any stakeholder.
Looker: Offered by Google, Looker's goal is to "let your data do the talking," with integrated insights, workflows, and modern BI tools that help you dig deeper into the data you've been overlooking.
How to Build a Modern Data Stack
If exploring the many benefits of the modern data stack has you convinced your company is missing out on incredible insights (because of its lack of an MDS), the next logical step is to answer the question: How do you make the move? The perks of migrating to an MDS are clear, but the path to getting there can be much more complex.
The sheer number of tools that comprise a modern data stack can make for a daunting list, but there is nothing to fear. If you follow the best migration practices, making the switch can be surprisingly easy; you just need to start with a well-thought-out plan to ensure your company can pull it off.
Start Simple
Your company can end up incorporating a plethora of tools into its MDS, but that doesn't mean you need to work from the ground up trying to integrate a dozen tools at the same time. In its simplest form, a modern data stack only requires an ingestion tool, a warehousing tool, a transformation tool, and a business intelligence tool.
To further simplify the matter, some vendors can fulfill more than one of these roles. For instance, Google offers both a data warehouse (BigQuery) and a business intelligence tool (Looker), so choosing that option means you only need to find an ingestion tool and a transformation tool. Once you do, you have a modern data stack on your hands.
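One lightweight way to keep track of this while planning is to write down which roles each candidate tool covers and check what is still missing. The Python sketch below does that with the four roles named above. The tool names are placeholders, not recommendations, and the function name is hypothetical.

```python
# The four roles the article names as the minimum for a modern data stack.
REQUIRED_ROLES = {"ingestion", "warehouse", "transformation", "bi"}


def missing_roles(stack):
    """Return the required roles not yet covered by any tool in `stack`.

    `stack` maps a tool name to the set of roles that tool fills.
    """
    covered = set()
    for roles in stack.values():
        covered |= set(roles)
    return REQUIRED_ROLES - covered


# Placeholder tool names; one hypothetical platform fills two roles at once.
partial_stack = {"platform_a": {"warehouse", "bi"}}
full_stack = {
    "platform_a": {"warehouse", "bi"},
    "tool_b": {"ingestion"},
    "tool_c": {"transformation"},
}

if __name__ == "__main__":
    print(sorted(missing_roles(partial_stack)))
    print(sorted(missing_roles(full_stack)))
```

With the partial stack, the check reports that ingestion and transformation are still uncovered, which matches the situation described above.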
Plan Thoroughly
While you can and should begin with a simplified version of a modern data stack to get the ball rolling and to simplify both migration and adoption, you also don't want to jump in head-first. Yes, you could save some steps by piecing together the simplest MDS tools out there, but doing so doesn't make the plan a viable option for your company.
Instead of trying to over-simplify, your company's primary goal during the adoption process should be to thoroughly understand and plan for all the roadblocks, hurdles, needs, exceptions, and special use cases that are sure to pop up along the way. If you think moving to the modern data stack is daunting, imagine being six months into implementation and realizing you chose the wrong tools. This is a big decision that you must think through.
Choose the Right Partners
At the end of the day, there is no one-size-fits-all approach to adopting a modern data stack. Each company's plan, tool selections, and timelines will differ based on its size, resources, flexibility, and overall agility.
If your team needs help designing your data stack, then reach out to our team of experienced end-to-end data experts.
Conclusion
By doing your research, as you are right now, your company is well on its way to enjoying all the benefits that come along with a modern data stack. The next step is to get the key stakeholders together and kick off the discussion. Then, with buy-in from around the table, your company can begin planning and eventually implementing an MDS that unlocks the real power of the data you've been waiting to explore for so long.
Ask A Data Consultant - Office Hours
With every newsletter, I open up a day or two with a few slots for open office hours, where readers can sign up and ask me questions. I have gotten to answer a lot of great questions so far, and hopefully my answers provided useful insights for those who signed up.
Sign Up Below:
Next Open Office Hours
Sign Up For My Next Office Hours on August 18th, from 8 AM to 10 AM PT or from 5 PM to 7 PM PT
Thanks To The SDG Community
I started writing this weekly update more seriously about 6-7 weeks ago. Since then I have gained thousands of new subscribers as well as 12 paid supporters!
And all I can say is, Thank You!
You guys are keeping me motivated.
Every read, comment, like, and financial subscription is amazing, and I really appreciate all of you.
If you want to help support this community, consider clicking the link below.
Video Of The Week: The Downfall Of The Data Engineer
While the title of this post is sensationalistic and the content quite pessimistic, keep in mind that I strongly believe in data engineering - Maxime Beauchemin
In a 2017 article, Maxime Beauchemin wrote a great reflection on the challenges data engineers face.
He paired it with another article, The Rise of the Data Engineer, also from 2017, well before the current uptrend in data engineering content.
Here is my response to his video.
Articles Worth Reading
There are 20,000 new articles posted on Medium daily, and that’s just Medium! I have spent a lot of time sifting through some of these articles, as well as TechCrunch and company tech blogs, and wanted to share some of my favorites!
How Cisco Optimized Performance on Snowflake to Reduce Costs 15%: Part 1
Snowflake is a powerful platform for data warehousing workloads. With increasing demands for data within the enterprise, Snowflake handles existing workloads from previous data warehousing solutions while enabling new data projects and new data demands. It improves data governance, provides granular data security, and enables Cisco (and our customers and partners) to harness the power of data to deliver big business impact.
As Anupama Rao shared in our blog post about our migration, Snowflake significantly surpassed performance expectations for reporting and transformations. She explained, “Transformation jobs that would take 10 or more hours to run are now completing within an hour, a 10x performance improvement. This provides our business teams more current data on their dashboards, allowing for more accurate insights based on the latest data. Reports are now on average 4 times faster with a 4x concurrency improvement, which gives our analysts the flexibility to run reports in parallel based on business needs.”
Feature Store as a Foundation for Machine Learning
Artificial intelligence and machine learning have reached an inflection point. In 2020, organizations in diverse industries of various sizes began evolving their ML projects from experimentation to production on an industrial scale. While doing so, they realized they were wasting a lot of time and effort on feature definition and extraction.
Feature store is a fundamental component of the ML stack and of any robust data infrastructure because it enables efficient feature engineering and management. It also allows for simple re-use of features, feature standardization across the company, and feature consistency between offline and online models. A centralized, scalable feature store allows organizations to innovate faster and drive ML processes at scale.
Scaling a Data Analytics Team for a Billion Dollar Start-Up With Veronica Zhai Of Fivetran
Scaling data talent is hard.
It’s hard to hire.
Hard to grow and just overall hard in terms of managing processes and output.
But there are brilliant leaders and managers doing it every day.
I recently interviewed Veronica Zhai, the principal product manager at Fivetran. She has led Fivetran’s analytics team for nearly the last year and has managed to grow it, establish a defined roadmap, and deliver on goals.
In this community update, I wanted to condense the interview down to its key points, such as hiring data talent, dealing with data flow pain points, and discussing the modern data stack.
Let’s dive in!
End Of Day 16
This was a longer update. But I really wanted to dig into the modern data analytics stack.
There is so much going on in the space and I wanted to give you a deep introduction.
If you have more questions about the data stack, then comment below.
Thanks for reading.