At my very first job in data, I was an analyst on a finance team. I was working with a senior engineer who had built out several new automated processes and was managing a data warehouse and several other applications.
We were supporting 4-5 financial analysts who were relying on our data to report to the CIO.
Two weeks after I started, the senior engineer left and I became responsible for everything they built. After continuing to manage the entire set of processes and automated reporting, I got a call from the “official data team”. They were calling because they realized our team had been creating a duplicate of the data warehouse every night and using it for reporting that they weren’t aware of, and these reports were suddenly causing issues at executive meetings.
Without knowing it, I had been a shadow IT team the whole time.
The Problem
If you’ve worked on a data team, you know you’re constantly fielding requests. There are ad-hoc requests that come from analysts and PMs. There are larger project asks that come from the executive team. And hey, a vendor you pay might be sunsetting a product, so now you have to migrate to their new solution.
Everyone wants your time and, like every team at a company, you’re understaffed. But other departments need answers and they can’t wait around all day for you. Tools like Excel gave a broader range of users the ability to answer many of these questions themselves. Instead of waiting for data teams to get back to them with views or ad-hoc queries, they could connect directly to the data sources they needed.
Excel was the “self-serve analytics” tool before Tableau started to run with that term back in 2012 (at least that’s the first time I saw the term).
Those reports and tools then become part of the team's everyday process. And with new SaaS tools, it’s become even easier: pay for a $10 a month Retool subscription and your team can build applications with no oversight.
Another problem that many business teams run into when working with IT is that IT doesn’t always understand the problem’s context, at least not as viscerally as the operations teams that deal with the problems on a daily basis. That means there is an unavoidable back and forth as both teams try to communicate and collaborate.
So why not just do it and start a shadow IT team?
Quick Pause: We need your help! Our team is putting together a survey to better understand the state of data infrastructure. We will be sharing our results in this newsletter. If you work in data (whether as an analyst or a VP of engineering), we would love to hear from you!
What Is Shadow IT
Ok, we have talked around what shadow IT is for the last few paragraphs. So what is shadow IT? It is the use of hardware or software (these days usually SaaS) by a department or individual without the knowledge of IT or other governance and compliance teams within the organization.
This is an ever-growing issue. A 2019 study from Everest Group estimated that nearly half of all IT spend “lurks in the shadows.”
Ok, but work is getting done, right? So why should companies care?
Risks
IT and data teams can feel like they get in the way with all their requirements for security compliance and data governance. But their goal is to limit risks like the following:
App sprawl
At one of the first meet-ups I went to, I heard a CIO give a presentation about being hired at a company with a mandate to reduce the number of applications in use. They chuckled a bit as they said they quit two years later because, in those two years, the company had gone from 155 applications to 200. Teams are constantly requesting new technology and purchasing solutions with duplicate functionality all across an organization, especially in organizations that lack any form of IT governance.
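As a rough illustration of how a team might start spotting that kind of overlap, here is a minimal Python sketch. It assumes you already have some inventory of apps tagged by what they are used for. The app names, categories, and the `find_overlapping_apps` helper are all made up for this example and don't refer to any real product or inventory.

```python
from collections import defaultdict

# Hypothetical inventory: every name, category, and team below is invented
# for illustration. In practice this might come from expense reports or SSO logs.
APP_INVENTORY = [
    {"name": "ReportTool A", "category": "dashboards", "team": "finance"},
    {"name": "ReportTool B", "category": "dashboards", "team": "marketing"},
    {"name": "FormBuilder", "category": "internal apps", "team": "operations"},
    {"name": "SheetSync", "category": "data pipelines", "team": "finance"},
    {"name": "PipeRunner", "category": "data pipelines", "team": "sales"},
]


def find_overlapping_apps(inventory):
    """Return categories that are served by more than one app."""
    by_category = defaultdict(list)
    for app in inventory:
        by_category[app["category"]].append(app["name"])
    # Keep only the categories where two or more apps do the same job.
    return {
        category: sorted(names)
        for category, names in by_category.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    for category, names in find_overlapping_apps(APP_INVENTORY).items():
        print(f"{category}: {', '.join(names)}")
```

The hard part in real life is not the grouping but getting an honest inventory in the first place, since shadow IT by definition doesn't show up on IT's list.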
Security
Even if the specific app isn’t notoriously insecure, every piece of IT added without planning and consideration represents a possible attack surface. Each new process that moves data unencrypted via a thumb drive or a new single sign-on added into a workflow adds a new attack surface. IT teams already spend plenty of resources on cybersecurity for known applications and attack surfaces. Adding in new applications outside of the known space just amplifies the risk.
Rework
A common pattern when working with shadow data teams is that a non-IT team member builds a process in VBA, Excel, Tableau, or Jupyter Notebooks, and then it becomes difficult to manage. So they go to IT and ask them to rebuild and automate the process. I have been involved in this multiple times. In one case, the process that had been developed involved 10 different data sets that were pulled and managed in 10 different ways. Trying to simplify and standardize the process was quite an endeavor.
Cost
Shadow IT and data teams are often a function of companies trying to keep costs low. You avoid hiring more engineers and IT professionals and offload the work to other employees. This may keep costs low in the short run, but there are a lot of costs that may be incurred in the future. These include reworking systems that weren’t developed for longevity, dealing with security risks, and absorbing key person risk, where the individual who built the shadow IT process leaves and no one else knows how it works.
All of this leads to unforeseen costs that could be far greater than the original cost of going through IT.
Benefits of Shadow IT
Yes, Shadow IT and data teams pose risks to businesses. So why do they show up everywhere? Whether you’re an SMB or Enterprise, there is a good chance that teams are setting up infrastructure in the form of SaaS applications, custom code, cloud components, and more.
Well, here are just a few benefits of shadow IT.
Faster delivery of data, automated processes, and other IT deliverables
Reduced costs in the short term
Empowerment of business users and often less back and forth between IT and the business
Final Thoughts
To some degree, shadow IT and data teams are unavoidable. With the growth of SaaS and tools that are increasingly affordable as monthly subscriptions, employees are going to continue to use these solutions.
And yes, it’s nice to think that we should build systems, data models, and processes that follow best practices. I just don’t know if we can close the Pandora’s box that is Shadow IT at this point.
As more individuals learn SQL and use tools like Retool and Airtable to build solutions without even reaching out to IT or data teams, maybe the new problem to be solved is how to ensure they don’t put the company at risk.
Even if shadow IT keeps costs low, these projects still require some planning and thinking through to ensure they don’t run up costs, send data to the wrong place, or simply break when an edge case is hit.
What are your thoughts?
Build SQL Pipelines. Not Endless DAGs!
With Upsolver SQLake, you build a pipeline for data in motion simply by writing a SQL query defining your transformation.
Streaming and batch unified in a single platform
No Airflow - orchestration inferred from the data
$99 / TB of data ingested | transformations free
Video Of The Week: The Harsh Reality of Being a Data Engineer
Join My Data Engineering And Data Science Discord
Recently my YouTube channel went from 1.8k to 46k and my email newsletter has grown from 2k to well over 25k.
Hopefully we can see even more growth this year. But, until then, I have finally put together a Discord server. Currently, this is mostly a soft opening.
I want to see what people end up using this server for. How it is used will in turn play a role in what channels, categories, and support are created in the future.
Articles Worth Reading
There are 20,000 new articles posted on Medium daily, and that’s just Medium! I have spent a lot of time sifting through some of these articles, as well as TechCrunch and company tech blogs, and wanted to share some of my favorites!
Orchestrating Data/ML Workflows at Scale With Netflix Maestro
At Netflix, Data and Machine Learning (ML) pipelines are widely used and have become central for the business, representing diverse use cases that go beyond recommendations, predictions and data transformations. A large number of batch workflows run daily to serve various business needs. These include ETL pipelines, ML model training workflows, batch jobs, etc. As Big data and ML became more prevalent and impactful, the scalability, reliability, and usability of the orchestrating ecosystem have increasingly become more important for our data scientists and the company.
In this blog post, we introduce and share learnings on Maestro, a workflow orchestrator that can schedule and manage workflows at a massive scale.
How I learn machine learning
“Almost all advice is contextual, yet it is rarely delivered with any context”, writes Justin in this post of things he’s learned in 20 years as a developer.
The context for the advice I’m about to share is: I started without an engineering background and through hard work and a lot of luck became a machine learning engineer.
My overarching goal as an MLE is to continuously work towards designing and deploying well-designed, and transparent machine learning systems and to learn the best software engineering practices to do so.
The best engineers I know that do this well are constantly asking questions and learning, and this is a goal I have for myself in my career, as well. It’s important for me to understand the machine learning stack end to end, and I’ve felt most valuable and fulfilled in roles where I can contribute to both modeling and infrastructure. The way I do this best is by thinking in patterns. I am also (painfully) realizing that most machine learning work, at its core, is software engineering fundamentals and gruntwork.
Onboarding For Data Teams – How to set-up a streamlined onboarding experience
Onboarding at most companies would be a great bit for a comedian. For many, we have just accepted the fact that it’s not a great experience where you will likely spend the first two weeks just trying to get access to email.
But it doesn’t have to be this way nor should it.
Onboarding should be considered a critical part of any team and company’s processes. This is especially true with technical roles like analytics and data engineers.
These roles require access to multiple systems, an understanding of business context, and access to key data sets. Delaying or inhibiting how smooth the onboarding process is for these roles can be costly and lead to poor retention.
Thus creating a smooth onboarding process is a worthwhile investment. In this article, I will review some of my past experiences onboarding as well as discuss some key steps and considerations a data team should have in their onboarding process.
End Of Day 59
Thanks for checking out our community. We put out 3-4 Newsletters a week discussing data, tech, and start-ups.
If you want to learn more, then sign up today. Feel free to sign up for no cost to keep getting these newsletters.