Recently, I heard two or three data leaders say that the default state of most data teams is failure. But I don’t think most data teams are set up for success to start with (I also don’t believe that failure is the default state).
I do, however, find that many data engineers, analysts, and scientists get thrown into the role of Head of Data or perhaps director and don’t get much in the way of coaching or advice.
There is just the expectation that we will do a good job because we were a good individual contributor.
So I wanted to start to put together a list of what you need to know and do to be successful as a leader in the data space.
This ranges from taking even more time to understand the business to being intentional about how you hire and place talent.
So if you’re thinking of leading a data team, or perhaps already are, this is for you!
Understand the Business
If you're doing these in any sort of priority order, this should be number one. - Tom Rampley, Head of Data at LastPass
There’s always been a lot of talk about the business needing to become data fluent and not enough talk about how data professionals need to become business fluent.
How?
You must talk to and understand the business while doing your own research on how your industry operates. Someone has to act as the liaison between the business and data, and it's likely going to be whoever is in charge of the data team. Otherwise, you'll have a hard time building what the business needs.
Yes, the business should have a light understanding of how to read charts and understand some basic data concepts.
But there is a line and it’s probably easier for a data professional to learn more about how a sales funnel works or how a patient is processed in a hospital and how that translates into the data. You’re literally staring at the data that is a derivative of the business.
To add a little more on understanding the business, I really liked this comment from Veronika Durgin about making your data team irreplaceable.
Data teams become critical and irreplaceable when they are working on critical company initiatives - Veronika Durgin, VP of Data at Saks
If you want to be part of company initiatives and not just be an ad-hoc help desk, you’re going to have to get into the weeds of what goes on in the business.
In the end, the business won’t be getting into the weeds of how your data infrastructure works.
The Business Doesn’t Care About How You Solve the Problem
“Never talk about data technology, infrastructure, or queries with people outside the data team -- they just don't care.” - Ethan Aaron, CEO of Portable
Never is perhaps too strong. As with many topics, there is nuance.
Jeff Nemecek, the Director of Engineering & Architecture at The Walt Disney Company, made a great comment here covering some of that nuance, stating that:
Instead of using a vendor reference (Snowflake, Kafka, AWS, S3, Airflow...) speak of functions (data warehouse, streaming events, process orchestration, data pipeline) in ways a business person can connect with. The purpose is to help them understand the complexity of your work with clarity, not confusion. Why is it going to take a month to get the new product line integrated into the daily reports? They often need to understand the functional steps, but not the details of the underlying technologies.
I’ll add a few follow-ups. First, make sure everyone is on the same page in terms of what the words you use mean. I have seen teams struggle to move forward with projects because no one clearly defined what they meant when they were referencing their Postgres instance.
Was it a data warehouse, an ODS, or just a database? I heard every term get thrown around. Not everything fits into a perfect box, so sometimes you just need to define it and its general function, then move on.
Second, just to make a point clear, if you ever have to open up an IDE or start explaining why a query doesn’t work with the C-suite on the same phone call, you’ve likely f*cked up.
When you speak to the business, it’s not about you and your technical problems. The business wants to know what you’re doing to move the project forward and to drive the outcomes that help them look good in front of shareholders or their boss. Everything you think is important isn’t (unless you’re blowing up your cloud bill, in which case they are suddenly going to ask you about specific technologies).
Overall, you need to communicate with the business about what the business cares about. This means the functional components and business outcomes. You still want the executives to be able to ask questions confidently without needing to understand all the technical details.
Bad Data Quality Will Cost You
Your data must be accurate. You only have so many at-bats, so focus on building reliable systems. Being $1 off today means you could be $100,000 off tomorrow.
Those small details matter. You’d be surprised what a CFO will notice when they are looking at a report, only to see that units sold or total expense for an account is off by $5. The more you can ensure the data is accurate in the source as well as in whatever you decide to call your data analytics storage layer, the less likely you are to deal with these callouts.
Because once you lose trust, you’re going to have a very hard time gaining it back.
What’s worse, it further encourages shadow data teams and decentralized processes that other departments might take on to build the reports and numbers they want to see.
“A lack of data quality will cause other departments to silo their data when zero executives trust the enterprise data, and each come up with their own numbers.” - Eric Gonzalez, VP, Business Intelligence Architecture at Eastern Bank
So keep that in mind when you think, hey we are only $5 off!
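One way to catch a $5 discrepancy before the CFO does is a reconciliation check that compares totals in the source against the analytics layer. The following is only a minimal sketch: the function name `reconcile_totals`, the `(account, amount)` row shape, and the sample data are all assumptions made for illustration, not something taken from a specific tool.

```python
from decimal import Decimal


def reconcile_totals(source_rows, warehouse_rows, tolerance=Decimal("0.00")):
    """Return {account: (source_total, warehouse_total)} where totals differ.

    Rows are hypothetical (account, amount) pairs. Amounts are parsed as
    Decimal so that cents are compared exactly rather than as floats.
    """

    def totals(rows):
        out = {}
        for account, amount in rows:
            out[account] = out.get(account, Decimal("0")) + Decimal(str(amount))
        return out

    src, wh = totals(source_rows), totals(warehouse_rows)
    mismatches = {}
    # Check every account seen on either side, so missing accounts are flagged too.
    for account in sorted(set(src) | set(wh)):
        s = src.get(account, Decimal("0"))
        w = wh.get(account, Decimal("0"))
        if abs(s - w) > tolerance:
            mismatches[account] = (s, w)
    return mismatches


# Illustrative data only: the warehouse is $5 short on one account.
source = [("acct-1", "100.00"), ("acct-1", "25.00"), ("acct-2", "40.00")]
warehouse = [("acct-1", "120.00"), ("acct-2", "40.00")]

for account, (s, w) in reconcile_totals(source, warehouse).items():
    print(f"{account}: source={s} warehouse={w} diff={s - w}")
```

In practice the two row lists would come from queries against the source system and the warehouse, and a check like this would run after every load so that a mismatch blocks the report instead of reaching it.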
Take What Vendors Say With A Grain Of Salt
There are plenty of articles out there talking about the best way to set up your data infrastructure.
But it’s important to know the source of said article. For example, I am sure there were plenty of articles talking about how schema-on-read was going to be THE method to help reduce the time to insights because you no longer had to spend as much time developing a data warehouse. Instead you could use a data lake. Forget ever needing a data warehouse!
Now I still see plenty of data lakes, but often they act more as a place to do initial processing with cheaper compute that will then get loaded into the data warehouse. But man, I remember back in 2015-2016, it seemed like everyone was talking about schema-on-read like it was a best practice.
But really it was a combination of vendors pushing Hadoop and a new paradigm that we were only beginning to figure out how to utilize (but there was a Google paper on it, so we better make it work). To be clear, Hadoop continues to play a massive role in the data world today. But there has been a shift away from heavily relying on schema-on-read.
Technology and best practices take time to actually settle and find their place. So if you believe a vendor is trying to push a methodology or approach that is new and untested just to sell their product, they probably are.
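For anyone who has not worked with the term, the difference between schema-on-write and schema-on-read can be sketched in a few lines. This is a toy illustration under stated assumptions: the `SCHEMA` fields, the sample records, and the function names are all hypothetical, and real data lakes and warehouses do far more than this.

```python
import json

# Hypothetical event schema: field name -> type used to parse the raw value.
SCHEMA = {"order_id": int, "amount": float}


def apply_schema(record):
    """Coerce a raw dict to SCHEMA; raises KeyError or ValueError on bad records."""
    return {field: cast(record[field]) for field, cast in SCHEMA.items()}


raw_lines = [
    '{"order_id": "1", "amount": "19.99"}',
    '{"order_id": "2", "amt": "5.00"}',  # field renamed upstream
]


def load_schema_on_write(lines):
    """Schema-on-write: validate before storing, so bad records fail at load time."""
    stored, rejected = [], []
    for line in lines:
        try:
            stored.append(apply_schema(json.loads(line)))
        except (KeyError, ValueError):
            rejected.append(line)
    return stored, rejected


def total_amount_schema_on_read(lines):
    """Schema-on-read: raw lines are stored as-is; each reader applies the schema."""
    total, unreadable = 0.0, 0
    for line in lines:
        try:
            total += apply_schema(json.loads(line))["amount"]
        except (KeyError, ValueError):
            unreadable += 1
    return total, unreadable


stored, rejected = load_schema_on_write(raw_lines)
print(len(stored), "stored,", len(rejected), "rejected at load time")
print(total_amount_schema_on_read(raw_lines))
```

In the schema-on-write path the renamed field is caught once, at load. In the schema-on-read path loading is effortless, but every query that touches the data has to handle the bad record itself.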
Be Intentional With Data And Your Data Roles
At the end of the day, the goals of data engineers and data architects are just different from those of data scientists and analysts. In turn, we approach the development of data pipelines and data sets with different intent.
That’s why data engineers and data architects should be in charge of the core data layer of a company, the data that represents the core aspects of the business. Some people may call these the core entities and relationships. But they are the building blocks, from a data perspective, that all other analytics and machine learning will be built on (obviously many companies just have a data analyst, so this is likely less relevant to them).
Whether this is one team or a data engineer in each team, this layer of data should be treated like infrastructure with everything else being built on top of it.
Now I do want to be clear: in my ideal state, once the core layer of data is developed, the layers that follow would be less restrictive. Perhaps an analytics engineer builds their own models, or as Bill Shube, Sr. Manager, AMS Supply Chain Operations Technology and Analytics at the Lego Group, referenced, the analysts and business users might build out the "analytics final mile." I have no qualms with that and honestly see it as an ideal outcome.
It allows data engineers to really focus on building high-quality and reliable datasets and the analysts to build more ad-hoc reports, dashboards and one-off use cases.
Navigating the Role of New Data Leaders
Truth be told, this post was inspired by a multitude of things. As stated earlier, the data leaders calling most data teams failures really struck a chord. I don’t believe that most data teams are failing. However, I do believe that we’ve started to shift away from the discipline required to build reliable and trustworthy data systems.
I also believe that most new data leaders are often thrown into the role with little care for how they will adjust. So if you really enjoyed this topic, please let me know and I can start to dig even deeper!
With that, as always, thanks for reading!
And if you are a data leader who is looking for advice on how to better lead your team, feel free to set up a consultation here
Video Of The Week - Building Data Pipelines At Facebook - How To Manage An Exabyte Data Warehouse
Articles Worth Reading
There are 20,000 new articles posted on Medium daily, and that’s just Medium! I have spent a lot of time sifting through some of these articles, as well as TechCrunch and company tech blogs, and wanted to share some of my favorites!
What Goes Around Comes Around... And Around...
Two decades ago, one of us co-authored a paper commenting on the previous 40 years of data modelling research and development [188]. That paper demonstrated that the relational model (RM) and SQL are the prevailing choice for database management systems (DBMSs), despite efforts to replace either them. Instead, SQL absorbed the best ideas from these alternative approaches. We revisit this issue and argue that this same evolution has continued since 2005. Once again there have been repeated efforts to replace either SQL or the RM. But the RM continues to be the dominant data model and SQL has been extended to capture the good ideas from others. As such, we expect more of the same in the future, namely the continued evolution of SQL and relational DBMSs (RDBMSs). We also discuss DBMS implementations and argue that the major advancements have been in the RM systems, primarily driven by changing hardware characteristics.
End Of Day 134
Thanks for checking out our community. We put out 3-4 Newsletters a week discussing data, tech, and start-ups.
In “The Business Doesn’t Care About How You Solve the Problem,” you mention that they don’t care about the technical part, which I agree with. But you still have to sell your leaders on the high-level view of how you do things, so that it doesn’t bite you in the end and so that you leave a good, long-lasting impression.
To put it as an analogy: when talking to leaders, and assuming electric cars don’t exist, you want them to buy the car that runs more miles on the same amount of fuel. Although the upfront investment is higher, the ROI is better in the long run, and you need to show that this path is more profitable over the long term.
Of course, in most use cases you don’t need to tread too carefully here, because doing things inefficiently is insignificant next to the impact of data initiatives, at least at the start, when speed of execution makes a huge difference.
You have to know the context of the business to determine whether doing things efficiently matters. The biggest reason a data team is formed within a company, besides helping the company’s bottom line, is to make things efficient. And efficiency is much more than just operational costs: it ranges from giving the business more visibility into the data, and so more opportunities, to making developers and consumers more productive and data literate.
Great read! I believe we have seen a lot of promotions of data people into leadership roles that were 'leading' rather than 'lagging' i.e. anticipating someone's readiness rather than making them do the job ahead of the promotion.