Onboarding For Data Teams
How to set up a streamlined onboarding experience that empowers your data engineers and analysts from day one.
Onboarding at most companies would be a great bit for a comedian. Many of us have simply accepted that it’s not a great experience and that you will likely spend the first two weeks just trying to get access to email.
But it doesn’t have to be this way, nor should it be.
Onboarding should be considered a critical part of any team and company’s processes. This is especially true with technical roles like analytics and data engineers.
These roles require access to multiple systems, an understanding of business context, and access to key data sets. A slow or bumpy onboarding process for these roles can be costly and lead to poor retention.
Thus creating a smooth onboarding process is a worthwhile investment. In this article, I will review some of my past experiences onboarding as well as discuss some key steps and considerations a data team should have in their onboarding process.
Reviewing Some Of My Past Onboarding Experiences
When you’re a new employee joining a new team or company, it can be a stressful experience.
Everyone else on the team clicks, everyone knows how to get access to the right data, and everyone else understands the general politics of the larger organization.
The faster you can understand both the technical and people aspect of your new role, the sooner you start making good decisions with confidence.
Experience #1: The problem is that many companies don’t provide a clear path for how to go from clueless to confident. Instead, I have experienced situations where the onboarding is a day or two about the company, its founding in 1867, and its principles.
Then I spent the next three weeks waiting for access to a database that wasn’t even the database I was supposed to get access to, and that was after multiple discussions with various employees and senior team members.
Experience #2: Compare that to Facebook, where on day one they were able to onboard hundreds of people and connect you to all the systems you needed. There was also an entire four-week period, known as Data Bootcamp, focused on getting you up to speed on their various technologies.
Now, this was during the before times and it was all in person. We spent time in college-like rooms going through various technologies and tutorials to get up to speed.
We learned about Facebook-specific technologies like DevServers (EC2), Dataswarm (Airflow), iData (data catalog) and Daiquery (Snowsight).
By the time I joined my team, I was already comfortable working in Facebook’s infrastructure. I knew how to commit code, find data sets, and traverse multiple code directories all on my own.
What’s more, the bootcamp itself was regularly re-examined to see whether it could be improved. This meant I was able to make good decisions the day I started working with my team rather than having to constantly chase down other team members to understand what was going on.
Goals Of Onboarding
To set up an exceptional data team onboarding experience, you need to understand what the overall goals are. Not just the technical goals but what state you want your new employee to be in at the end of their onboarding. Here are several goals I believe are important.
Understanding Business Goals and XFN Needs - Onboarding new employees from a purely technical perspective doesn’t give them the right alignment on how to use those tools. Onboarding should be an opportunity for new employees to be introduced to a team’s goals, how they align with the business, and the role the team plays in the larger ecosystem of other departments.
Productivity - A key goal of any onboarding process is to empower the new employee to be productive. If they need to constantly ask questions or constantly get roadblocked by data sets that they can’t access, then they aren’t being productive.
Empowering Team Members To Make Decisions - Finally, the end goal is to have employees who go beyond just doing the tasks they are told to do and feel comfortable making calls and big bets on future data projects. This requires an individual to understand not only the technology they are working on but also the business goals and needs, so they can see possible gaps and opportunities.
Onboarding For A Data Team
There is onboarding for a company (which is really just orientation) and then there is onboarding for a team. Generally, I have found that most companies only spend a day or two onboarding all employees. Facebook does a two-day intro (at least it did for me) where you hear from various people about the mission, principles, and guidelines, and take care of more administrative tasks like getting your picture taken for your badge.
After that, it’s time to onboard onto your team.
Quick Pause - Right size for your organization. Some of what I reference below needs to be right-sized per company. A start-up doesn’t have as much time or as many resources to spend on onboarding, nor does it need to, since things change quickly and communication can happen quickly via in-person discussions. I guess that’s not always the case these days.
That doesn’t mean throw caution to the wind and document nothing. Instead, continually grow your documentation. Set up an initial document and handbook for engineering that others can add to as they join the team.
Once you get to engineer 20, you will have a pretty well-documented process.
Onboarding For Context
Gaining an understanding of what is important to a business or a team is crucial, especially in data teams, where many individuals need to balance their business and technical knowledge to deliver impact.
This involves understanding not only the key metrics, projects, and reports that a data team is creating but also who the key stakeholders and drivers on external teams are (at Facebook they were referred to as XFNs). A great point brought up by Nate Sooter in a recent conversation I had with him on my YouTube Live was that it's important to understand not only who your skip-levels (your boss's boss) and direct XFNs are, but also who the individuals are who disproportionately make things happen.
That is to say, if things get blocked or stuck, who seems to always be the one who can make them unstuck with a conversation or two?
But that takes time to figure out. From an onboarding perspective new hires should be introduced to:
XFNs (cross-functional partners) - Besides being introduced to team members and skip-levels, you will want to know:
Who your XFNs are
What are their team's goals
What role do they play
Current Projects And Initiatives - It can be really easy on data teams to get stuck in ad hoc hell. Data requests, maintenance, and data quality issues can all bog data teams down and keep them from actually impactful work. This problem gets worse if new data team members don’t have a good idea of what projects are going on and what initiatives are driving the team. That leaves far too much ambiguity and makes it even easier to continually get sucked into ad hoc hell.
KPIs - A great way to explain what is important to a business is by looking at its KPIs and other metrics. When done correctly they should provide the context in terms of what is worth tracking and asking questions about for the business.
Key Reports And Data Pipelines - Understanding what the key reports are and what pipelines support them provides two benefits. It provides further context on who the team is supporting, and it gives experienced team members an opportunity to explain to new team members any issues with said reports and pipelines. Do they go down often? Do they have any weird quirks? Hopefully, the answer is no since they are key reports, but you never know.
Environment Set-Up
When it comes to onboarding for a team, the bare minimum is a document that contains a clear guide on how to set up your environment. This can be in the form of a basic checklist or a set of wiki pages.
Even the 12-person start-up I worked for had a guide that took your environment set-up from 0 to 1.
Here are some key considerations you should have in your environment set-up document.
VPN - Many companies put their data and cloud services behind a VPN to manage access to their internal network. Not having access to this on day one means a new employee will likely be locked out of key systems.
Database access - Obviously, don’t store the passwords and hosts for your various databases in a document. However, do explain how an individual should set up any data warehouse account and where they can get secrets and connection information for source databases (usually in some form of password manager).
Cloud API keys and token set-up - Somewhat connected to database access is API key set-up. Whether it’s for S3 or Cloud Storage, most data teams rely on cloud providers to both store and provide data.
Code repo access - Whether you’re using GitHub, GitLab or some other repo host, making sure an engineer has access on day one ensures they can become familiar with your code base quickly.
Binary and custom library downloads - Some companies have specific libraries and packages that are required in order to run either their custom software or legacy applications. Make sure these are noted to avoid any blockage.
Jira Account - Or whatever tool you’re using to manage tickets and projects. This isn’t really environment set-up, but I wanted to include it somewhere.
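A checklist like the one above can also be backed by a small script that a new hire runs on day one. The following is only a minimal sketch in Python, not a prescribed tool. The tool list and the environment variable names (`WAREHOUSE_USER`, `CLOUD_CREDENTIALS_FILE`) are hypothetical placeholders, so swap in whatever your own set-up guide actually requires.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

# Hypothetical requirements for illustration only; replace them with the
# tools and variables listed in your own environment set-up document.
REQUIRED_TOOLS = ["git"]
REQUIRED_ENV_VARS = ["WAREHOUSE_USER", "CLOUD_CREDENTIALS_FILE"]


def missing_items(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a readable list of set-up steps that still look incomplete."""
    missing = []
    # A tool counts as installed if it can be found on the PATH.
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            missing.append(f"tool not on PATH: {tool}")
    # A variable counts as configured if it is present and non-empty.
    for var in REQUIRED_ENV_VARS:
        if not env.get(var):
            missing.append(f"environment variable not set: {var}")
    return missing


if __name__ == "__main__":
    problems = missing_items(os.environ)
    if problems:
        print("Set-up incomplete:")
        for problem in problems:
            print(f"  - {problem}")
    else:
        print("All checks passed.")
```

The script only checks that names exist; it never reads or prints secret values, which keeps it in line with the advice above about not storing passwords in documents.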
Commit Something Day One
Committing code on day one is a recurring theme for many companies. Adam referenced it above, and I am aware that GitLab and Facebook also have their new engineers do the same thing.
This can help ease the anxiety of committing code to a new repo. It also makes the environment set-up a little more real and immediately applicable.
Standards And Style Guides
There is enough ambiguity in data projects already. Creating clear standards and style guides that can be given to new employees during onboarding can help get them up to speed quickly. Rather than taking up a large amount of their teammates' time on code reviews full of nits, a new data engineer can create and push PRs already knowing what the general standards are.
Whether this is for SQL or other code, having some form of style guide can provide a lot of benefits. What is great is that you don’t need to develop one from scratch. Several companies have shared their style guides, which are listed below:
Handbooks And Process Guides
In early-stage startups, it's particularly tempting to avoid a documentation strategy. With only a few team members, it's feasible to keep everyone informed via meetings, Slack, or email threads. Long-term, this oversight becomes increasingly harmful.
As a team scales, the need for documentation increases in parallel with the cost of not doing it. Said another way, implementing a documentation strategy becomes more difficult — yet more vital — as a company ages and matures.
Many teams have specific processes for workflows, project management, change management and data governance, all of which is information that eventually needs to be passed on to new employees. Initially, while a start-up is small, this will mainly happen by word of mouth. However, there does come a tipping point where it starts to make sense to document all of these various processes.
I believe GitLab has a great explanation of their handbook approach.
Run Books - Run books are not exactly handbooks but they live in a similar category. As teams mature and a large portion of their work becomes maintenance-focused, run books become very helpful. They will often answer questions like:
How do you take a system down and bring it back up again (and not take down AWS)
How do you manage data requests for data that needs to be more secure
How do you debug that one-off pipeline that seems to break down once a month
All of this can be captured in run books that can help get new team members up to speed. Who am I kidding, they also keep more experienced members sane so they don’t have to remember every step to restart their internal systems.
Now What
Onboarding, in my opinion, doesn’t stop once an environment is set up or XFN intro meetings end. Instead, it is a continuous process that bleeds into the first few bits of work and possibly continues until the first project is delivered.
This is because, as referenced above, one of the goals of onboarding is to provide confidence and independence. A great way to do that is to deliver on your first mini-project and get feedback on it.
Feedback is always a gift, but in my experience, detailed feedback early on is especially helpful. It helps set the tone for what the new team cares about and what it expects.
Onboarding Is Your First Impression
If first impressions matter, then onboarding matters. All the small details matter and can come at a cost when not done right.
If you’ve experienced a well-thought-out onboarding, you notice it, almost by not noticing it. You don’t have too many questions about where to find key information, nor do you need to spend hours chasing down IT tickets to get access to a database or download a one-off application.
I would love to hear some stories from you, the reader. Do you have any examples of what you think makes a great onboarding process?
Articles Worth Reading
There are 20,000 new articles posted on Medium daily and that’s just Medium! I have spent a lot of time sifting through some of these articles, as well as TechCrunch and companies’ tech blogs, and wanted to share some of my favorites!
Indistinguishable from Magic
Magical Technology: iPod, Spotify, Figma, AI Art, Arc, web3, and beyond
There are a lot of reasons that Figma succeeded in the face of a much bigger competitor to the point that that competitor was forced to pay 50x ARR to stop the advance. Practically everyone in tech has shared their opinion in the past few days. Kevin Kwok broke many of those reasons down in his classic Why Figma Wins and I agree with all of it. But what the piece doesn’t address – even though the mechanics Kevin covers are inputs to it – is the magic powers that Figma gives its users. Just look at all of the results for “Figma magic” on Twitter. As if on cue, Adobe’s Chief Product Officer, Scott Belsky, tweeted: “Figma will continue to operate w/ autonomy, continuing to work their magic.”
Don't be foie gras
If 2016-2020 was the golden unicorn age of SaaS, 2021 nearly hurtled us into the foie gras* era. I say “nearly” because the current market correction may head off the trend of overstuffing private software companies with capital in the hope they grow faster.
*h/t to harry stebbings for the metaphor
An overabundance of capital is bad for most businesses. Inside a company there should always be healthy competition for investment, bordering on intense. Vigorous debate between product, marketing, sales, and finance leaders about what gets funded and what gets cut is critical to maintaining high ROIC.
Very timely article for me. I am ramping up a confluence page at work to onboard a new guy.
Yet to have a good onboarding experience.
But I hired a few people to my analytics team in the last couple of years and experimented with onboarding strategies. The key challenge was to get them acquainted with the data and the documentation. Because if you are working on an analysis and can't find the data - it really sucks!
So I designed a 3-week onboarding covering different modules of the product. Every day someone from the team would take a 2 hr call and explain the module, the data, and the key aspects of the same, and then the new joinee could do some data exploration or work on a small problem.
It worked okay. It was too much though. So wondering what to do next.