It can be challenging to break into data engineering directly. You seemingly need to learn a whole plethora of skills, ranging from programming to data modeling. Perhaps that's why I often see individuals moving laterally into data engineering; many people go from data analyst to data engineer.
Both roles likely have exposure to the data warehouse and work with some similar tools, including SQL and perhaps some Python or scripting, which makes the transition easier. Regardless of your background, I wanted to help those who are looking to switch from a data analyst role or a non-technical role into data engineering.
To do so, I created a 100-day plan that you can use to get up to speed quickly on what data engineering generally covers. But before diving into that, let's talk about some of the limitations.
Speed Isn’t The Goal
Speed isn't really the goal of this exercise. The 100 days is meant to act as a tool to help you commit to the idea of learning data engineering. If anything, you'll learn more about what you don't know and the areas you'd like to focus on rather than fully learning data engineering.
After all, 100 days isn’t really enough to get deep into all of these subjects.
Many people take years to truly learn and perfect a subject, so don’t feel like your goal is to rush and become an expert.
For now let’s just focus on the next 100 days!
1-10 Days - Reviewing The Basics
In your first 10 days, the focus will be on building the habit of learning data engineering as well as understanding where your current skill levels are. So if you end these ten days and feel like you need to spend more time in this section, that's fine. Again, there is no rush!
For this section we will be focusing on a few key areas. As part of this process, I believe it'd be a great idea to understand where you currently stand, so start day one by reviewing some of the problems you've taken on using these different tools and techniques:
What is the most complex SQL problem you have solved? Could you have solved it an easier way?
What do you understand about data modeling and data storage paradigms? Do you understand concepts such as normalization, denormalization, slowly changing dimensions, etc.? (There is a quick sketch of one of these after this list.)
Do you understand the basics of programming: loops, conditionals, classes, etc.? What are some of the most challenging problems you've had to solve at work, and what made them challenging?
Etc.
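If any of those data modeling terms feel fuzzy, a quick refresher helps. As one hedged example, here is a minimal sketch of the idea behind a Type 2 slowly changing dimension in plain Python; the customer dimension and its fields are made up purely for illustration:

```python
from datetime import date

# A tiny, hypothetical customer dimension tracked as SCD Type 2:
# instead of overwriting a changed attribute, we close out the old row
# and insert a new "current" row so history is preserved.
dim_customer = [
    {"customer_id": 1, "city": "Seattle", "valid_from": date(2023, 1, 1),
     "valid_to": None, "is_current": True},
]

def apply_scd2_update(dim_rows, customer_id, new_city, change_date):
    """Close the current row for this customer and append a new one."""
    for row in dim_rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return  # nothing changed, nothing to do
            row["valid_to"] = change_date
            row["is_current"] = False
    dim_rows.append({
        "customer_id": customer_id, "city": new_city,
        "valid_from": change_date, "valid_to": None, "is_current": True,
    })

apply_scd2_update(dim_customer, 1, "Portland", date(2024, 6, 1))
for row in dim_customer:
    print(row)  # the closed-out Seattle row plus the current Portland row
```

In a real warehouse this is usually expressed with SQL UPDATE and INSERT (or MERGE) statements, but the mechanics are the same: close out the old row, insert the new one.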
Once you've written up a review of where you think you're at, we can start to focus on the next nine days. We will review the basics of SQL, programming (we are going to use Python for these 100 days), data modeling, etc. As you go through each topic over the nine days, here is what I'd recommend:
Watch/Read/Practice the content that is referenced
Take a notebook or a Google Doc and note down what the content was about. Try to think of 2-3 ways you can see it being applied in real life.
Share what you're learning (with a friend, on social media, etc.)
Find a group of people to help support your journey (I have created a Discord channel you can join!)
The purpose here is to shift from passively consuming content to actively engaging with it. This will help you avoid tutorial hell.
Once you’ve gone through the basics, you can now start to dive deeper.
11-40 - Diving Deeper Into the Basics
Now that we’ve got a good baseline for where your skills are and you’ve reviewed some of the basics, you can dive deeper. In particular, there are several key areas you’ll likely need to focus on:
Programming (Python; you're welcome to switch the videos/articles if you'd prefer a different language)
Cloud Infrastructure
Data modeling and data pipelines
Then create your first mini project.
I am making the assumption that you likely already know a lot about SQL, so the focus will be more on improving your other technical skills.
Programming
One of the challenges with gauging people's skill level in programming today is that many people might be skilled in a framework more than they are skilled in programming. For example, perhaps you know Pandas really well but struggle to write basic Python. Libraries provide a lot of benefits, but they can force you to solve some problems with a chainsaw-level solution (when you just need a butter knife).
So for the programming section, we'll focus on reviewing some data structures and algorithms concepts, as well as taking on problems you'd likely face in real life.
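To make the chainsaw-vs-butter-knife point concrete, here is a minimal sketch (with made-up data) of a grouping problem solved in plain Python. Many people would reach for a Pandas groupby here, but a dictionary is all it takes:

```python
from collections import defaultdict

# A handful of made-up orders. In Pandas you might write
# pd.DataFrame(orders).groupby("customer")["amount"].sum(),
# but plain Python handles a small grouping like this just fine.
orders = [
    {"customer": "ada", "amount": 30.0},
    {"customer": "grace", "amount": 12.5},
    {"customer": "ada", "amount": 7.5},
]

totals = defaultdict(float)
for order in orders:
    totals[order["customer"]] += order["amount"]

print(dict(totals))  # {'ada': 37.5, 'grace': 12.5}
```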
Example of Content on the Checklist
The Cloud
You don't have to learn one skill at a time when it comes to programming and the Cloud. You can set up some basic Lambda functions or an EC2 instance and test out your ability to navigate around Vim, all while learning more about S3, IAM, EventBridge, MWAA, etc.
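As a hedged example of combining those skills, here is a minimal sketch of a Lambda-style handler that writes its incoming event to S3 with boto3. The bucket name is a placeholder, and you would still need an IAM role that allows s3:PutObject on it, which makes for a good first IAM exercise:

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "my-practice-bucket"  # placeholder; use your own bucket name

def handler(event, context):
    """Write the incoming event to S3 as a timestamped JSON file."""
    key = f"events/{datetime.now(timezone.utc).isoformat()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event))
    return {"statusCode": 200, "body": f"wrote s3://{BUCKET}/{key}"}
```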
So for this section, we’ll incorporate some challenges that involve programming and the Cloud.
Example of Content on the Checklist
Setting Up IAM
Working with S3
RDS and More
Data Modeling and Data Pipelines
For the final part of this section, we'll go over data modeling and data pipelines. There is a lot of great content on data modeling; it's just mostly in books. But I'll be doing a livestream or two to coincide with these 100 days, so keep an eye out for that!
41-50 - Build a Mini Project
Projects don’t have to take months. They don’t even have to be complete for you to learn something. Now towards the end of the 100 days, we’ll go over how to come up with a larger project idea.
But for now, let’s spend the next 10 days trying to build something and analyze its data.
There are plenty of public data sources you could use (open data portals, public APIs, etc.).
With one picked, you can pull the data into S3 using Python; load it into Postgres, BigQuery, Snowflake, or another similar data storage solution; and go through the analytics process.
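If you want a starting point, here is a minimal sketch of the extract-and-land step: pull JSON from an API with requests and drop it into S3 with boto3. The URL and bucket name are placeholders; swap in your chosen data source:

```python
import json
from datetime import datetime, timezone

import boto3
import requests

API_URL = "https://example.com/api/data"  # placeholder API endpoint
BUCKET = "my-mini-project-bucket"         # placeholder bucket name

def extract_and_land():
    """Pull JSON from the API and land it in S3, partitioned by run date."""
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()

    run_date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    key = f"raw/api_data/dt={run_date}/data.json"

    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps(response.json()),
    )
    print(f"landed s3://{BUCKET}/{key}")

if __name__ == "__main__":
    extract_and_land()
```

From there, you can load the landed files into the warehouse of your choice and start the analysis.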
Create a basic dashboard and share it!
That's it. You don't have to use every tool under the sun to deliver a basic analysis (even the dashboard might be overkill). The purpose of these 10 days is to come up with a small analysis project and use a few of the basics you've covered over the last few weeks.
51-70 - A Survey of Tools and Best Practices
You generally want to avoid becoming a framework- or tool-based engineer. There are so many tools out there, and not all of them stick around forever. You might learn Hadoop today only to find out that the data world is moving on tomorrow. Still, many of these tools are worth learning about, along with where they fit.
Part of this isn't just understanding where tools fit but, more importantly, where they don't. It's natural to want to interject solutions even where they don't belong. Sometimes a simple Lambda or cron-run script is all you need. Other times you'll want a solution like Airflow or another orchestration tool.
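As a rough illustration of that trade-off, here are two hedged sketches of the same daily job: first as a plain Python script you could schedule with cron, then as an Airflow DAG (Airflow 2.x style). The daily_load function is a stand-in for real pipeline work:

```python
# Option 1: a plain script, scheduled with a crontab entry such as
#   0 6 * * * /usr/bin/python3 /opt/jobs/daily_load.py
def daily_load():
    print("extract, transform, load...")  # stand-in for the real work

if __name__ == "__main__":
    daily_load()
```

```python
# Option 2: the same job as an Airflow DAG, worth it once you need
# retries, backfills, dependencies between tasks, and a UI.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def daily_load():
    print("extract, transform, load...")

with DAG(
    dag_id="daily_load",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    PythonOperator(task_id="daily_load", python_callable=daily_load)
```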
If you’d like to learn more about some of these solutions outside of the checklist, you can check out the content below:
Data Governance
With over two months of content consumed, it's time for you to put everything together and take on your own project.
71-100 - Commit to a Project
The best projects involve topics you enjoy, are projects that hiring managers would connect with, and could possibly even make you money (but don't get stuck on that last part).
Here is a quick guide you can use to frame your project.
Pick a data set you'd like to work on (if you can, try to pick a problem you'd like to solve first and then find a data set; however, data sets don't always exist).
Pick a tool/framework to process your data. This could be AWS Lambda, cron and Python scripts, Airflow, Mage, Prefect, etc. If this is your first project, then a Lambda or cron set-up is a great place to start.
Pick a data storage solution (Snowflake, Postgres, Databricks, S3, etc.); there is a sketch of this step after the list.
Pick a visualization or an app layer.
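As promised after the list, here is a minimal sketch of what the storage step might look like with Postgres and psycopg2. The connection details, table, and records are all placeholders for illustration:

```python
import json

import psycopg2

# Placeholder connection details; swap in your own.
conn = psycopg2.connect(host="localhost", dbname="mini_project",
                        user="me", password="secret")

# Made-up records, standing in for whatever you extracted earlier.
records = [
    {"id": 1, "city": "Seattle"},
    {"id": 2, "city": "Portland"},
]

with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS raw_customers (
            id INT PRIMARY KEY,
            payload JSONB
        )
    """)
    for rec in records:
        cur.execute(
            """
            INSERT INTO raw_customers (id, payload)
            VALUES (%s, %s)
            ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload
            """,
            (rec["id"], json.dumps(rec)),
        )
conn.close()
```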
Now you might have finished reading this and still be wondering, "What project should I take on?" Here is an exercise for you:
List out 10 questions you’d like to answer with a data set you’d like to use
List out 2-3 skills you’d like to improve
Now pick one of the questions and build around that (the checklist will take you through this in more detail).
If you’re still struggling to start, then you can read this article about starting your next data project.
After 100 Days
Once you pass 100 days, you don't have to stop learning. Instead, take a moment to reflect on where you felt strong and where you felt weak.
Take a moment to figure out what you’d like to do as a data engineer, where you’d like to work, and what it’d take to get a job there. That will let you know what you should start learning next.
And, more importantly, perhaps it's time to improve your resume and start applying (if you haven't already), or to see if you can get involved in some data engineering projects at work.
Thanks for reading.
The Ultimate Guide To Starting An Independent Consulting Company In 2024 | Data Consulting 101
Join My Data Engineering And Data Science Discord
If you're looking to talk more about data engineering, data science, breaking into your first job, and finding other like-minded data specialists, then you should join the Seattle Data Guy Discord!
We have almost 4,000 members!
Articles Worth Reading
There are 20,000 new articles posted on Medium daily, and that's just Medium! I have spent a lot of time sifting through some of these articles, as well as TechCrunch and company tech blogs, and wanted to share some of my favorites!
Airflow is not an ETL tool…
In this post, I will look into three data orchestration tools (Mage, Kestra, and Dagster) for orchestrating a job that extracts data from WeatherAPI, does a light-touch transformation, and loads it into a DuckDB database.
You can find the original code that I use to adapt to a particular tool here.
Airflow is not an ETL tool or what is a data orchestration?
I remember being confused by the phrase "Airflow is not an ETL tool..." (you can replace Airflow with any data orchestrator), and the reason I was confused is the PythonOperator, which allows you to execute Python functions on the same machine where Airflow is installed. The fact that some of the example DAGs available on the Internet place these Python functions in the same DAG configuration file that contains the workflow definitions didn't make it any more understandable.
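For anyone who hasn't hit this confusion yet, here is a hedged sketch of the contrast being described (Airflow 2.x imports): one task does its transformation in-process on the Airflow worker, the pattern that makes Airflow look like an ETL tool, while the other merely triggers work running elsewhere. The job URL is a placeholder:

```python
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

def transform_in_process():
    # The confusing pattern: the transformation runs on the Airflow
    # worker itself, which makes Airflow look like an ETL tool.
    print("crunching data on the Airflow machine...")

def trigger_external_job():
    # The orchestration mindset: Airflow just kicks off (and later monitors)
    # work that runs elsewhere. Placeholder URL for illustration.
    requests.post("https://example.com/jobs/transform/run", timeout=30).raise_for_status()

with DAG(
    dag_id="etl_vs_orchestration",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    PythonOperator(task_id="transform_in_process", python_callable=transform_in_process)
    PythonOperator(task_id="trigger_external_job", python_callable=trigger_external_job)
```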
End Of Day 109
Thanks for checking out our community. We put out 3-4 newsletters a week discussing data, tech, and start-ups.