3 SQL Interview Tips And Questions For Data Scientists And Data Engineers
SQL has become a common skill requirement across industries and job profiles over the last decade.
Companies like Amazon and Google will often demand that their data analysts, data scientists, and product managers are at least be familiar with SQL. This is because SQL remains the language of data.
This has lead to SQL being a staple part of many data professionals interview loops. Meaning you better plan to include SQL in your study sessions. That being said, just walking through SQL problems doesn't always cut.
In this article I will go over three tips to help you improve your SQL both for interviews and in general.
Attempt To Solve A SQL Problem In Multiple Ways
There is more than one way to solve most SQL problems.
But often we resort to the methods we have most recently used.
For example, if you just learned about analytical functions it can pigeonhole a lot of your future solutions. This is why it is important to try to solve SQL problems in multiple ways.
In order to do all of our problems I will be using problem sets from InterviewQuery which is a site that can be used by data scientists to practice much more than SQL.
But for now let's look at a simple problem they offer.
We're given two tables, a
userstable with demographic information and the neighborhood they live in and a
Write a query that returns all neighborhoods that have 0 users.
The tables we have are listed below.
Here we have two input tables and an expected output. Before reading further do you have any thoughts on how to get neighborhood names that have 0 users?
What is your first answer?
Regardless, et's look at this first example of a possible solution.
SELECT N.name FROM neighborhoods N LEFT JOIN users U ON N.id = U.neighborhood_id WHERE U.id is null
This solution relies on the LEFT JOIN to return any neighborhoods without users having a u.id that is NULL.
But this example can be a little confusing and not as readable for some as it's not explicit. The LEFT JOIN kind of makes it a little difficult to read. The reason for this is that you will have to mentally manage the logic in your head.
Let's take a quick look at the table the LEFT JOIN would create "WITHOUT THE FILTER".
Above you will see that user_id has null values. This is because we are LEFT joining with the left table being neighborhood. Meaning that where there are no users, there will still be neighborhoods.
But, this is just one way to solve this problem. Let's write this query differently.
Have any ideas?
You can check out the query below which is the same thing really but in a more explicit way.
SELECT n.name FROM neighborhoods n LEFT JOIN users u ON n.id = u.neighborhood_id GROUP BY n.name HAVING COUNT(u.id) = 0
This query uses a more explicit HAVING clause to only count u.ids. COUNT only counts when u.id is not null. So in that way, the query truly operates the same way.
So if there isn't a matching ID for neighborhood id in the user table, then it will replace the user-id with null. Thus, if I write where user id is NULL, I will get the same response as COUNT(u.id).
Keep reading with a 7-day free trial
Subscribe to SeattleDataGuy’s Newsletter to keep reading this post and get 7 days of free access to the full post archives.