3 More SQL Aggregate Function Interview Questions for Data Science
Lacking inspiration on how to prepare SQL aggregate functions for a job interview? Here are three interview question suggestions to get you out of a rut.
Image by Author
The aggregate functions are one of SQL’s essential features. From SQL being one of the essential data science tools comes the conclusion there’s no living example of a data scientist who never used SQL aggregate functions.
How would they otherwise analyze data and provide meaningful metrics? A data scientist’s life without data aggregation is like a broken pencil – pointless.
The interviewers know that. And they will make sure to check your aggregation skills left and right.
The Big Five SQL Aggregate Functions
The SQL aggregate functions take values from multiple rows and return a single aggregated value. They are most commonly used with the GROUP BY clause. This allows you to see the group names and the aggregate values.
There are quite a few aggregation functions in SQL. Their number also depends on the SQL flavor you use.
But no matter what, the most common aggregate functions in SQL are these five.
Image by Author
The functions’ names are not too cryptic. Anybody can intuitively conclude what each function does.
However, knowing how to use them when facing a business problem is another pair of shoes. These shoes need a bit of breaking in.
The next three interview questions should help you with that.
1. Average Weight of Medal-Winning Judo
This is a question by ESPN you can find on StrataScratch.
“Find the average weight of medal-winning Judo players of each team with a minimum age of 20 and a maximum age of 30. Consider players at the age of 20 and 30 too. Output the team along with the average player weight.”
Here’s the link to the question if you want to follow along with me.
The question requires you to find the average weight of a medal-winning judo player by each team. This wording obviously means you have to use the AVG() aggregate functions.
Only medal winners between 20 and 30 years of age should be considered. Use the MIN() and MAX() functions to achieve that.
Put in the code, it looks like this.
SELECT team,
AVG(weight) AS average_player_weight
FROM olympics_athletes_events
WHERE sport = 'Judo'
AND medal IS NOT NULL
GROUP BY team
HAVING MIN(age) >= 20
AND MAX(age) <= 30;
The query selects the team, which will also be used in GROUP BY, and calculates the average weight from the table olympics_athletes_events. The WHERE clause is used to include only judo and exclude the competitors that didn’t win any medal.
Finally, use the MIN() and MAX() functions in the HAVING clause to show winners between 20 and 30 years.
Here’s also the code output.
team | average_player_weight |
France | 77 |
Georgia | 84 |
Japan | 70 |
Romania | 48 |
2. Count Student Number in Departments
The question from LeetCode asks you the following.
“Write an SQL query to report the respective department name and number of students majoring in each department for all departments in the Department table (even ones with no current students).Return the result table ordered by student_number in descending order. In case of a tie, order them by dept_name alphabetically.”
Here’s the link to the question to follow along with me.
The clue is in the question title. Solving this question calls for the COUNT() aggregate function.
SELECT dept_name,
COUNT(student_id) AS student_number
FROM department
LEFT JOIN student ON department.dept_id = student.dept_id
GROUP BY department.dept_name
ORDER BY student_number DESC,
department.dept_name;
Unlike the previous question, here you have two tables: department and student.
The first step is to select the department name and find the number of students by their ID using the COUNT() function.
To get both columns, use LEFT JOIN to join the tables. Why this particular join? Because the question asks you to return all the department names, even if there are no students.
Output is grouped by department and sorted by the student number descendingly and department name ascendingly.
Here’s the expected output.
dept_name | student_number |
Engineering | 2 |
Science | 1 |
Law | 0 |
3. Contest Leaderboard
The final question comes from HackerRank.
“You did such a great job helping Julia with her last coding contest challenge that she wants you to work on this one, too! The total score of a hacker is the sum of their maximum scores for all of the challenges. Write a query to print the hacker_id, name, and total score of the hackers ordered by the descending score. If more than one hacker achieved the same total score, then sort the result by ascending hacker_id. Exclude all hackers with a total score of 0 from your result.”
Here’s the link to the question if you want to follow along with me.
The solution uses two aggregate functions: SUM() and MAX().
SELECT h.hacker_id,
h.name,
SUM(max_score) AS sum_max_score
FROM hackers h
JOIN
(SELECT hacker_id,
challenge_id,
MAX(score) AS max_score
FROM submissions
GROUP BY hacker_id,
challenge_id) s ON h.hacker_id = s.hacker_id
GROUP BY h.name,
h.hacker_id
HAVING SUM(max_score) > 0
ORDER BY SUM(max_score) DESC, h.hacker_id;
Let’s first focus on the subquery, which is used to calculate the highest score by hacker and challenge. To do that, it uses the MAX() aggregate function.
The main query selects the hacker’s ID and name from the table hackers. It also uses the SUM() function to sum the highest scores by referencing the subquery.
These two queries are joined on the column hacker_id using JOIN.
After that, the output is filtered to show only the total scores above zero. Finally, the result is sorted by total score descendingly and by the hacker ID ascendingly.
The first several rows of the output are shown here.
hacker_id | name | sum_max_score |
76971 | Ashley | 760 |
84200 | Susan | 710 |
76615 | Ryan | 700 |
82382 | Sara | 640 |
79034 | Marilyn | 580 |
Summary
There you have it: the five most popular SQL aggregate functions showcased on three medium-level interview questions. It’s not that difficult for a crucial SQL concept, is it?
These three questions are good examples of what you can expect from the job interview. As you’ve seen, the SQL aggregate functions don’t come alone. They are usually assessed simultaneously with other SQL concepts, such as grouping and filtering data, joining tables, and writing subqueries.
Use these questions as guidelines, but make sure you solve several questions on your own. That’s the closes you can get to a job interview before an interview.
Nate Rosidi is a data scientist and in product strategy. He's also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Connect with him on Twitter: StrataScratch or LinkedIn.