The Ultimate Guide to Acing Coding Interviews for Data Scientists
This article covers understanding the 4 types of coding interview questions and preparing for them effectively.
By Emma Ding, Data Scientist & Software Engineer at Airbnb & Rob Wang, Senior Data Scientist at Robinhood
Introduction
Data science (DS) is a relatively new profession compared to other types of roles in the tech industry, such as software engineering and product management. Initially, DS interviews had a limited coding component, including only SQL or applied data manipulation sessions using Python or R. In recent years, however, DS interviews have shown an increased emphasis on computer science (CS) fundamentals (data structures, algorithms, and programming best practices).
For someone looking to enter the data science profession, this trend towards more CS in interviews can be daunting. In this post, we hope to increase your understanding of the coding interview and teach you how to prepare for it. We will categorize different coding questions and provide tips to crack them so that you can have a stellar interview. You can reach out to us here if you think we might be able to make your journey easier in any way!
Why are Coding Questions Asked in DS Interviews?
What exactly is a coding interview? We use the phrase “coding interview” to refer to any technical session that involves coding in any programming language other than a query language like SQL. In today’s market, you can expect a coding interview with just about any data science job.
Why? Coding is an essential part of a DS career. Here are three reasons:
- DS is a technical subject. The bulk of a data science job involves collecting, cleaning, and processing data into usable formats. Therefore, to get work done, basic programming proficiency is a must.
- Lots of real-world data science projects are highly collaborative, involving multiple stakeholders. Data scientists who are equipped with stronger fundamental CS skills will find it easier to work closely with engineers and other partners.
- In many companies, data scientists are responsible for shipping production code, such as data pipelines and machine learning models. Strong programming skills are essential for projects of this type.
To sum up, strong coding skills are necessary to perform well in many data science positions. If you cannot show that you possess those skills in the coding interview, you will not get the job.
Roles That are Likely to Have Coding Interviews
Of course, the level of coding required does differ depending on the position. Check this YouTube video if you’re interested in learning the differences between various DS roles. If you are looking for a data scientist role that falls into any of the categories below, the chances of encountering a coding interview are very high:
- Data scientist roles with a heavy machine learning (ML) or modeling emphasis: For these kinds of roles, candidates are expected to work independently or closely with engineers to productionize machine learning, statistical, or optimization models. Such roles, though titled “data scientist”, are more similar to “machine learning engineering” or “research scientist” roles. A few examples of such jobs are Core Data Science at Facebook, Data Scientist — Algorithms at Airbnb and Lyft, etc.
- Companies in which data scientists are part of an engineering org: For such positions, there is a general expectation that every data scientist possesses sufficient programming proficiency. Robinhood’s data scientist position is an example of this kind of role.
- Data scientist roles at small to medium-sized tech companies: The environment in such companies tends to be fast-paced, and data scientists may wear multiple hats. In particular, they are required to demonstrate full-stack skills to get things done quickly and efficiently.
In contrast, if you are interviewing for a DS role with a Product Analytics emphasis, there is a lower likelihood of encountering coding questions. Interviews for these roles do not often go beyond evaluating SQL proficiency, but general programming may still be tested from time to time. Candidates who do not possess a basic level of coding knowledge can be easily caught off guard during the interview and may fail to move forward in the process. Do not let that be you! Make sure you are prepared. You can start your preparation by learning what to expect with a coding interview.
When to Expect a Coding Interview?
A coding interview can appear during the technical phone screen (TPS), onsite, or both. There could even be multiple rounds of coding interviews during the onsite portion, depending on the coding proficiency expected. In general, you should expect coding interviews in at least one stage of an overall DS interview loop.
During the TPS, the delivery of the coding interview will typically be through online integrated development environments (IDEs) such as CoderPad, HackerRank, and CodeSignal. During onsite sessions, either an online IDE or a whiteboard can be used. In the current remote interview environment, the former is used by default.
The length of a coding session ranges from 45 minutes to 1 hour and it usually involves one or more questions. The choice of language is typically flexible, but most candidates will choose Python for its simplicity.
Different Categories of Coding Interviews
Based on our experiences interviewing with dozens of large and medium-sized companies, such as Airbnb, Amazon, Facebook, Intuit, Lyft, Robinhood, Slack, Snapchat, Square, Stitch Fix, Twitter, Upstart, and more, we have categorized coding questions into the following four types.
Basic Data Structures
This type of question aims at evaluating candidates’ proficiency in introductory CS fundamentals. These fundamental topics can include, but are not limited to:
- Data Structures: Arrays, Hash Maps/Dictionaries, Heaps, Sets, Stacks/Queues, Strings, and Trees/Binary Trees.
- Algorithms: Binary Search, Recursion, Sorting, and Dynamic Programming.
Some additional topics, such as Linked Lists and Graphs (Depth-First Search or Breadth-First Search), are less likely to occur during this type of interview.
Typically, multiple questions will be asked about a single scenario, ranging from simple to hard. Each question may cover a unique data structure or algorithm. Here is an example of a classic problem that revolves around finding the median of a list of numbers:
- Part 1: Find the median using any method. Candidates can use a built-in sorting function and simply return the median after sorting.
- Part 2: The interviewer now asks for a more optimized version of finding the median. In this setting, knowledge of common algorithms, such as quickselect, will come in handy.
- Part 3: Finally, the question is changed to a “streaming” version of computing medians, meaning that the data comes in an online fashion rather than as a fixed list of numbers. In this case, the candidate would likely resort to the use of heaps (slightly more challenging).
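The three parts above can be sketched in Python as follows. This is a minimal illustration and not a model answer: the function and class names are hypothetical, the quickselect variant shown uses a random pivot and builds new lists for clarity (an in-place partition would use less memory), and the streaming version assumes numeric inputs.

```python
import heapq
import random


def median_sorted(nums):
    """Part 1: sort a copy and read off the middle. O(n log n) time."""
    if not nums:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(nums)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quickselect(nums, k):
    """Return the k-th smallest element (0-indexed) in expected O(n) time."""
    pivot = random.choice(nums)
    lows = [x for x in nums if x < pivot]
    pivots = [x for x in nums if x == pivot]
    highs = [x for x in nums if x > pivot]
    if k < len(lows):
        return quickselect(lows, k)
    if k < len(lows) + len(pivots):
        return pivot
    return quickselect(highs, k - len(lows) - len(pivots))


def median_quickselect(nums):
    """Part 2: find the median without fully sorting the input."""
    if not nums:
        raise ValueError("median of an empty list is undefined")
    n = len(nums)
    if n % 2 == 1:
        return quickselect(nums, n // 2)
    return (quickselect(nums, n // 2 - 1) + quickselect(nums, n // 2)) / 2


class StreamingMedian:
    """Part 3: running median of a stream, using two heaps."""

    def __init__(self):
        self._low = []   # max-heap of the smaller half (values stored negated)
        self._high = []  # min-heap of the larger half

    def add(self, x):
        # Push onto the lower half, then move its largest value up,
        # so every value in _low is <= every value in _high.
        heapq.heappush(self._low, -x)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance: _low holds the same number of items as _high, or one more.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no data seen yet")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2
```

Each `add` costs O(log n) and each `median` call costs O(1), which is the usual argument for heaps over re-sorting the data every time a new number arrives.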
This type of question may also appear as an applied business problem. For such questions, the candidate is expected to code up a solution to a hypothetical applied problem, which is usually related to the company’s business model. These questions are easy to medium in the level of difficulty (based on the categorization of Leetcode). The key here is to understand the business scenario and exact requirements before coding.
Mathematics and Statistics
These questions require undergraduate-level mathematics and statistics knowledge in addition to coding capability. A few of the most commonly asked concepts include:
- Simulation: Monte Carlo simulations, weighted sampling, simulating Markov chains, etc.
- Prime Numbers / Divisibility: Calculations involving divisibility of natural numbers, Euclidean algorithm for computing the greatest common divisor of two natural numbers, etc.
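As a small illustration of the divisibility topic, the Euclidean algorithm fits in a few lines of Python. The function name `gcd` is our own choice for this sketch, and it assumes non-negative integer inputs (Python's standard library also ships `math.gcd`, but interviewers typically want to see the loop itself).

```python
def gcd(a, b):
    """Greatest common divisor of two non-negative integers."""
    # Repeatedly replace (a, b) with (b, a mod b); when b reaches 0,
    # a holds the greatest common divisor.
    while b:
        a, b = b, a % b
    return a
```

For example, `gcd(48, 18)` steps through (48, 18), (18, 12), (12, 6), (6, 0) and returns 6.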
Some common questions include:
- Estimating the value of Pi using simulation.
- Enumerating all prime numbers up to a given natural number N.
- Simulating a multinomial distribution using uniform random numbers.
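The three questions above could be approached along the following lines. These are minimal sketches under our own assumptions: the function names and the optional `seed` parameter are hypothetical, the Pi estimate uses the standard quarter-circle Monte Carlo argument, the prime enumeration uses the Sieve of Eratosthenes, and the multinomial sampler assumes `probs` sums to 1.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Monte Carlo estimate of Pi from uniform points in the unit square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    # Area of the quarter circle is Pi / 4, so scale the hit rate by 4.
    return 4 * inside / num_samples


def primes_up_to(n):
    """All primes <= n, using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Smaller multiples of p were already crossed out by smaller primes.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


def sample_multinomial(probs, num_trials, seed=None):
    """Counts per category after num_trials draws, using only uniform numbers."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    for _ in range(num_trials):
        u = rng.random()
        cumulative = 0.0
        for i, p in enumerate(probs):
            cumulative += p
            if u < cumulative:  # u landed in category i's slice of [0, 1)
                counts[i] += 1
                break
        else:
            # Guard against floating-point round-off when probs sums to ~1.
            counts[-1] += 1
    return counts
```

The sampler walks the cumulative distribution linearly for each draw; with many categories, precomputing the cumulative sums and using binary search would be a natural follow-up optimization to discuss.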
Machine Learning Algorithms
This type of question involves coding up a basic ML algorithm from scratch. Besides general coding ability, the interviewer will also be able to evaluate candidates’ applied machine learning knowledge. You will need to be familiar with common families of machine learning models to answer these questions.
Here are the model families that appear most frequently during coding interviews:
- Supervised Learning: Decision Tree, Linear and Logistic Regression (using stochastic gradient descent), and K-nearest Neighbors.
- Unsupervised Learning: K-means Clustering.
Other model families, such as Support Vector Machines, Gradient Boosting Trees, and Naive Bayes are less likely to occur. You also are not likely to be required to code up a deep learning algorithm from scratch.
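To make "coding up a basic ML algorithm from scratch" concrete, here is a minimal sketch of K-means clustering (Lloyd's algorithm) in plain Python. The function names, the random initialization from the data points, the `num_iters` cap, and the handling of empty clusters are all our own assumptions for illustration; an interviewer may ask for different choices.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, num_iters=100, seed=None):
    """Cluster points (a list of tuples) into k groups.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid assigned to points[i].
    """
    rng = random.Random(seed)
    # Assumption: initialize centroids as k distinct randomly chosen points.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(num_iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels
```

A typical follow-up discussion covers sensitivity to initialization, how to choose k, and the cost of O(n * k * d) work per iteration.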
Data Manipulation
This type of question is not as common as the other types. It asks candidates to carry out data processing and transformations without using SQL or any data analysis library such as pandas. Instead, candidates may only use a general-purpose programming language of their choice to solve the problems. Some common examples include:
- Representing two datasets as dictionaries, and joining them together on some given key values.
- Given a dictionary of dictionaries representing a JSON blob, doing some basic parsing to extract particular entries.
- Writing a function that is similar to the “spread” or “gather” functions in R’s tidyr package, and testing it using a dataset.
- Calculating a 30-day rolling profit.
- Parsing event logs and returning the count of unique strings by day/month/year.
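Three of these examples are sketched below using only the Python standard library. The data layouts are assumptions made for illustration: datasets are dictionaries keyed by the join key, daily profit is a `{date: profit}` dictionary, and log events are `(timestamp, value)` pairs whose timestamps start with an ISO-style `YYYY-MM-DD` date. All function names are hypothetical.

```python
from collections import defaultdict, deque
from datetime import date, timedelta


def inner_join(left, right):
    """Join two datasets of the form {key: record_dict} on shared keys."""
    return {key: {**left[key], **right[key]} for key in left if key in right}


def rolling_profit(daily_profit, window_days=30):
    """Map each date to the total profit over the window ending on that date."""
    result = {}
    window = deque()  # (day, profit) pairs currently inside the window
    running_total = 0
    for day in sorted(daily_profit):
        window.append((day, daily_profit[day]))
        running_total += daily_profit[day]
        # Drop days that fall more than window_days - 1 days before `day`.
        cutoff = day - timedelta(days=window_days)
        while window[0][0] <= cutoff:
            _, old_profit = window.popleft()
            running_total -= old_profit
        result[day] = running_total
    return result


def unique_count_by_day(events):
    """Count distinct values per calendar day from (timestamp, value) pairs."""
    seen = defaultdict(set)
    for timestamp, value in events:
        day = timestamp[:10]  # assumes timestamps begin with YYYY-MM-DD
        seen[day].add(value)
    return {day: len(values) for day, values in seen.items()}
```

Grouping by month or year only requires a shorter slice of the timestamp (`timestamp[:7]` or `timestamp[:4]`) under the same format assumption.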
Knowing that you can expect these four types of questions will help you to prepare systematically. In the next section, we will share some tips on how exactly to do that.
How to Prepare?
This list of question types may appear daunting at first glance, but don’t be discouraged or overwhelmed! If you have a good grasp of basic CS knowledge and machine learning algorithms, and you take the time to prepare (which we will show you how to do in this section), then you will be able to ace the coding interview.
To prepare for different categories of coding questions, we recommend the following strategies:
Brush up on the Basics:
For each of the four major question themes outlined above, begin by reviewing the fundamentals. These descriptions can be found in various online sources as well as books. Specifically:
- Data Structures (though typically with software engineers as the intended audience): Cracking the Coding Interview by Gayle Laakmann McDowell and The Best Python Books.
- Machine Learning: Introducing the Facebook Field Guide to Machine Learning Video Series and The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.
- Mathematics and Statistics: brilliant.org — one of the recommended materials in Facebook’s onsite interview prep guide.
Categorize Questions:
Once you feel relatively at home with the basics, expand the scope of your review to include a larger set of commonly encountered problems. You can find these on Leetcode, GeeksforGeeks, and Glassdoor. Save the problem statements in an organized manner, ideally grouped by theme, using tools such as Notion or Jupyter notebooks. For each topic, practice a lot of easy questions and a few medium ones. Taking the time to create a categorized collection of coding problems will not only benefit your current job search, but will also prove helpful for future ones.
Compare Multiple Solutions:
Relying on rote memorization will not be sufficient for acing the interview. To achieve a more comprehensive understanding, we recommend coming up with multiple solutions to the same problem and comparing the strengths and weaknesses (e.g. run-time/storage complexities) of the different approaches.
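As one illustration of this habit, consider a deliberately simple problem that we picked for this sketch (it is not taken from any particular interview): deciding whether a list contains a duplicate. Three solutions with different trade-offs are shown; the function names are hypothetical.

```python
def has_duplicates_brute_force(items):
    """Compare every pair. O(n^2) time, O(1) extra space."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_sorted(items):
    """Sort a copy, then compare neighbors. O(n log n) time, O(n) extra space."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


def has_duplicates_set(items):
    """Track seen values in a set. O(n) expected time, O(n) extra space."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

Beyond run-time and storage, the comparison surfaces requirements on the input: the sorted version needs items that can be ordered, the set version needs hashable items, and the brute-force version needs only equality.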
Explain to Others:
To reinforce understanding, explain your solutions/approaches to a non-technical person using plain English. A higher-level understanding of the common problem approaches often has greater value than detailed implementation and can be especially helpful for adapting existing knowledge to new and unfamiliar settings.
Mock Interviews:
Work with a peer to do a mock interview, or conduct it by yourself. You can use an online coding platform, such as Leetcode, to solve real interview questions in a limited time window.
Employ these preparation techniques, and you will go into your interview not only with more knowledge but also with more confidence!
How Are You Evaluated?
There are 4 major qualities you want to convey during your interview.
Logical Reasoning:
The interviewer wishes to see candidates make logical connections between the information provided and the ultimate answer. You should therefore describe clearly what is needed for the computation and how you would write the code to solve the problem, before diving into the actual coding.
Communication:
The effectiveness of your communication matters significantly. Before coding, clearly communicate your thought process. If the interviewer asks questions at any point during the interview, you need to be able to explain the reasoning of your assumptions and choices.
Code Quality and Best Practices:
The interviewer will also evaluate your overall code quality. While the standard expectations in a DS interview would not be as high as those in a software engineering interview, candidates should still focus on several aspects:
- Whether the code is executable without any syntax error.
- Cleanliness and conciseness.
- Whether the solution is optimized in terms of run-time/storage efficiency.
- General coding best practice, e.g. modularity, handling of edge cases, naming conventions, etc.
Proficiency:
Just as with software engineering coding interviews, for DS coding interviews, it is reasonable to expect multi-part questions and sometimes multiple questions. In other words, speed is also important. Being able to solve more questions within a limited amount of time is a signal of overall proficiency.
Tips to Ace Coding Interviews
Before the interview, it is worth clarifying with recruiters what kinds of coding questions will be asked, as well as the approximate difficulty level. Lots of data science interviews do not require heavy programming, but that does not mean interviewers will not expect basic coding proficiency at your fingertips. Always ask your recruiter what to expect. If you make incorrect assumptions about the types of questions that can appear during interviews, you may end up preparing inadequately.
During the interview, use these tips to answer coding questions effectively.
- Before diving into coding: Clarify the question and its underlying assumptions. Communication is key. A candidate who needed some help along the way but communicated clearly can be even better than a candidate who breezed through the question. Also, explain the overall approach to the interviewer before you begin the actual implementation.
- When writing code: Start with a naive brute force solution, and optimize it later. Think out loud. Say what you think might (or might not) work. You may soon realize something does work, or a modified version of it does. When stuck for more than several minutes on a particular part, it is okay to ask the interviewer for a moderate hint.
- After the coding is done: If test cases are not provided, you should propose several normal cases and edge cases. Walk through your solution out loud with an example input. This will help you find bugs and clear up any confusion that your interviewer might have about what you are doing.
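To show what proposing normal and edge cases can look like, here is a small sketch built around a sort-based median function, similar to Part 1 of the median problem discussed earlier. The helper name `run_checks` and the specific cases are our own choices, and raising `ValueError` on empty input is one possible convention that should be confirmed with the interviewer.

```python
def median(nums):
    """Median of a non-empty list of numbers, via sorting."""
    if not nums:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(nums)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def run_checks():
    """Exercise normal cases first, then edge cases."""
    # Normal cases: odd and even lengths, unsorted input.
    assert median([3, 1, 2]) == 2
    assert median([4, 1, 3, 2]) == 2.5
    # Edge cases: single element, all-equal values, negative numbers.
    assert median([7]) == 7
    assert median([5, 5, 5, 5]) == 5
    assert median([-2, -1, 0]) == -1
    # Edge case: empty input should fail loudly rather than return garbage.
    try:
        median([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
    return True
```

Saying each case out loud before running it ("odd length, even length, one element, empty") also gives the interviewer a clear view of how you think about correctness.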
Final Thoughts
Coding interviews, like other technical interviews, require systematic and effective preparation. Hopefully, our article has given you some insights into both what to expect in a coding interview for DS-related positions and how to prepare for it. Remember: enhancing your coding skills will be extremely rewarding not only for landing your dream job, but also for excelling in the job!
Thanks for reading!
Emma Ding is a Data Scientist & Software Engineer at Airbnb.
Rob Wang is a Senior Data Scientist at Robinhood.
Original. Reposted with permission.