Data Science Projects That Will Land You The Job in 2022
Project ideas and portfolio tips from a self-taught data scientist.
Photo by National Cancer Institute on Unsplash
As mentioned countless times by people working in the data industry, building data science projects is one of the easiest ways to land a job in the field, especially if you don’t possess a Master’s degree.
In a previous article I wrote in 2020, I provided a list of data science project ideas that you can add to your portfolio to increase your chances of landing a job. It’s been two years since then.
After working in the industry for some time, speaking to data science hiring managers, and reviewing resumes, I have a slightly different perspective on the types of projects you should add to your resume.
In this article, I will provide you with advice on the kinds of projects you should and should NOT add to your portfolio. Some projects actually do more harm than good to your resume, and showcasing them makes it more likely that your application will get tossed out.
I will also provide you with project ideas with tutorials and source code as reference material. Make sure you don’t copy these projects verbatim. Change the code slightly, add your own unique spin to it, and most importantly, make sure you understand what you are doing. Data science interviewers almost always ask you to explain past projects and source code, and it will be impossible to get through the interview stage if you don’t understand your own work.
What kind of projects should you add to your portfolio, and what should you avoid?
DO: Create projects that solve a problem with the help of data
Too often, I’ve seen people build data science projects with no regard for the end objective.
I once went through a candidate’s GitHub profile and saw hundreds of lines of EDA and colourful charts. It looked good, but I had no idea what his objective was. Honestly, I don’t think he knew either, as he was unable to explain exactly what he was trying to achieve with the analysis.
This candidate made a common mistake: he focused too much on showcasing that he knew how to build different types of charts, pre-process, and manipulate data. There was too much emphasis on code, libraries, and tools, and too little on actually deriving an outcome from his analysis.
Most hiring managers don’t really care about the libraries and tools you use. They want to understand your thought process, and why you decided to take the approach you did to solve the problem. They want to know what obstacles you ran into along the way, and what methods you applied to overcome them.
If you are able to walk them through an entire data science project and explain what you did and why you did it, you give employers confidence in your reasoning skills. This tells them that regardless of the program or library you use, you will be able to solve any problems that arise.
DON’T: create projects that are overused
Titanic survival prediction, Boston house pricing prediction, and Iris flower classification are examples of projects that you should not display on your portfolio.
While these projects are a great way for you to enhance your own machine learning skills, they are just too easy and common to showcase on your resume. These are extremely popular data science projects, and it is likely that other candidates will have them on their applications as well.
If a hiring manager reviews 50 applications for a single job listing and you showcase the same projects as 40 other candidates, it is likely that your resume will get tossed out as you don’t stand out from the crowd.
And since these projects are relatively simple, it will seem to the employer as though your data science knowledge is shallow and at a surface level.
Data Science Projects to Build Your Portfolio
Here are a few examples of projects that will strengthen your resume and help you stand out amongst other candidates interviewing for the same job:
1. Churn Prediction
Customer churn is the rate at which customers stop doing business with an organization. A high customer churn rate is bad for companies, and if they are able to predict that a user is about to stop making purchases from them, they can implement measures to prevent this from happening.
This is one of the most popular applications of data science in organizations.
If you showcase a customer churn prediction model on your resume, you will capture the attention of hiring managers, as it is a use-case they are familiar with. This helps you stand out amongst other candidates who might be showcasing projects that have no business relevance whatsoever.
Here is a tutorial I found online that you can code along to, in order to learn how to predict customer churn. Once you understand the code and are familiar with how a churn prediction model works, I suggest building your own algorithm from scratch. Do not copy another person’s code without properly understanding it, as you will find it difficult to pass the interview or answer questions related to the project.
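To give a rough idea of what "from scratch" could look like, here is a minimal sketch of a churn classifier built as a logistic regression trained with plain gradient descent. Logistic regression is my own choice for illustration and is not necessarily what the tutorial uses. The two features (months since last purchase, support tickets filed) and the eight toy customers are hypothetical, made up purely for this example.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def churn_probability(x, weights, bias):
    """Predicted probability that a customer with features x churns."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(rows, labels, lr=0.1, epochs=5000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = churn_probability(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical toy data: [months since last purchase, support tickets filed]
customers = [
    [0.5, 0], [1.0, 1], [1.5, 0], [2.0, 1],  # customers who stayed
    [5.0, 3], [6.0, 2], [7.0, 4], [8.0, 3],  # customers who churned
]
churned = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic_regression(customers, churned)

# Score two unseen (also hypothetical) customers.
at_risk = churn_probability([7.5, 3], weights, bias)
loyal = churn_probability([0.5, 0], weights, bias)
print(f"at-risk customer: {at_risk:.2f}, loyal customer: {loyal:.2f}")
```

In a real portfolio project you would most likely train on an actual churn dataset with a library such as scikit-learn, hold out a test set, and report a metric like recall on churners. Writing the training loop by hand once is mainly useful because it forces you to understand what the model is doing, which is exactly what interviewers probe.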
2. Customer Segmentation
Customer segmentation is the process of dividing a company’s target audience into different user groups. Each group will have patterns that are similar to each other.
I recommend creating a project like this because it is also a common data science use-case in organizations, and will add business value to a company.
You can code along to a customer segmentation tutorial I created using Kaggle’s Mall Customer Segmentation dataset. I performed some exploratory data analysis, pre-processing, and finally built a model to segment customers of a shopping mall into different user groups. I also provided recommendations as to how these individuals should be approached, and the type of products they would be interested in.
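As a small illustration of the segmentation step, here is a sketch of k-means clustering written with only the Python standard library. K-means is an assumption on my part for this example, and a different clustering method may suit your data better. The nine customers below are hypothetical (annual income in thousands, spending score from 1 to 100), loosely modelled on the kind of columns found in the mall dataset, and the deterministic farthest-point seeding is a simplification chosen so the result is reproducible.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label every point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Alternate between assigning points and recomputing cluster means."""
    centroids = farthest_point_init(points, k)
    labels = assign(points, centroids)
    for _ in range(max_iter):
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        new_labels = assign(points, centroids)
        if new_labels == labels:  # assignments are stable, so we have converged
            break
        labels = new_labels
    return labels, centroids


# Hypothetical customers: (annual income in k$, spending score 1-100)
shoppers = [
    (15, 10), (18, 14), (20, 8),        # low income, low spending
    (55, 50), (60, 48), (58, 55),       # mid income, mid spending
    (110, 90), (120, 85), (115, 95),    # high income, high spending
]

labels, centroids = kmeans(shoppers, k=3)
for shopper, label in zip(shoppers, labels):
    print(shopper, "-> segment", label)
```

The clustering itself is the easy part. What makes the project valuable is what you do next: describing each segment in plain language and recommending how the business should approach it.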
3. Taxi Trip Duration Prediction
Ride-sharing companies like Uber often need to predict the arrival time of a cab and how long it would take for the vehicle to reach its destination.
This is an application of machine learning that adds value to organizations, and will look good on your resume. You can use the NYC Taxi Trip Duration dataset on Kaggle for this project.
There are a few existing solutions implemented to predict taxi trip duration with this dataset, and you can use this code notebook as a reference point when building your model.
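If you are wondering where to start with this dataset, a common first step is to turn the pickup and dropoff coordinates into a distance feature and fit a simple baseline before trying anything more complex. The sketch below does that with the haversine formula and a one-variable least-squares line, then scores the fit with root mean squared logarithmic error. The four trips are hypothetical values I made up for illustration, not rows from the Kaggle data, and the choice of baseline and metric is my assumption rather than something taken from the reference notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmsle(actual, predicted):
    """Root mean squared logarithmic error (inputs must be non-negative)."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(actual))


# Hypothetical trips:
# (pickup_lat, pickup_lon, dropoff_lat, dropoff_lon, duration_seconds)
trips = [
    (40.7580, -73.9855, 40.7614, -73.9776, 420),
    (40.7580, -73.9855, 40.7484, -73.9857, 540),
    (40.7128, -74.0060, 40.7580, -73.9855, 1500),
    (40.7580, -73.9855, 40.6413, -73.7781, 3000),
]

distances = [haversine_km(*trip[:4]) for trip in trips]
durations = [trip[4] for trip in trips]

slope, intercept = fit_line(distances, durations)
# Clamp at zero so the log-based metric is always defined.
predictions = [max(0.0, slope * d + intercept) for d in distances]

print(f"seconds per km: {slope:.1f}, fixed overhead: {intercept:.1f} s")
print(f"training RMSLE: {rmsle(durations, predictions):.3f}")
```

A baseline like this gives you a number to beat. From there you can add features such as hour of day or day of week, try tree-based models, and explain in your write-up why each change helped or did not.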
Additional Tips on Landing a Data Science Job
1. Always prioritize quality over quantity
It is better to have 2–3 complete, end-to-end projects on your resume than 10 simple, common projects. A hiring manager will generally ask you to walk them through one complete data science project and how you implemented it. Make sure this isn’t as simple as the Titanic or Iris Flower prediction.
2. Include different types of projects on your resume
The main focus of the projects I listed above is machine learning. Make sure to also showcase projects that involve elements of data collection, pre-processing, and EDA, as many data science roles require a diverse skillset that goes beyond just model building.
3. Be creative and original
Hiring managers scan through hundreds of applications every day. In order to really stand out to them, you need to be creative. Showcase projects that tell a story, something that they likely wouldn’t have come across before. Here are some examples of portfolio projects I built that were unique and helped me land my first data science job.
Attempting to break into an entirely new field can be overwhelming, especially if you don’t have any prior formal training in the subject. However, with the abundance of online courses, tutorials, and source code published on the Internet, you can learn almost everything you need to know about the subject to land a data science job.
Natassha Selvaraj is a self-taught data scientist with a passion for writing. You can connect with her on LinkedIn.