10 Cheat Sheets You Need To Ace Data Science Interview
The only cheat sheets you need for job interviews and for your working life as a data professional. The list includes SQL, web scraping, statistics, data wrangling and visualization, business intelligence, machine learning, deep learning, NLP, and super cheat sheets.
Image by Author
The list of 10 cheat sheets is for beginners, students, job seekers, and professionals. These are my favorites, and they are hand-picked so that you don't have to search for the best cheat sheet for every subcategory of data science.
These cheat sheets are lifesavers. They have helped me many times while I was preparing for data science and machine learning interviews. It took me just 30 minutes to review all of the old but necessary concepts and prepare for any technical question.
The list of cheat sheets covers:
- SQL
- Web Scraping
- Statistics
- Data Wrangling
- Data Visualization
- Business Intelligence
- Machine Learning
- Deep Learning
- Natural Language Processing
- Super Cheat Sheets.
Note: Some of the cheat sheets are downloadable PDFs, some are HTML-based, and some are written in blog style.
SQL
Cheat sheet sample from Dataquest
SQL by Dataquest is a blog-style cheat sheet. It gives you an overview of basic SQL queries.
- Fundamentals: selecting rows and columns, comments, and limits
- Joins: inner, left, right, and outer joins
- Complex Queries: subqueries, string matching, CASE, the WITH clause, creating and dropping views, UNION, INTERSECT, and chaining
As a data scientist, you must know these functions and commands to pass the SQL coding interview session. Even after that, SQL will be a major part of your work life. You will extract specific data, create pipelines, process data, and build analytics, all using SQL commands and complex queries.
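To make the topics above concrete, here is a minimal sketch that runs a LEFT JOIN, a WITH clause, and a CASE expression from Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are made up for illustration and do not come from the Dataquest cheat sheet.

```python
import sqlite3

# In-memory database with two small, made-up tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 30.0), (3, 2, 45.0);
""")

# The WITH clause names a subquery, the LEFT JOIN keeps customers that have
# no orders, and CASE buckets each customer by total spend.
query = """
WITH totals AS (
    SELECT c.name AS name, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
)
SELECT name,
       total,
       CASE WHEN total >= 100 THEN 'high'
            WHEN total > 0 THEN 'low'
            ELSE 'none' END AS segment
FROM totals
ORDER BY name
LIMIT 10;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Swapping `LEFT JOIN` for `JOIN` in this sketch would drop the customer with no orders, which is a quick way to see the difference between the two join types.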
Web Scraping
Image by Frank Andrade
Web Scraping by Frank Andrade is a blog-based cheat sheet that covers all the basics of web scraping and how you can use it to create automated web crawlers. For a data professional, web scraping skills are a plus: they help you gather data from HTML-based websites and APIs.
You will learn about:
- HTML for Web Scraping
- Beautiful Soup
- XPath
- Selenium
- Scrapy
- Python Basics for Web Scraping
The cheat sheet contains easy-to-follow code examples with visual aids. You can learn the functions of various Python web scraping libraries and automate your workflow.
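As a rough illustration of the core idea (parse HTML, then pull out the elements you care about), here is a small sketch that collects link text and URLs. It deliberately uses the standard library's `html.parser` instead of Beautiful Soup, Selenium, or Scrapy so that it runs without extra installs. The `LinkCollector` class and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (text, href) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Made-up page standing in for a downloaded document.
SAMPLE_HTML = """
<html><body>
  <h1>Jobs</h1>
  <ul>
    <li><a href="/jobs/1">Data Scientist</a></li>
    <li><a href="/jobs/2">ML <b>Engineer</b></a></li>
  </ul>
</body></html>
"""

parser = LinkCollector()
parser.feed(SAMPLE_HTML)
print(parser.links)
```

In a real project you would fetch the page first and would most likely reach for one of the libraries listed above, which handle messy markup and selectors far more conveniently.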
Statistics
Cheat sheet example from stanford.edu
Statistics by stanford.edu is an HTML-based cheat sheet. It covers all of the statistics concepts with mathematical formulas and visual examples where possible.
It is divided into 5 core parts:
- Parameter estimation
- Confidence intervals
- Hypothesis testing
- Regression analysis
- Correlation analysis
When you present technical work, you have to back your claims with the right statistical terminology. Reading the cheat sheet for 5 minutes will help you remember the core terms and formulas.
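To show how two of these topics fit together, here is a short sketch that computes a 95% confidence interval for a mean and a two-sided test statistic, using only Python's standard library. The sample values and the null value `mu0` are made up, and the sketch assumes a normal approximation; with a sample this small a t-distribution would usually be the better choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Made-up measurements, for illustration only.
sample = [12.1, 11.8, 12.4, 12.0, 12.3, 11.9, 12.2, 12.5, 11.7, 12.1]

n = len(sample)
x_bar = mean(sample)      # point estimate of the population mean
s = stdev(sample)         # sample standard deviation (n - 1 denominator)
se = s / sqrt(n)          # standard error of the mean

# 95% confidence interval under the normal approximation (assumption).
z = NormalDist().inv_cdf(0.975)   # about 1.96
ci = (x_bar - z * se, x_bar + z * se)

# Two-sided z-test of H0: mu = mu0 (mu0 is a hypothetical reference value).
mu0 = 12.0
z_stat = (x_bar - mu0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"mean={x_bar:.3f}, CI=({ci[0]:.3f}, {ci[1]:.3f}), z={z_stat:.2f}, p={p_value:.3f}")
```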
Pandas Data Wrangling
Cheat sheet example from DataCamp
Pandas Data Wrangling by DataCamp is a PDF-based one-page cheat sheet. It consists of various data wrangling techniques with code and visual examples.
- Reshaping the data: pivot, pivot table, stack and unstack, and melt.
- Iterations
- Handling missing data
- Advanced indexing: reindexing, setting and resetting the index, and multilevel index.
- Duplicating data
- Grouping data
- Combining tables: merging, joining, and concatenating
- Dates
- Visualization
It is a great resource to revise all of the core functions of the pandas library.
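A few of the operations listed above can be sketched in a dozen lines. The example below assumes pandas is installed and uses a made-up sales table. It reshapes with `melt` and `pivot`, fills a missing value, groups, and merges.

```python
import pandas as pd

# Hypothetical wide-format sales table with one missing value.
wide = pd.DataFrame({
    "store": ["A", "B", "C"],
    "jan": [100.0, 80.0, None],
    "feb": [110.0, 95.0, 70.0],
})

# Reshaping: melt turns the month columns into rows (wide -> long).
long_df = wide.melt(id_vars="store", var_name="month", value_name="sales")

# Handling missing data: treat a missing month as zero sales (an assumption).
long_df["sales"] = long_df["sales"].fillna(0.0)

# Grouping: total sales per store.
totals = long_df.groupby("store")["sales"].sum()

# Reshaping back: pivot turns the long table into a store-by-month grid.
grid = long_df.pivot(index="store", columns="month", values="sales")

# Combining tables: attach a region to each row, then aggregate by region.
regions = pd.DataFrame({"store": ["A", "B", "C"],
                        "region": ["north", "south", "south"]})
merged = long_df.merge(regions, on="store", how="left")
by_region = merged.groupby("region")["sales"].sum()

print(totals)
print(grid)
print(by_region)
```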
Data Visualization
Image from DataCamp
Data Visualization by DataCamp is the best cheat sheet for understanding data visualizations and when to use them. It is a hybrid (blog + PDF) cheat sheet that covers all of the basic concepts of data visualization.
You will learn:
- How to Capture a Trend
- How to Visualize Relationships
- Part-to-whole Charts
- How to Visualize a Single Value
- How to Capture Distributions
- Visualize a flow
You can read all of the core concepts as a blog or download the PDF file. You will be surprised by how much it helps with choosing the right chart.
Tableau Business Intelligence
Cheat sheet example from learnovita.com
Tableau by learnovita.com is a blog-based cheat sheet. It covers all of the basic functions, data types, visualization types, and commands.
It consists of:
- Data source
- Data Extract
- Data Joining
- Data Blending
- Operators
- LOD Expressions
- Sorting
- Filters
- Charts
Tableau is the most famous tool for Business Intelligence. It will help you perform data analytics, visualization, and wrangling with a few clicks. Furthermore, you can create stories and a dashboard within a few minutes. There is a high demand for it in data analytics and data science-related jobs.
"To get the most out of these cheat sheets, I suggest you bookmark this page and review all of them. It will take you just 30 minutes to go through all of the APIs, commands, and technical terms."
Machine Learning
Cheat sheet example from DataCamp
Machine learning with Scikit-Learn by DataCamp is a PDF-based cheat sheet that will help you revise all of the functions and commands for data processing and modeling.
You will learn Scikit-Learn's API:
- Data loading
- Preprocessing
- Data splitting
- Building model
- Model training
- Predicting
- Model Evaluation
- Model Tuning
This cheat sheet is quite handy for coding exams, technical interviews, or just reviewing commands to run simple machine learning tasks.
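The steps listed above map onto a compact workflow. The sketch below assumes scikit-learn is installed and uses its bundled iris dataset. The choice of a k-nearest-neighbors classifier and of the parameter grid is mine, for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data loading
X, y = load_iris(return_X_y=True)

# Data splitting (fixed random_state so the split is reproducible)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Preprocessing + building the model, chained so scaling is fit on training folds only
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Model tuning: make_pipeline names each step after its lowercased class name
search = GridSearchCV(
    pipeline,
    param_grid={"kneighborsclassifier__n_neighbors": [3, 5, 7]},
    cv=5,
)

# Model training
search.fit(X_train, y_train)

# Predicting and model evaluation
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("best params:", search.best_params_)
print("test accuracy:", round(accuracy, 3))
```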
Deep Learning
Cheat sheet example from DataCamp
Deep Learning with Keras by DataCamp is a PDF-based cheat sheet that you can use to review the various Keras functions, from data preprocessing to neural networks.
It will help you with:
- Loading default dataset
- Pre-processing
- Neural network model architecture
- Prediction
- Model inspection
- Model compiling
- Model training and evaluation
- Model saving and loading
- Fine-tuning
It is a code-based cheat sheet, and it assumes that you understand the basics of building and training neural networks. In just one look you will understand various functions that will help you during coding interviews and take-home assignments.
Natural Language Processing
Cheat sheet example from janlukasschroeder
NLP by janlukasschroeder is a one-of-a-kind cheat sheet on Natural Language Processing (NLP). It is a GitHub-based cheat sheet where everything is written in Markdown in the README.md file.
You will learn about:
- Word embeddings
- Stop Words
- Spans
- Tokenization
- Chunks and Chunking
- Part-of-speech (POS) Tagging
- BILUO tagging
- Stemming
- Lemmatization
- Sentence Detection
- Dependency Parsing
- Named Entity Recognition (NER)
- Text Classification
- Similarity
- N-grams
- Visualization
- Kernels
- Text Summarization
- Sentiment Analysis
- Levenshtein distance
- Markov Decision Process
- Probability to discard words to reduce noise
It has everything you want to learn about the basics of NLP and language-based applications. You will also learn about various NN architectures, loss functions, optimizers, and regularizers. If you like the cheat sheet, give it a star.
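Three of the topics listed above (tokenization, n-grams, and Levenshtein distance) are small enough to sketch in plain Python. The helper names below are hypothetical and the whitespace tokenizer is deliberately naive; NLP libraries handle punctuation and language rules much better.

```python
def tokenize(text):
    """Naive tokenizer: lowercase the text and split on whitespace."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return all contiguous runs of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))   # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


tokens = tokenize("The quick brown fox")
bigrams = ngrams(tokens, 2)
distance = levenshtein("kitten", "sitting")

print(bigrams)
print(distance)
```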
Super Cheat Sheet
Cheat sheet example from GitHub
Super Data Science by Maverick Lin is a PDF-based multi-page cheat sheet and my favorite. It covers topics ranging from algorithms to SQL. The cheat sheet is purely theoretical, with math and visual aids.
It consists of various categories:
- Probability
- Statistics
- Types of Data
- Data Cleaning
- Feature Engineering
- Statistical analysis
- Distributions
- Modeling Evaluation Metrics
- Linear Regression
- Distance methods
- Nearest Neighbor Classification
- Clustering
- Machine Learning
- Deep Learning
- Big Data
- Graph Theory
- SQL
If you are lazy like me, you will probably prefer to review it in one go and walk into the interview with confidence. I am not saying that you should ignore the others: all ten are necessary for you to succeed at any stage of a data science, data analytics, or machine learning interview, especially the HTML-based and blog-style ones.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.