7 Steps to Mastering Machine Learning with Python in 2022
Are you trying to teach yourself machine learning from scratch, but aren’t sure where to start? I will attempt to condense all the resources I’ve used over the years into 7 steps that you can follow to teach yourself machine learning.
Introduction
Are you trying to teach yourself machine learning from scratch, but aren’t sure where to start? Or maybe you’ve taken an online course or two, but have hit a roadblock in your learning journey and don’t know how to proceed.
I was in a similar position just two years ago. I had spent over $25K in university fees, but was still inexperienced and unprepared for the job market.
It took a lot of trial and error for me to come up with a machine learning roadmap. I watched online courses, YouTube videos, and downloaded countless e-books. The knowledge I gained online surpassed everything I learnt in university. And the best part — it came at a fraction of the cost!
In this article, I will attempt to condense all the resources I’ve used over the years into 7 steps that you can follow to teach yourself machine learning.
Step 1: Learn Programming for Machine Learning
You need to have a working knowledge of programming before you dive into machine learning. Most data scientists use either Python or R to build ML models.
I started out with Python, since it is a general purpose programming language and is higher in demand than R is.
Python skills are also transferable to different domains, so it would be easier to make the transition if you were to branch out into fields like web development or data analytics in the future.
The 2022 Complete Python Bootcamp course by Jose Portilla is a great introduction to Python if you are new to programming. This course is on Udemy, and they offer promotions often that can bring course prices down to as low as $10. It is a good idea to wait for one of these promotions before making a purchase.
Another advantage of taking this course is that it’s taught entirely in Jupyter Notebook, one of the most popular coding environments among data scientists. Jose will familiarize you with the interface so you don’t need to spend time learning it on your own.
If you’d like free alternatives to the above course, however, here’s what I suggest:
- Jupyter Notebook Tutorial: Introduction, Setup, and Walkthrough — This course will help you get familiar with Jupyter’s interface.
- Learn Python: Full Course for Beginners [Tutorial]: This course will take you through the basics of Python programming, such as variables, data types, functions, conditional statements, and loops. It is taught using the PyCharm IDE, but you can use a Jupyter Notebook instead.
- Python for Everybody — This is an e-book that you can download for free. This book isn’t like any other Python tutorial you find online. It introduces you to programming concepts through the lens of solving data problems, which makes it an ideal read for data science aspirants.
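As a rough illustration of the basics these resources cover (variables, data types, functions, conditional statements, and loops), here is a minimal sketch. The function name `describe_temperatures` and the sample readings are made up for this example and do not come from any of the courses.

```python
# Illustrative only: the function name and sample data are invented for this sketch.

def describe_temperatures(readings, threshold=30.0):
    """Return the average reading and how many readings exceed the threshold."""
    hot_days = 0
    total = 0.0
    for value in readings:        # loop over every item in the list
        total += value
        if value > threshold:     # conditional statement
            hot_days += 1
    average = total / len(readings) if readings else 0.0
    return {"average": average, "hot_days": hot_days}


readings = [28.5, 31.0, 33.2, 27.9]   # a list of floats
summary = describe_temperatures(readings)
print(summary)
```

Small exercises like this are close in spirit to the easier practice problems you will find on coding challenge platforms.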
Once you have a grasp of Python basics, start applying these concepts to solve problems. I never learnt to code properly despite completing a three-year undergraduate degree in computer science, and that’s because I never applied the concepts I learnt to actual problems.
Due to this, I had a theoretical understanding of how to code, but lacked the ability to break down a problem and code a solution.
A tool that helped improve my problem-solving skills was HackerRank. HackerRank is a platform that provides users with a range of programming challenges with varying levels of difficulty. Try solving at least 2–3 HackerRank problems a day. Start out with the easiest ones, and then increase the level of difficulty as you move on.
If you ever get stuck on a problem, you can always refer to another person’s solution to understand how they solved it. Then, try replicating their thought process with your own code.
As you keep doing these practice problems, you will start to gain confidence in your ability to code.
You can then move on to the next step — learning to work with data in Python.
Step 2: Data Collection and Pre-Processing in Python
Now that you know how to code in Python, you can start to learn data collection and pre-processing.
One thing I’ve noticed about most beginners in the data science industry is that they jump straight into trying to master machine learning. They don’t put much emphasis on collecting or analyzing data, which is a separate skill set on its own.
Due to this, they often struggle in the workplace when asked to perform tasks like sourcing third-party data or preparing data for machine learning modelling.
Here are some courses I recommend for performing the above tasks. I will also provide free alternatives that you can choose to take instead.
- Data Collection — Many companies require external data collection to support their data science workflows. You can use APIs to collect this data or create web scrapers from scratch, depending on the type of task assigned to you. The Web Scraping and API Fundamentals course by 365datascience will teach you to collect web data in Python. If you’d like a free alternative, then I suggest coding along to the Python API tutorial, followed by the Python web scraping tutorial on Dataquest.
- Data Pre-processing — The data you collect can be present in many different formats. You need to be able to transform this data into a format that can be ingested by machine learning models. This is generally done using a Python library called Pandas, and it is a good idea to gain a strong grasp of this library before you start to learn ML modelling. To start out, you can take this Data Pre-processing with Pandas course offered by 365datascience. If you’d like an alternative to the course above, you can watch a free YouTube video titled Introduction to Data Pre-Processing with Python.
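To give a feel for what these two tasks look like together, here is a minimal sketch, assuming the collected data arrives as JSON (as it often does from an API). The payload, the column names, and the choice to fill gaps with the median are all assumptions made for illustration. A real project would fetch the data over the network and choose cleaning steps to suit the dataset.

```python
import json

import pandas as pd

# Hypothetical payload standing in for the body of an API response.
raw_response = """
[
  {"city": "Austin", "price": "250000", "bedrooms": 3},
  {"city": "Boston", "price": "410000", "bedrooms": null},
  {"city": "Austin", "price": null, "bedrooms": 2}
]
"""

records = json.loads(raw_response)
df = pd.DataFrame(records)

# Convert the price strings to numbers; missing or invalid entries become NaN.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Fill missing numeric values with the column median (one of several options).
df["price"] = df["price"].fillna(df["price"].median())
df["bedrooms"] = df["bedrooms"].fillna(df["bedrooms"].median())

# One-hot encode the categorical column so that every feature is numeric.
features = pd.get_dummies(df, columns=["city"])
print(features)
```

The resulting table has no missing values and only numeric or boolean columns, which is the kind of input most machine learning libraries expect.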
Step 3: Data Analysis in Python
Next, it is a good idea to start learning data analysis with Python. Data analysis is the process of identifying patterns in large amounts of data and discovering insights that add value.
Before creating any machine learning model, you need to understand the data you are dealing with. Look into the relationships between different variables in your dataset. What information does one variable tell you about the other? Are you able to provide recommendations based on the insights you discover within the dataset?
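One simple way to start looking at how variables relate to each other is a correlation table. The sketch below uses Pandas on a tiny made-up dataset. The column names and values are invented for illustration, and correlation is only one of many ways to explore relationships.

```python
import pandas as pd

# Made-up dataset used purely for illustration.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "hours_slept": [8, 7, 7, 6, 6, 5],
    "exam_score": [52, 58, 65, 70, 78, 85],
})

# Summary statistics (count, mean, spread, quartiles) for every column.
print(df.describe())

# Pairwise Pearson correlations between all the variables.
correlations = df.corr()

# How strongly each variable moves with the exam score.
print(correlations["exam_score"].sort_values(ascending=False))
```

A strong correlation here would only suggest where to look next. It does not show that one variable causes the other.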
I suggest taking a course titled Learn Python for Data Analysis and Visualization, also by Jose Portilla, to hone your skills in this area.
There are four libraries in Python that are primarily used for data analysis: Pandas, NumPy, Matplotlib, and Seaborn. Jose’s course will teach you to analyze data using all these libraries. The best part about this course is that he includes sample projects that are similar to examples that you will encounter in the real world.
If you’re looking for free alternatives, you can take FreeCodeCamp’s Data Analysis with Python course, or download the Exploratory Data Analysis with Python e-book.
Step 4: Machine Learning with Python
Finally, you can start to learn machine learning! I always suggest using a top-down approach when it comes to learning ML.
Instead of learning the theory and in-depth working of machine learning models, start with an implementation-first approach.
Learn to use Python packages to build predictive models first. Run models on real-world datasets and observe the output. Once you get a feel for what machine learning looks like in practice, you can dive deeper into the working of each algorithm.
Python for Data Science and Machine Learning is a great course that you can take to learn the implementation of ML models in Python. Again, this is taught by Jose Portilla, and it is one of the best introductory machine learning courses I’ve ever taken.
Jose will walk you through the end-to-end machine learning workflow. You will learn to build, train, and evaluate ML models in Python using a library called Scikit-Learn.
Jose will ease you into machine learning concepts without going into overwhelming detail, which makes it a great introductory course for you to start out with.
FreeCodeCamp’s Machine Learning with Scikit-Learn course is a great free alternative to the course above. If you prefer reading, you can download a free e-book titled Building Machine Learning Systems with Python. This is a short, hands-on textbook that will provide you with a ton of practical examples without diving too deep into the working of each algorithm.
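The build, train, and evaluate workflow mentioned above might look roughly like this in Scikit-Learn. This is a minimal sketch and is not taken from any of the courses. The dataset (the iris data bundled with Scikit-Learn), the logistic regression model, and the 25% test split are assumptions chosen to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small dataset that ships with Scikit-Learn (no download needed).
X, y = load_iris(return_X_y=True)

# Hold out a test set so the model is evaluated on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Build and train the model.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out data.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Most Scikit-Learn estimators follow the same `fit` and `predict` pattern, so swapping in a different model usually means changing only the line that creates it.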
Step 5: Machine Learning Algorithms In Depth
Once you get a feel for the different models and how they are implemented, you can start learning the underlying algorithms behind these models.
There are two resources I suggest for this:
- Statistical Learning — edX: This course will provide you with an in-depth understanding of how different machine learning algorithms work. There is less reliance on complex mathematical formulas in this course, which makes it easier to follow if you don’t come from a mathematical background.
This course covers supervised and unsupervised machine learning techniques, such as linear regression, logistic regression, linear discriminant analysis, K-Means clustering, and hierarchical clustering. The instructors also cover concepts like cross-validation and regularization to avoid model overfitting — which will be useful when working with real-world datasets.
Some of the practical lectures in this course are taught in R, which you can feel free to skip, since the main value of the course is its theoretical material.
This course is based on a book written by its instructors called An Introduction to Statistical Learning. The code examples in this book are also written in R. However, I found a GitHub repository that translates all the code examples to Python, so you can read the book and code along to the Python examples instead.
All the resources above can be obtained for free. While edX courses come with a cost, you can apply for financial aid to be exempted from the course fee. You can also download the e-book mentioned above for free.
- Krish Naik’s Machine Learning Playlist — YouTube: Krish Naik is a data scientist who creates machine learning tutorials on YouTube that can be accessed at no cost.
In this playlist, he has videos that take the learner through the mathematical intuition behind different machine learning models. He explains the underlying algorithm behind linear and logistic regression, concepts like bagging and boosting, and unsupervised learning techniques like K-means and hierarchical clustering.
Similar to the Statistical Learning course, he doesn’t explain any of this with complex mathematical notation. Rather, he explains the working of each algorithm in plain English so it can be easily understood by learners from different backgrounds.
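Both resources above cover K-means clustering, and writing it out by hand is a good way to check that you understand the algorithm. Below is a minimal NumPy sketch of the standard assign-then-update loop. The function name, the random starting centroids, the fixed number of iterations, and the synthetic data are all assumptions made for illustration. It is not the implementation used in either resource or in Scikit-Learn.

```python
import numpy as np


def kmeans(points, k, n_iters=20, seed=0):
    """Plain K-means: alternate between assigning points and moving centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(float)
    for _ in range(n_iters):
        # Assignment step: label each point with the index of its nearest centroid.
        diffs = points[:, None, :] - centroids[None, :, :]
        distances = np.linalg.norm(diffs, axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:   # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels


# Two synthetic, well-separated groups of 2D points.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

centroids, labels = kmeans(points, k=2)
print(np.round(centroids, 1))
```

On data this cleanly separated, the centroids should settle near the centres of the two groups. Real datasets are messier, which is why library implementations add better initialization and convergence checks.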
Step 6: Deep Learning
So far, all the resources above have focused on traditional machine learning algorithms, or “shallow learning algorithms.” You can now start learning a different class of machine learning algorithms: deep learning.
Deep learning algorithms are able to identify representations in data and derive features directly from it, with little to no feature engineering. Due to this, deep learning is often used to handle data that doesn’t have explicit features, such as image, voice, and text data.
There are two resources I suggest for getting started with deep learning:
- Andrew Ng’s Deep Learning Specialization (Coursera): This is one of the most popular online resources to learn deep learning. Andrew Ng will teach you to build and train neural networks, and apply deep learning techniques to image and text data. Coursera charges a monthly fee when you enroll in a course, and they will provide you with a certificate once you complete it. However, you can choose to audit this course and get all the course material for free.
- Deep Learning with Python: This is my favourite deep learning resource out there. This textbook will take you through the theory and implementation of deep learning models. Again, the author of this book assumes that the reader doesn’t come from a mathematical background, and all the concepts are explained in plain English. I preferred this book to Andrew Ng’s deep learning course, since more real-world examples and Python code were provided. I was able to apply what I learnt to real-life projects, whereas Andrew Ng’s course was highly theoretical.
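To make the idea of learned representations a little more concrete, here is a minimal sketch of a tiny neural network written in plain NumPy and trained on the XOR problem. In practice you would use a deep learning framework, as the resources above do. The layer size, learning rate, and number of steps here are arbitrary assumptions, and nothing in this sketch is taken from the course or the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the two classes cannot be separated by a straight line in the raw inputs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# One hidden layer with 8 units (an arbitrary choice for this sketch).
W1 = rng.normal(scale=1.0, size=(2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(scale=1.0, size=(8, 1))
b2 = np.zeros((1, 1))
learning_rate = 0.5

losses = []
for step in range(5000):
    # Forward pass: the hidden layer computes new features from the raw inputs.
    hidden = np.tanh(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    loss = -np.mean(y * np.log(output) + (1 - y) * np.log(1 - output))
    losses.append(loss)

    # Backward pass: gradients of the cross-entropy loss via the chain rule.
    grad_logits = (output - y) / len(X)
    grad_W2 = hidden.T @ grad_logits
    grad_b2 = grad_logits.sum(axis=0, keepdims=True)
    grad_hidden = (grad_logits @ W2.T) * (1 - hidden ** 2)
    grad_W1 = X.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0, keepdims=True)

    # Gradient descent update.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

# Predictions from the trained network.
predictions = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
print(predictions.ravel())
```

Nobody hand-crafts features for XOR in this example. The hidden layer has to find a useful transformation of the inputs on its own, which is the same idea that deep learning applies at a much larger scale to images, audio, and text.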
Step 7: Projects
The final step: Build projects!
There is a lot of material provided above. If you don’t apply any of it to real-life projects, you will forget what you learnt. You can memorize concepts, collect certifications, and sit for as many exams as you’d like. But you only really learn when you start to build.
Here is an article that has a compilation of machine learning projects created by other data scientists, with source code provided for your reference. You can code along to some of these projects and make minor changes to them, before starting your own project from scratch.
Here are a few more resources I found that can help you get started:
- Scraping Amazon book reviews
- Deep learning projects with source code
- Krish Naik’s machine learning project playlist
- Building an age detection model from a picture of a person’s face
Teaching yourself machine learning can be time-consuming and overwhelming. However, it is also a very rewarding journey. Every time you learn a new concept or solve a problem you didn’t think was possible, you are one step closer to achieving your goal of machine learning proficiency.
Natassha Selvaraj is a self-taught data scientist with a passion for writing. You can connect with her on LinkedIn.