9 Free Harvard Courses to Learn Data Science
Learn Python programming, statistics, and machine learning online from one of the world’s top universities.
Last month, I wrote an article on building a data science learning roadmap with free courses offered by MIT.
However, most of the courses I listed were highly theoretical, with a heavy emphasis on the math and statistics behind machine learning algorithms.
While the MIT roadmap will help you understand the principles behind predictive modelling, it won’t teach you to implement what you’ve learnt or carry out a real-world data science project.
After spending some time scouring the Internet, I found a series of free Harvard courses that cover the entire data science workflow: programming, data analysis, statistics, and machine learning.
Once you complete all the courses in this learning path, you are also given a capstone project that lets you put everything you learnt into practice.
In this article, I will list 9 free Harvard courses that you can take to learn data science from scratch. Feel free to skip any course whose subject you already know.
Step 1: Programming
The first step in learning data science is learning to code. You can do this in the programming language of your choice, ideally Python or R.
If you’d like to learn R, Harvard offers an introductory R course created specifically for data science learners, called Data Science: R Basics.
This program will take you through R concepts like variables, data types, vector arithmetic, and indexing. You will also learn to wrangle data with libraries like dplyr and create plots to visualize data.
If you prefer Python, you can choose to take CS50’s Introduction to Programming with Python offered for free by Harvard. In this course, you will learn concepts like functions, arguments, variables, data types, conditional statements, loops, objects, methods, and more.
Both programs are self-paced. However, the Python course is more detailed than the R program and requires a longer time commitment. Also, the rest of the courses in this roadmap are taught in R, so learning R first will make them easier to follow.
Step 2: Data Visualization
Visualization is one of the most powerful ways to communicate your findings from data to other people.
With Harvard’s Data Visualization program, you will learn to build visualizations using the ggplot2 library in R, along with the principles of communicating data-driven insights.
Step 3: Probability
In this course, you will learn essential probability concepts that are fundamental to conducting statistical tests on data. The topics taught include random variables, independence, Monte Carlo simulations, expected values, standard errors, and the Central Limit Theorem.
These concepts are introduced through a case study, so you will apply what you learn to a real-world dataset.
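To make these ideas concrete, here is a minimal Monte Carlo sketch in Python (the course itself uses R). It assumes a hypothetical roulette-style bet that wins $1 with probability 18/38 and otherwise loses $1; the constants and function names are illustrative, not taken from the course. It compares the theoretical expected value and standard error of the total winnings with a simulation, and checks the Central Limit Theorem's prediction that roughly 68% of simulated totals fall within one standard error of the expected value.

```python
import math
import random
import statistics

# Hypothetical roulette-style bet (an illustrative assumption, not from the course):
# win $1 with probability 18/38, otherwise lose $1.
P_WIN = 18 / 38
N_BETS = 100             # bets per simulated session
N_SIMULATIONS = 10_000   # Monte Carlo repetitions


def theoretical_ev_and_se(p: float, n: int) -> tuple[float, float]:
    """Expected value and standard error of the total winnings of n independent bets."""
    ev_one = 1 * p + (-1) * (1 - p)        # expected value of a single bet
    sd_one = 2 * math.sqrt(p * (1 - p))    # SD of one +1/-1 bet: |1 - (-1)| * sqrt(p(1-p))
    return n * ev_one, math.sqrt(n) * sd_one


def simulate_session(p: float, n: int, rng: random.Random) -> int:
    """One Monte Carlo draw: total winnings after n independent bets."""
    return sum(1 if rng.random() < p else -1 for _ in range(n))


rng = random.Random(2024)  # fixed seed so the run is reproducible
totals = [simulate_session(P_WIN, N_BETS, rng) for _ in range(N_SIMULATIONS)]

ev, se = theoretical_ev_and_se(P_WIN, N_BETS)
mc_mean = statistics.fmean(totals)
mc_sd = statistics.stdev(totals)

# Central Limit Theorem: the total is approximately normal, so roughly 68% of
# simulated sessions should land within one standard error of the expected value.
share_within_1se = sum(abs(t - ev) <= se for t in totals) / N_SIMULATIONS

print(f"theory:      EV = {ev:.2f}, SE = {se:.2f}")
print(f"Monte Carlo: mean = {mc_mean:.2f}, SD = {mc_sd:.2f}")
print(f"share within 1 SE: {share_within_1se:.3f}")
```

Increasing `N_SIMULATIONS` tightens the agreement between the simulated and theoretical values, which is the core idea behind Monte Carlo estimation.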
Step 4: Statistics
After learning probability, you can take this course to learn the fundamentals of statistical inference and modelling.
This program will teach you to define population estimates and margins of error, introduce you to Bayesian statistics, and provide you with the fundamentals of predictive modelling.
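As a small illustration of a margin of error, the sketch below computes a 95% confidence interval for a sample proportion using the normal approximation. The poll numbers are hypothetical, and `margin_of_error` is an illustrative helper, not a function from the course.

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error for a sample proportion, using the normal approximation.

    MOE = z * sqrt(p_hat * (1 - p_hat) / n), where z is the standard normal
    quantile for the requested two-sided confidence level.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return z * standard_error


# Hypothetical poll: 520 of 1,000 respondents favour a candidate.
p_hat = 520 / 1000
moe = margin_of_error(p_hat, 1000)
low, high = p_hat - moe, p_hat + moe
print(f"estimate {p_hat:.3f} +/- {moe:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

Because the standard error shrinks with the square root of the sample size, quadrupling the number of respondents halves the margin of error.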
Step 5: Productivity Tools (Optional)
I’ve included this project management course as optional because it isn’t directly related to learning data science. Instead, it teaches you to use Unix/Linux for file management, GitHub and version control, and R for creating reports.
These skills will save you a lot of time and help you better manage end-to-end data science projects.
Step 6: Data Pre-Processing
The next course in this list, Data Wrangling, will teach you to prepare data and convert it into a format that machine learning models can easily digest.
You will learn to import data into R, tidy data, process string data, parse HTML, work with date-time objects, and mine text.
As a data scientist, you will often need to extract publicly available data from the Internet in the form of a PDF document, an HTML web page, or a tweet. You will not always be handed clean, formatted data in a CSV file or Excel sheet.
By the end of this course, you will be able to wrangle and clean data and draw critical insights from it.
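The sketch below shows what this kind of wrangling can look like in Python, using only the standard library: it pulls the cells out of a small HTML table, then converts the text into typed records by stripping stray whitespace, removing thousands separators, and parsing day/month/year dates. The HTML snippet, column names, and the `TableParser` and `tidy` helpers are hypothetical; a real scraping job would also need to handle missing cells and malformed markup.

```python
from datetime import datetime
from html.parser import HTMLParser

# Hypothetical scraped snippet: numbers carry thousands separators and
# dates are stored as day/month/year text.
RAW_HTML = """
<table>
  <tr><th>Date</th><th>City</th><th>Visitors</th></tr>
  <tr><td>14/03/2021</td><td> Boston </td><td>1,234</td></tr>
  <tr><td>15/03/2021</td><td>Cambridge</td><td>987</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than overwrite.
        if self._in_cell and self._row is not None:
            self._row[-1] += data


def tidy(rows: list[list[str]]) -> list[dict]:
    """Turn raw string cells into typed records, one per observation."""
    header, *body = rows
    keys = [h.strip().lower() for h in header]
    records = []
    for cells in body:
        raw = dict(zip(keys, (c.strip() for c in cells)))
        records.append({
            "date": datetime.strptime(raw["date"], "%d/%m/%Y").date(),
            "city": raw["city"],
            "visitors": int(raw["visitors"].replace(",", "")),
        })
    return records


parser = TableParser()
parser.feed(RAW_HTML)
records = tidy(parser.rows)
for record in records:
    print(record)
```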
Step 7: Linear Regression
Linear regression is a machine learning technique used to model a linear relationship between two or more variables. It can also be used to identify and adjust for the effect of confounding variables.
This course will teach you the theory behind linear regression models, how to examine the relationship between two variables, and how to detect and adjust for confounding variables before building a machine learning algorithm.
Step 8: Machine Learning
Finally, the course you’ve probably been waiting for! Harvard’s machine learning program will teach you the basics of machine learning, techniques to mitigate overfitting, supervised and unsupervised modelling approaches, and recommendation systems.
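Overfitting is easiest to see by comparing error on training data with error on held-out data. This Python sketch uses k-nearest-neighbours regression on synthetic data (y = sin(x) plus noise, an illustrative assumption): with k = 1 the model memorises the training set, so its training error is exactly zero, yet it typically does worse on the test set than a smoother model with k = 15. The data generator and helper names are hypothetical.

```python
import math
import random

# Synthetic data (an illustrative assumption): y = sin(x) + Gaussian noise.
rng = random.Random(0)


def make_data(n: int) -> list[tuple[float, float]]:
    points = []
    for _ in range(n):
        x = rng.uniform(0, 2 * math.pi)
        points.append((x, math.sin(x) + rng.gauss(0, 0.3)))
    return points


train = make_data(200)
test = make_data(200)  # held-out data the model never sees during fitting


def knn_predict(x: float, data: list[tuple[float, float]], k: int) -> float:
    """k-nearest-neighbours regression: average y of the k closest training points."""
    nearest = sorted(data, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(data: list[tuple[float, float]], k: int) -> float:
    """Mean squared error over `data` of a k-NN model built on `train`."""
    return sum((y - knn_predict(x, train, k)) ** 2 for x, y in data) / len(data)


for k in (1, 15):
    print(f"k={k:>2}  train MSE={mse(train, k):.3f}  test MSE={mse(test, k):.3f}")
```

Choosing k (or any model's complexity) by its error on held-out data rather than on the training data is one of the standard ways to mitigate overfitting.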
Step 9: Capstone Project
After completing all the above courses, you can take Harvard’s data science capstone project, where your skills in data visualization, probability, statistics, data wrangling, data organization, regression, and machine learning will be assessed.
This final project gives you the opportunity to combine everything you learnt in the previous courses and complete a hands-on data science project from scratch.
Note: All the courses above are available on edX, an online learning platform, and can be audited for free. If you want a course certificate, however, you will have to pay for it.
Natassha Selvaraj is a self-taught data scientist with a passion for writing. You can connect with her on LinkedIn.