7 Steps to Understanding Computer Vision
A starting point for Computer Vision and how to go deeper. Dive into this post for an overview of the right resources and a little bit of advice.
By Pulkit Khandelwal, VIT University.
If We Want Machines to Think, We Need to Teach Them to See.
- Fei-Fei Li, Director of Stanford AI Lab and Stanford Vision Lab
Learning and computation give machines the ability to better understand the context of images and to build truly intelligent visual systems. The sheer volume of image and video content pushes the scientific community to make sense of it and identify patterns in it, revealing details we are not otherwise aware of. Computer Vision generates mathematical models from images, Computer Graphics draws images from models, and image processing takes an image as input and gives an image as output.
Computer Vision is an overlapping field drawing on concepts from areas such as artificial intelligence, digital image processing, machine learning, deep learning, pattern recognition, probabilistic graphical models, scientific computing and a lot of mathematics. So, take this post as a starting point to delve into this field. I will try to cover as much as possible in this post, but there will still be a lot of advanced topics and some cool things left out (maybe for later posts?).
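To make the distinction between image processing and computer vision concrete, here is a minimal sketch in plain Python with no libraries. The tiny 4x4 image and the function names threshold and bounding_box are made up for illustration. The first function is an image processing step (image in, image out); the second is a toy vision step (image in, description out).

```python
# A toy 4x4 grayscale image (values 0-255); the data is illustrative only.
image = [
    [10,  12,  11,  9],
    [10, 200, 210, 12],
    [11, 205, 198, 10],
    [ 9,  10,  12, 11],
]

def threshold(img, t):
    """Image processing: image in, image out (a binary mask)."""
    return [[1 if px > t else 0 for px in row] for row in img]

def bounding_box(mask):
    """Toy vision step: image in, description out.

    Returns (top, left, bottom, right) of the nonzero pixels,
    or None if the mask is empty.
    """
    coords = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))

mask = threshold(image, 128)   # still an image: same shape, 0/1 values
box = bounding_box(mask)       # no longer an image: a model of where the object is
print(mask)
print(box)  # (1, 1, 2, 2)
```

Real vision systems extract far richer models than a bounding box, but the change of output type from pixels to a description is the point of the example.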
Step 1 - Background Check
As usual, get the basics right with undergraduate courses in probability, statistics, linear algebra and calculus (both differential and integral). A brief introduction to matrix calculus should come in handy. Also, in my experience, some familiarity with digital signal processing makes the concepts easier to grasp.
On the implementation side, I would prefer that you have a background in both MATLAB and Python. Check sentdex (a YouTube channel) for everything you need for scientific programming in Python. Do keep in mind that Computer Vision is all about computational programming.
You might want to have a look at Probabilistic Graphical Models (though it is a very advanced subject). You can always return to it later.
Step 2 - Digital Image Processing
Watch the videos by Prof. Guillermo Sapiro of Duke University. The syllabus is self-contained and comes with a lot of exercises. You can find the videos on YouTube or wait for the next session on Coursera starting September 2016.
Refer to the book Digital Image Processing by Gonzalez and Woods. Work through the examples of the concepts taught in this course in MATLAB.
Step 3 - Computer Vision
Once you are done with Digital Image Processing, the next step is to understand the mathematical models underlying the formulation of a variety of applications of image and video content. The Computer Vision course by Prof. Mubarak Shah of the University of Central Florida acts as a good introductory course, covering all the fundamental concepts needed to build towards advanced material.
Watch these videos and, alongside, implement the concepts and algorithms you learn by following the projects from GaTech Prof. James Hays’ Computer Vision class. These assignments are also in MATLAB. Do not skip these. You only gain a deep understanding of the algorithms and equations once you implement them from scratch.
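As a taste of what implementing from scratch looks like, here is a minimal sketch of 2D filtering in plain Python (the course assignments themselves use MATLAB). The function names and the tiny step-edge image are made up for illustration; the Sobel kernel is the standard horizontal-gradient one. It also shows the difference between correlation and true convolution, which is easy to miss when a library does the work for you.

```python
def correlate2d(img, kernel):
    """Slide `kernel` over `img` ('valid' region only, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += img[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

def convolve2d(img, kernel):
    """True convolution: correlation with the kernel flipped in both axes."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return correlate2d(img, flipped)

# Standard Sobel kernel for the horizontal intensity gradient.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Illustrative image: a vertical step edge, dark on the left, bright on the right.
step = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]

edges = correlate2d(step, SOBEL_X)
print(edges)                      # [[40, 40], [40, 40]]
print(convolve2d(step, SOBEL_X))  # [[-40, -40], [-40, -40]]
```

The sign flip between the two results comes only from flipping the kernel. Writing the loops by hand makes that visible, and it also forces you to decide how to treat image borders (here they are simply dropped).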
Step 4 - Advanced Computer Vision
Following the first three steps will get you ready for the advanced material.
Coursera’s offering Discrete Inference in Artificial Vision gives you a heavy dose of the probabilistic graphical models and mathematics behind Computer Vision. Although Coursera has removed this content from its website, you should be able to find it elsewhere on the internet. Things start to look interesting here, and the course will definitely give you a feel for how complex yet simple models are built for machine vision systems. It should also be a stepping stone towards reading academic papers.
Step 5 - Bring in Python and Open Source
Let’s get into Python.
There are many packages such as OpenCV, PIL, vlfeat and the like. Now is the right time to use packages built by others in your projects. No need to implement everything from scratch.
You can find many good blogs and videos to get started with. I would recommend the book Programming Computer Vision with Python; it should be more than enough. Go and have fun! See how MATLAB and Python each let you implement algorithms.
Step 6 - Machine Learning and ConvNets
There are just too many posts on getting started with machine learning.
From now on you are better off sticking with Python. Have a quick read through Building Machine Learning Systems with Python and Python Machine Learning.
With all the deep learning hype around, you now enter current research in Computer Vision: the use of ConvNets. Stanford’s CS231n: Convolutional Neural Networks for Visual Recognition is a comprehensive course on this. Although the videos have been taken down from the official website, you can very easily find re-uploads on YouTube.
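For a rough idea of what a ConvNet computes, here is a minimal sketch of a single forward pass through a convolution, ReLU and max-pooling stage in plain Python. It is not taken from CS231n; the input, the 2x2 weights and the function names are made up for illustration, and a real network would learn the weights and use many channels and layers.

```python
def conv2d_valid(x, w, bias=0.0):
    """Single-channel conv layer: cross-correlation, stride 1, no padding."""
    kh, kw = len(w), len(w[0])
    return [
        [
            sum(x[r + i][c + j] * w[i][j]
                for i in range(kh) for j in range(kw)) + bias
            for c in range(len(x[0]) - kw + 1)
        ]
        for r in range(len(x) - kh + 1)
    ]

def relu(x):
    """Elementwise max(0, v): keeps positive responses, zeroes the rest."""
    return [[max(0.0, v) for v in row] for row in x]

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    return [
        [max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
         for c in range(0, len(x[0]) - 1, 2)]
        for r in range(0, len(x) - 1, 2)
    ]

# Illustrative 5x5 input and hand-picked 2x2 filter (a trained net learns these).
x = [
    [1, 0, 2, 1, 0],
    [0, 1, 1, 0, 2],
    [2, 1, 0, 1, 1],
    [1, 0, 1, 2, 0],
    [0, 2, 1, 0, 1],
]
w = [[1, -1],
     [-1, 1]]

features = conv2d_valid(x, w)   # 5x5 input, 2x2 filter -> 4x4 feature map
activated = relu(features)      # negative responses become 0
pooled = max_pool2x2(activated) # 4x4 -> 2x2
print(pooled)
```

Stacking many such stages, with learned filters, is what lets ConvNets build up from edges to object parts to whole categories.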
Step 7 - How should I explore more?
You might think that I have already overloaded you with too much information. But there is a lot more to explore.
One good approach is to look at some of the graduate seminar courses by Sanja Fidler of the University of Toronto and by James Hays to get an idea of current research directions in Computer Vision through rich academic papers.
Another possible approach is to follow top papers from top conferences such as CVPR, ICCV, ECCV and BMVC. Alternatively, you can follow blogs such as pyimagesearch.com, computervisionblog.com or aishack.in. Watch endless talks and lectures on Computer Vision and related fields at videolectures.net!
In a nutshell, you have covered the history of computer vision, from filters, feature detectors and descriptors, camera models and trackers to tasks such as recognition and segmentation, and on to the most recent advances in neural nets and deep learning. In the next post I will give a list of top blogs to follow, and in the subsequent post I will write about the top Computer Vision papers of all time.
Bio: Pulkit Khandelwal is an incoming Computer Science Master’s student at McGill University. His interests lie in Computer Vision and Machine Learning.