What is Transfer Learning?
In transfer learning, the knowledge learned on a source task is leveraged to improve learning and speed up progress on a new target task. Read on for a deeper dive into the subject.
Transfer learning is a machine learning method in which knowledge gained by a model trained on one task is reused as the foundation for a model on another task.
Machine learning algorithms use historical data as input to make predictions and produce new output values, and they are typically designed to perform isolated tasks. In transfer learning, the source task is the task from which knowledge is transferred, and the target task is the task whose learning is improved by that transferred knowledge.
During transfer learning, the knowledge and rapid progress gained on a source task are used to improve learning and development on a new target task. Applying this knowledge means mapping the source task’s attributes and characteristics onto the target task.
However, if the transfer method results in a decrease in the performance of the new target task, it is called negative transfer. One of the major challenges when working with transfer learning methods is ensuring positive transfer between related tasks whilst avoiding negative transfer between less related tasks.
The What, When, and How of Transfer Learning
- What do we transfer? To understand which parts of the learned knowledge to transfer, we need to figure out which portions of knowledge best reflect both the source and the target, so that we improve the performance and accuracy of the target task.
- When do we transfer? Understanding when to transfer is important, as we don’t want to be transferring knowledge which could, in turn, make matters worse, leading to negative transfer. Our goal is to improve the performance of the target task, not make it worse.
- How do we transfer? Now that we have a better idea of what we want to transfer and when, we can move on to techniques for transferring the knowledge efficiently. We will speak more about this later in the article.
Before we dive into the methodology behind transfer learning, it is good to know its different forms. We will go through three transfer learning scenarios, based on the relationship between the source task and the target task. Below is an overview of the different types of transfer learning:
Different Types of Transfer Learning
Inductive Transfer Learning: In this type of transfer learning, the source and target domains are the same, however, the source and target tasks are different from one another. The model uses inductive biases from the source task to help improve the performance of the target task. The source domain may or may not contain labeled data, which leads the model to use multitask learning (when it does) or self-taught learning (when it does not).
Unsupervised Transfer Learning: I assume you know what unsupervised learning is, however, if you don’t, it is when an algorithm identifies patterns in datasets that have not been labeled or classified. This case is similar to inductive transfer in that the source and target tasks are different, but here the data is unlabelled in both the source and target domains. Techniques such as dimensionality reduction and clustering are well-known unsupervised learning methods.
Transductive Transfer Learning: In this last type of transfer learning, the source and target tasks share similarities, however, the domains are different. The source domain contains a lot of labeled data, whereas the target domain contains none, which leads the model to use domain adaptation.
Transfer Learning vs. Fine-tuning
Fine-tuning is an optional step in transfer learning and is primarily used to improve the performance of the model. The difference between transfer learning and fine-tuning is all in the name.
Transfer learning is built on adopting features learned from one task and “transferring” that knowledge to a new task. It is usually used on tasks where the dataset is too small to train a full-scale model from scratch. Fine-tuning is built on making “fine” adjustments to a process in order to further improve performance: during fine-tuning, the parameters of an already-trained model are adjusted precisely and specifically, whilst validating the model, to achieve the desired outputs.
Why Use Transfer Learning?
Reasons to use transfer learning:
Not needing a lot of data - Gaining access to sufficient data is often a hindrance, and working with insufficient amounts of data can result in low performance. This is where transfer learning shines, as the machine learning model can be built with a small training dataset because the model is already pre-trained.
Saving training time - Machine learning models are difficult to train and can take up a lot of time, leading to inefficiency. Training a deep neural network from scratch on a complex task requires a long period of time, so using a pre-trained model saves you the time of building a new one.
Transfer Learning Pros
Better base: Using a pre-trained model in transfer learning offers you a better foundation and starting point, allowing you to perform some tasks without even training.
Faster learning: Due to the model already having been trained on a similar task beforehand, the model learns the new task more quickly.
Higher accuracy: With a better base and faster learning, the model performs at a higher level, producing more accurate outputs.
When Does Transfer Learning Not Work?
Transfer learning should be avoided when the weights trained on your source task are not relevant to your target task. For example, if your previous network was trained for classifying cats and dogs and your new network is trying to detect shoes and socks, there is going to be a problem, as the weights transferred from the source to the target task will not be able to give you the best results. Therefore, initialising the network with pre-trained weights that correspond to outputs similar to the ones you are expecting is better than using weights with no correlation to the target task.
Removing layers from a pre-trained model can also cause issues with the architecture of the model. If you remove the first layers, your model will learn slowly, as it has to juggle relearning the low-level features those layers captured. Removing layers also reduces the number of parameters that can be trained, which can result in overfitting. Using the correct number of layers is vital in reducing overfitting, however, tuning this is also a time-consuming process.
Transfer Learning Cons
Negative transfer learning: As mentioned above, negative transfer is when previous learning obstructs the new task. It occurs when the source and target are not similar enough, causing the first round of training to be too far off. Algorithms do not always agree with what we deem similar, which makes it difficult to understand the fundamentals and standards of what type of source training is sufficient.
Transfer Learning in 6 Steps
Let’s dive into a better understanding of how transfer learning is implemented and the steps involved. There are 6 general steps in transfer learning, and we will go through each of them.
- Select Source Task: The first step is selecting a pre-trained model that was trained on an abundance of data and whose relationship between input and output data is relevant to your chosen target task.
- Create a Base Model: Instantiate a base model using pre-trained weights, which can be obtained through architectures such as Xception. This develops your source model so that it is better than a naive model trained from scratch, ensuring the model learns faster.
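As a rough sketch of this step (assuming TensorFlow/Keras, an image task, and the Xception weights mentioned above; the input size is an illustrative choice, not a requirement):

from tensorflow import keras

# Instantiate the Xception architecture with weights pre-trained on ImageNet,
# dropping its original classification head so a new one can be added later.
base_model = keras.applications.Xception(
    weights="imagenet",
    input_shape=(150, 150, 3),
    include_top=False,
)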
- Freeze Layers: To avoid re-initialising the learned weights, freezing the layers from the pre-trained model is necessary. It retains the knowledge already learned and saves you from training the model from scratch:
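# Freeze the entire pre-trained base so its learned weights are not updated
# during the first training round (continuing the Keras sketch above):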
base_model.trainable = False
- Add new Trainable Layers: Adding new trainable layers on top of the frozen layers converts the old features into predictions on the new dataset.
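Continuing the sketch, the new trainable layers can be stacked on top of the frozen base (the pooling layer and single-unit output are illustrative assumptions for a binary classification target):

inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)          # run the frozen base in inference mode
x = keras.layers.GlobalAveragePooling2D()(x)    # collapse feature maps into a flat vector
outputs = keras.layers.Dense(1)(x)              # new trainable output layer for the target task
model = keras.Model(inputs, outputs)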
- Train the New Layers: The pre-trained model already contains its own final output layer, and the likelihood that this output differs from the output you want from your model is high. Therefore, you have to train the model with a new output layer: adding new dense layers, with a final dense layer that corresponds to your expected outputs, will speed up learning and produce the outputs you desire.
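For example (the optimizer, loss, and dataset names below are placeholder assumptions for a binary classification task, not part of the original article):

model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[keras.metrics.BinaryAccuracy()],
)
# train_dataset and validation_dataset stand in for your own target-task data
model.fit(train_dataset, epochs=20, validation_data=validation_dataset)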
- Fine-tuning: You can improve the performance of your model by fine-tuning, which is done by unfreezing all or part of the base model and then retraining it with a very low learning rate. It is critical to use a low learning rate at this stage, because the model you are training is much larger than in the first round and the dataset is small, so you are at risk of overfitting if you apply large weight updates; therefore, you want to fine-tune in an incremental way. Recompile the model, as you have changed its behaviour, and then retrain it, monitoring for any overfitting.
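A minimal sketch of this fine-tuning round, continuing the same assumed Keras workflow (the learning rate and epoch count are illustrative):

base_model.trainable = True   # unfreeze all (or part) of the base model
model.compile(
    optimizer=keras.optimizers.Adam(1e-5),   # very low learning rate to avoid large weight updates
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[keras.metrics.BinaryAccuracy()],
)
model.fit(train_dataset, epochs=10, validation_data=validation_dataset)   # watch for overfitting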
I hope this article has given you a good introduction to and understanding of transfer learning. Stay tuned: in my next article I will implement transfer learning for image recognition and natural language processing.
Nisha Arya is a Data Scientist and freelance technical writer. She is particularly interested in providing Data Science career advice and tutorials, as well as theory-based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence can benefit the longevity of human life. A keen learner, she seeks to broaden her tech knowledge and writing skills whilst helping guide others.