Transfer Learning - Making Machine Learning Models Work Even with Insufficient Labelled Data
1. Introduction
Let us start with the story of a data science project that predicts users' credit scores using telco data in country A. It was a successful machine learning (ML) project because we had sufficiently large and comprehensive labelled data.
Then the business team expected the data science team to quickly replicate the model for a new market in country B. However, we couldn't, because the currency and consumer behaviour in country B differ from those in country A.
I believe this story is common in data science companies. Generally, an ML model is built on the assumption that the training and test data are drawn from the same feature space and the same distribution. In other words, once the distribution shifts, the model fails.
Researchers have long studied this problem, and one family of solutions is called transfer learning. In layman's terms: we have labelled data from a source domain, and we would like to build an ML model for a target domain whose task or distribution differs from the source domain's (Pan and Yang, 2010).
In this article, we experiment with a transfer learning method proposed by Daumé III (2009), named easy adaptation (the name was coined in his later paper, Daumé III et al., 2010). In the following, we briefly explain easy adaptation in Section 2 and present the experiment in Section 3. Finally, a conclusion is drawn in Section 4.
2. Transfer Learning with Easy Adaptation
Easy adaptation has a simple construction method, described in Daumé III's paper titled "Frustratingly Easy Domain Adaptation". Say that we have labelled data with the same feature space (attributes x0 and x1 with output y) in both the source and target domains, but with different distributions (refer to Figure 1). For instance, the second record of the source domain data is (x0=2, x1=20, y=2), but the output becomes y=1 for the first record of the target domain data.
Figure 1: Easy adaptation on illustrative tables from the source and target domains.
Easy adaptation combines the source and target domains by constructing three versions of each attribute: general, source-specific and target-specific, labelled g_*, s_* and t_* in the augmented table of Figure 1. Concretely, each source example x is mapped to (x, x, 0) and each target example to (x, 0, x); the general copy lets the model learn patterns shared by both domains, while the domain-specific copies capture behaviour unique to each. Interested readers may refer to the original paper for the full analysis.
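As a minimal sketch of this augmentation (the helper names below are ours, not from the paper), the two mappings take only a few lines of NumPy:

```python
import numpy as np

def augment_source(X):
    # A source row x becomes (x, x, 0): general copy, source copy, zeros.
    return np.hstack([X, X, np.zeros_like(X)])

def augment_target(X):
    # A target row x becomes (x, 0, x): general copy, zeros, target copy.
    return np.hstack([X, np.zeros_like(X), X])
```

For example, the source record (x0=2, x1=20) from Figure 1 becomes (2, 20, 2, 20, 0, 0) in the augmented table.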
The augmented table is then used to train the ML model; no modification to the ML model itself is required. Next, we experiment with easy adaptation on artificial samples generated with scikit-learn.
3. Experiments
Daumé III (2009) compares against a few baseline models; we took the source-only and target-only models to compare with easy adaptation.
In the source-only model, the ML model is trained only on the source domain and applied directly to the target domain. This approach highlights the impact of the domain shift. On the other hand, the target-only model is trained only on the target-domain samples. Since the labelled target-domain data are limited, its results are normally worse than easy adaptation's.
For ease of comparison, we use scikit-learn's SVM classifier with default parameters for all models. We have published the source code of the experiment on GitHub (https://github.com/zkchong/easy-adaptation) for interested readers.
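A minimal sketch of the two baselines is shown below, assuming the arrays X_src, y_src, X_tgt_train, y_tgt_train, X_tgt_test and y_tgt_test (constructed in the next snippet) hold the source data and the target-domain train/test splits:

```python
from sklearn.svm import SVC

# Source-only baseline: train on the source domain,
# then score directly on the target test set.
source_only = SVC()
source_only.fit(X_src, y_src)
print('source-only:', source_only.score(X_tgt_test, y_tgt_test))

# Target-only baseline: train on the few labelled target samples.
target_only = SVC()
target_only.fit(X_tgt_train, y_tgt_train)
print('target-only:', target_only.score(X_tgt_test, y_tgt_test))
```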
Figure 2: The synthetic source- and target-domain data sets.
In the experiment, we created synthetic samples with three attributes (x0, x1 and x2). We then isolated 20% of them as the target-domain samples and shifted their distribution with Gaussian noise, as shown in Figure 2; the experiment results are shown in Table 3. Note that the target-domain samples are far fewer than the source domain's and follow a shifted distribution.
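The data preparation can be sketched as follows; the sample sizes, noise parameters and seeds here are illustrative choices rather than the exact values of our experiment (see the GitHub repository for the actual code):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Synthetic classification data with three attributes (x0, x1, x2).
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

# Hold out 20% as the target domain and shift its distribution by
# adding Gaussian noise with a non-zero mean.
X_src, X_tgt, y_src, y_tgt = train_test_split(X, y, test_size=0.2,
                                              random_state=0)
X_tgt = X_tgt + rng.normal(loc=1.0, scale=0.5, size=X_tgt.shape)

# Split the scarce target domain into training and test portions.
X_tgt_train, X_tgt_test, y_tgt_train, y_tgt_test = train_test_split(
    X_tgt, y_tgt, test_size=0.5, random_state=0)
```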
Table 3: Experiment results for the source-only, target-only and easy-adaptation models.
Generally, the source-only model has a good training score on its own training data (as it comes from the source domain) but scores badly (0.2500) on the target domain. The target-only model is trained on the target domain and achieves a relatively higher test score (0.5357). Finally, easy adaptation is trained on the augmented data set (see Figure 1) and achieves the highest test score (0.6071) on the target-domain test set.
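Putting the pieces together, the easy-adaptation model can be sketched as below, reusing augment_source and augment_target from Section 2 and the arrays above (the scores in Table 3 come from the published experiment, not from this illustrative snippet):

```python
# Easy adaptation: augment source and target training data, stack them,
# and train a single SVM on the combined table.
X_aug = np.vstack([augment_source(X_src), augment_target(X_tgt_train)])
y_aug = np.concatenate([y_src, y_tgt_train])

easy_adapt = SVC()
easy_adapt.fit(X_aug, y_aug)

# Target test samples are augmented as target-domain rows before scoring.
print('easy adaptation:',
      easy_adapt.score(augment_target(X_tgt_test), y_tgt_test))
```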
Linking back to the credit-scoring story above, we would get poor performance if we used country A's ML model directly to predict the credit scores of customers in country B. With transfer learning, however, we can achieve a relatively better result.
4. Conclusion
Transfer learning is an important topic with many proposed techniques. In this article, we have introduced easy adaptation from Daumé III (2009), with the source code published on GitHub.
5. References
Daumé III, H. (2009). Frustratingly easy domain adaptation. arXiv preprint arXiv:0907.1815.
Daumé III, H., Kumar, A., & Saha, A. (2010). Frustratingly easy semi-supervised domain adaptation. In Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing (pp. 53-59). Association for Computational Linguistics.
Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345-1359.