Is active learning the same as semi-supervised learning?
Pooja Palod
Machine Learning Engineer @Uber |Building Machine Learning Systems |IIT Bombay | AIR 51, Gate CSE 2016
Supervised learning algorithms assume the availability of labeled data. In most scenarios, getting labeled data is expensive, while unlabeled data is easily available. Semi-supervised learning and active learning are two approaches to learning from both labeled and unlabeled data.
Semi-Supervised Learning = Supervised Learning + Unsupervised Learning
The basic idea in semi-supervised learning is that unlabeled examples provide information about the distribution of examples, which can help in learning a more effective classifier. Let SL and SU be the labeled and unlabeled sets. In semi-supervised learning, self-training is done on the combined (SL, SU) set. Here the classifier outputs a prediction score along with the label. The small set of examples in SU that are most 'confidently' classified by the current classifier is added to SL, and the classifier is rebuilt from the enlarged SL. This process is repeated until no instances in SU are classified with sufficiently high confidence by the current classifier. Self-training has a key limitation: once a predicted label has been added to the labeled set, it is never reconsidered, so the method cannot recover from prediction errors made by early classifiers. The Expectation-Maximization (EM) algorithm addresses some of these limitations by assigning 'soft' labels to unlabeled data points and iteratively revising them.
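The self-training loop above can be sketched in a few lines. Everything concrete here is an assumption for illustration: a 1-D dataset, a toy nearest-centroid classifier, and a distance-based confidence score stand in for whatever model and data you actually have.

```python
# Self-training sketch. SL holds (x, label) pairs; SU holds unlabeled x values.

def centroids(SL):
    """Per-class mean of the labeled set (our toy 'classifier fit')."""
    sums, counts = {}, {}
    for x, y in SL:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(c, x):
    """Return (label, confidence); confidence decays with distance to the centroid."""
    label = min(c, key=lambda y: abs(x - c[y]))
    return label, 1.0 / (1.0 + abs(x - c[label]))

def self_train(SL, SU, threshold=0.5):
    SL, SU = list(SL), list(SU)
    while SU:
        c = centroids(SL)                        # rebuild classifier from SL
        scored = [(x, *predict(c, x)) for x in SU]
        confident = [(x, y) for x, y, p in scored if p >= threshold]
        if not confident:                        # nothing confidently classified: stop
            break
        SL += confident                          # pseudo-labels, never reconsidered
        SU = [x for x, y, p in scored if p < threshold]
    return SL

SL = [(0.0, "a"), (10.0, "b")]
SU = [0.5, 1.0, 9.0, 9.5, 5.2]
print(len(self_train(SL, SU)))  # the ambiguous midpoint 5.2 is never added
```

Note how the limitation mentioned above shows up directly in the code: once a pair enters `SL`, nothing ever removes or relabels it.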
In active learning, the algorithm prioritizes which data (from a large unlabeled dataset) should be labeled in order to have the highest impact on training a supervised classifier. The algorithm selects the data with the least confident output, queries its label from, say, human annotators, and adds it to the SL dataset. This process is repeated until we achieve the required level of performance. Uncertainty sampling (selecting the example with the least confidence) and query-by-committee (an ensemble method for selecting examples to label) are widely used techniques for picking the examples to label.
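A minimal sketch of the uncertainty-sampling loop just described, under the same kind of assumptions: a hypothetical 1-D binary dataset, a midpoint-threshold classifier, distance to the decision boundary as the confidence proxy, and a lambda standing in for the human annotator.

```python
# Active-learning sketch via uncertainty sampling.
# SL holds (x, label) pairs; SU holds unlabeled x values; oracle plays the annotator.

def fit(SL):
    """Midpoint between the two class means (toy binary, 1-D classifier)."""
    a = [x for x, y in SL if y == 0]
    b = [x for x, y in SL if y == 1]
    return (sum(a) / len(a) + sum(b) / len(b)) / 2

def confidence(t, x):
    """Distance from the decision boundary as a confidence proxy."""
    return abs(x - t)

def active_learn(SL, SU, oracle, budget=3):
    SL, SU = list(SL), list(SU)
    for _ in range(min(budget, len(SU))):
        t = fit(SL)
        x = min(SU, key=lambda u: confidence(t, u))  # least confident example
        SL.append((x, oracle(x)))                    # query the annotator
        SU.remove(x)
    return SL

oracle = lambda x: int(x > 5)      # stand-in "human" labeler for the demo
SL = [(1.0, 0), (9.0, 1)]
SU = [4.8, 2.0, 8.0, 5.3]
final = active_learn(SL, SU, oracle)
print(len(final))  # 2 seed labels + 3 queried labels
```

The contrast with self-training is the selection rule: here the *least* confident example is sent to a human, whereas self-training keeps the *most* confident examples and pseudo-labels them itself.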
To conclude, semi-supervised learning and active learning try to solve the same problem (learning more from unlabeled data), but they do it in different ways. Active learning focuses on learning from the most informative examples in the unlabeled data (i.e., the labels of some data points are more informative than others), while semi-supervised learning prefers to use the entire unlabeled dataset.
Program Management for Software Product Engineering - Intelligent, Web Scale, Distributed, Mobile, Real Time, Embedded
2y · Very nicely explained. Both are iterative processes, as you have already said in the article. Both progressively move data from unlabeled to labeled, but they do so in reverse sequence. One moves the highest-confidence data first, trusting that the distribution is of predictive value for getting the greatest boost in classifier performance. The other moves the lowest-confidence data first, trusting that human judgement has predictive value for getting the greatest boost to classifier performance. In this sense, the two approaches are opposites of each other.
Data Scientist II at Bottomline Technologies | AI-ML, NLP | LLM/Gen AI
2y · A basic visualisation would help for the AL process.