Is active learning the same as semi-supervised learning?

Supervised learning algorithms assume that labeled data is available. In most scenarios, however, obtaining labeled data is expensive, while unlabeled data is comparatively easy to come by. Semi-supervised learning and active learning are two approaches to learning from a combination of labeled and unlabeled data.

Semi-Supervised Learning = Supervised Learning + Unsupervised Learning

The basic idea in semi-supervised learning is that unlabeled examples provide information about the distribution of examples, which can help in learning a more effective classifier. Let SL and SU be the labeled and unlabeled sets. A common semi-supervised technique is self-training, which works on the combined (SL, SU) set. A classifier is built from SL and outputs a prediction score along with each predicted label. A small set of examples in SU that are most 'confidently' classified by the current classifier is added to SL together with their predicted labels, and the classifier is rebuilt from the enlarged SL. This process is repeated until there are no instances left in SU that the current classifier labels with sufficiently high confidence.

Self-training has some limitations. Once a predicted label has been added to the labeled set, that label is never reconsidered, so the method cannot recover from prediction errors made by early classifiers. The Expectation-Maximization (EM) algorithm addresses some of these limitations by assigning 'soft' labels to unlabeled data points and iteratively revising them.
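The self-training loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nearest-centroid classifier, the softmax-style confidence score, and the names `fit_centroids`, `predict_with_confidence`, `self_train`, `threshold`, and `batch_size` are all hypothetical choices made for the example, not something the article prescribes.

```python
import math

def fit_centroids(labeled):
    """Fit a toy nearest-centroid classifier: one mean point per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_with_confidence(centroids, x):
    """Return (label, score); the score is a softmax over negative distances."""
    weights = {y: math.exp(-math.dist(x, c)) for y, c in centroids.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())

def self_train(labeled, unlabeled, threshold=0.8, batch_size=2):
    """Move the most confidently predicted points from SU into SL, then refit."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    while unlabeled:
        centroids = fit_centroids(labeled)
        confident = []
        for x in unlabeled:
            label, score = predict_with_confidence(centroids, x)
            if score >= threshold:
                confident.append((score, label, x))
        if not confident:
            break  # nothing left that is classified with enough confidence
        confident.sort(key=lambda t: t[0], reverse=True)
        for _, label, x in confident[:batch_size]:
            labeled.append((x, label))  # predicted label is never revisited
            unlabeled.remove(x)
    return fit_centroids(labeled), labeled, unlabeled

# Toy data (assumed for illustration): two clusters and one ambiguous point.
SL = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
SU = [(1.0, 1.0), (9.0, 9.0), (0.5, 0.2), (5.0, 5.0)]
model, grown_SL, remaining_SU = self_train(SL, SU)
```

In this toy run the ambiguous midpoint stays in SU because its score never reaches the threshold, which is the stopping condition the paragraph describes. The comment on the `labeled.append` line marks where the limitation arises: a wrong pseudo-label added here is kept for every later refit.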

In active learning, the algorithm prioritizes which data points from a large unlabeled dataset should be labeled, so that each new label has the highest possible impact on training a supervised classifier. The algorithm selects the examples on which the current classifier is least confident, queries their labels from, say, human annotators, and adds them to SL. This process is repeated until the required level of performance is reached. Uncertainty sampling (selecting the examples with the least confident predictions) and query-by-committee (using an ensemble of models to select which examples to label) are widely used techniques for choosing the examples to label.
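Uncertainty sampling can be sketched in the same spirit. The example below is a minimal illustration, not a reference implementation: the 1-D nearest-mean classifier, the `oracle` callable standing in for a human annotator, and the fixed labeling `budget` (used here in place of a "required level of performance" check) are all assumptions made for the example.

```python
import math

def fit(labeled):
    """Toy nearest-mean classifier on 1-D points: returns {label: mean}."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def confidence(model, x):
    """Return (predicted_label, score); a score near 0.5 means 'unsure'."""
    weights = {y: math.exp(-abs(x - m)) for y, m in model.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())

def active_learn(labeled, pool, oracle, budget):
    """Repeatedly query the label of the least confident point in the pool."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(budget):
        if not pool:
            break
        model = fit(labeled)
        # Uncertainty sampling: pick the point with the lowest confidence.
        query = min(pool, key=lambda x: confidence(model, x)[1])
        labeled.append((query, oracle(query)))  # ask the annotator
        pool.remove(query)
    return fit(labeled), labeled, pool

# Hypothetical annotator: the true boundary is at x = 6.
def oracle(x):
    return "pos" if x >= 6 else "neg"

SL = [(0.0, "neg"), (10.0, "pos")]
SU = [1.0, 4.0, 6.5, 9.0]
model, grown_SL, remaining_SU = active_learn(SL, SU, oracle, budget=2)
```

With a budget of two queries, the loop asks about the points closest to the current decision boundary and leaves the easy points near the class means unlabeled, which is the opposite selection order from self-training.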

[Figure: Active Learning]

To conclude, semi-supervised learning and active learning are trying to solve the same problem (learning more from unlabeled data), but they go about it differently. Active learning focuses on the most important examples in the unlabeled data (i.e., the labels of some data points are more informative than others), while semi-supervised learning prefers to use the entire unlabeled dataset.

Sanjeev Khadilkar

Program Management for Software Product Engineering - Intelligent, Web Scale, Distributed, Mobile, Real Time, Embedded

2 years ago

Very nicely explained. Both are iterative processes as you have already said in the article. Both progressively move data from unlabelled to labelled. They do so in reverse sequence. One moves the highest confidence data first, trusting that the distribution is of predictive value for getting the greatest boost in classifier performance. The other moves the lowest confidence data first, trusting that human judgement has predictive value for getting the greatest boost to classifier performance. In this sense, the two approaches are opposites of each other.

Vitthal Shelke

Data Scientist II at Bottomline Technologies | AI-ML, NLP | LLM/Gen AI

2 years ago

A basic visualisation would help explain the AL process.

