The True Decision Surface - Mystery Yet To Be Solved

This article looks at classification from a decision-surface perspective. It provides an overview of some common supervised classification algorithms and then discusses the decision surfaces these algorithms capture, closing with some food for thought.

Machine learning largely concerns an important aspect of learning: the ability to classify (ATC). In the simplest terms, call an Apple an Apple and an Orange an Orange. This ATC has seen myriad applications and is being leveraged in a wide range of scenarios. The paradigm of machine learning called supervised learning has a special set of algorithms, classification algorithms, that focus on this ATC. For simplicity, other paradigms are not considered in the current discussion. The various supervised classification algorithms (SCA) demonstrate a decent level of this ATC. In particular, given the training data, these SCA learn the function that maps the input to the output. This can be considered as capturing the decision surface (DSRC) of the underlying problem/data; more precisely, a DSRC is superimposed on the given data. Therefore, different approaches to learning the function yield different decision surfaces, over which the classification is done.

Eventually, when a test sample is provided, the learned function maps its input to an output, thereby predicting the corresponding class. Simplistically, a DSRC can be considered as the set of boundaries that separate the classes; broadly, these boundaries can be linear or nonlinear. It is interesting to see how different SCA learn the input-to-output function, or in other words how they capture the DSRC in the underlying data/problem. However, one should realize that in general there is no explicit capturing of the decision surface; it is captured implicitly.

The K-nearest neighbors algorithm - https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm - uses distance as the criterion to decide the class membership of the sample under consideration (SUC), i.e. the test sample. The predicted class is the majority class among the K nearest neighbors of the SUC. Various distance metrics can be used, viz. Euclidean, city-block, cosine, etc. The Naive Bayes algorithm - https://en.wikipedia.org/wiki/Naive_Bayes_classifier - has a unique ability to fuse domain/prior knowledge with the evidence, referred to as the prior probability and the likelihood respectively. The class membership of the SUC is decided based on the posterior probability, which is proportional to the product of the prior probability and the likelihood. The class with the maximum posterior probability is the resultant class of the SUC.
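To make these two algorithms concrete, here is a minimal sketch using scikit-learn on the iris data that the figures are based on. The choices of k=5, the Euclidean metric, and Gaussian likelihoods are illustrative assumptions, and for brevity the sketch keeps all three iris classes, whereas each figure uses a two-class subset.

```python
# Minimal sketch: k-NN and Naive Bayes on the two iris features used in the
# figures (sepal length, sepal width). k=5, the Euclidean metric and Gaussian
# likelihoods are illustrative choices. Requires scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X = X[:, :2]  # keep only sepal length and sepal width, as in the figures

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k-NN: the predicted class of the SUC is the majority class among its
# k nearest neighbours under the chosen distance metric.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean").fit(X_train, y_train)

# Naive Bayes: the predicted class maximises the posterior, which is
# proportional to prior * likelihood.
nb = GaussianNB().fit(X_train, y_train)

print("k-NN accuracy       :", knn.score(X_test, y_test))
print("Naive Bayes accuracy:", nb.score(X_test, y_test))
```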

A decision tree proceeds by splitting the input space at points where the information gain - https://en.wikipedia.org/wiki/Information_gain_in_decision_trees - is maximum; simplistically, at those points the resulting subset of the data has one dominant class, and each such split provides a kind of rule for classifying a SUC into that class. Various split points, and hence various rules, are obtained through this exercise, and given a SUC, the appropriate rules are applied to classify it into the appropriate class. Discriminant analysis - https://en.wikipedia.org/wiki/Linear_discriminant_analysis - finds, from the given data, the linear combination of the inputs that leads to the output such that the distance among samples of the same class is minimized and the distance among samples of different classes is maximized; the learned linear combination is applied to the SUC to decide the resultant class. The support vector machine - https://en.wikipedia.org/wiki/Support-vector_machine - looks for the hyperplane that separates the classes with maximum margin; given the training data, it finds the equation of that hyperplane, which is then used to classify a SUC.
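A companion sketch for these three algorithms, continuing from the snippet above (it reuses X_train, X_test, y_train, y_test); the entropy criterion and the linear kernel are assumptions made for illustration.

```python
# Companion sketch: decision tree, discriminant analysis and SVM on the same
# split as above (X_train, X_test, y_train, y_test carry over).
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

# Decision tree: splits are chosen to maximise information gain
# (criterion="entropy" makes the gain computation explicit).
tree = DecisionTreeClassifier(criterion="entropy").fit(X_train, y_train)

# Discriminant analysis: learns a linear combination of the inputs that
# pulls samples of the same class together and pushes classes apart.
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

# SVM: finds the maximum-margin separating hyperplane (linear kernel here).
svm = SVC(kernel="linear").fit(X_train, y_train)

for name, clf in [("tree", tree), ("LDA", lda), ("SVM", svm)]:
    print(name, clf.score(X_test, y_test))
```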

Decision Surfaces-A

At this juncture it is vital to distinguish the problem from the data set. The problem refers to the classification problem being solved: for the figure Decision Surfaces-A above, it is the classification of a flower as setosa or versicolor given its sepal length and sepal width; for figure Decision Surfaces-B below, it is the classification of a flower as versicolor or virginica given the same features. The data set, on the other hand, is the training data provided for learning, and this data set is a representation of the problem.

Given a data set/problem to be learned, different SCA capture different decision surfaces over which the classification is done. This is evident in the figures Decision Surfaces-A above and Decision Surfaces-B below. In both figures, the top-left sub-figure is the data set on which the SCA were trained. For simplicity, the data set considered has just two features (sepal length and sepal width) and two classes; this makes it possible to visualize the DSRC captured by each SCA, whereas in many real-world problems the decision surfaces are too complex to visualize.
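For the curious, plots like those in the figures can be produced by evaluating a trained classifier on a dense grid of feature values; here is a minimal matplotlib sketch, reusing X, y and the knn model from the first snippet.

```python
# Sketch of how decision-surface plots like Decision Surfaces-A/B are made:
# predict the class at every point of a dense grid, then colour the grid.
# Reuses X, y and the trained `knn` model from the first snippet.
import numpy as np
import matplotlib.pyplot as plt

x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

# The predicted class at each grid point traces out the (implicit) DSRC.
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.xlabel("sepal length (cm)")
plt.ylabel("sepal width (cm)")
plt.title("Decision surface captured by k-NN (sketch)")
plt.show()
```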

Decision Surfaces-B

An interesting question arises: of the various decision surfaces captured by SCA, which one is the true decision surface for a given data set/problem? Towards this, it is essential to define what a true decision surface is. Consider this definition: a true decision surface is one that never misclassifies any unseen SUC, forever - implying an assured 100% success rate, in effect an immortal decision surface.
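One way to state this definition formally (a formalisation assumed here for clarity, not taken from elsewhere):

```latex
% f* is a true decision surface iff it never errs on any future sample,
% whatever the (possibly drifting) data distribution D_t at time t may be.
\[
  \Pr_{(x,\,y)\,\sim\,\mathcal{D}_{t}}\bigl[\, f^{*}(x) \neq y \,\bigr] = 0
  \qquad \text{for all future times } t .
\]
```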

Consider a hypothetical yet possible scenario: if, in future, due to some climatic changes, the sepal lengths and widths that were characteristic of versicolor/virginica were no longer applicable, then the SCA that once gave spectacular results would start misclassifying. Updating the training data is an obvious remedy; however, in real-life problems there are scenarios where the data must be updated quite often, and even this might not guarantee the accuracy that was obtained earlier.
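A toy version of this drift scenario, continuing from the earlier snippets; the 0.8 cm shift is an invented number used purely to illustrate the failure mode.

```python
# Hypothetical climatic-drift scenario: evaluate the model trained on
# today's data against "future" flowers whose sepal measurements have
# shifted. The 0.8 cm shift is invented, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
shift = np.array([0.8, 0.8])  # hypothetical drift in sepal length/width (cm)
X_future = X_test + shift + rng.normal(0.0, 0.05, X_test.shape)

print("accuracy on today's data:", knn.score(X_test, y_test))
print("accuracy after drift    :", knn.score(X_future, y_test))  # typically drops
```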

Whatever SCA, or combination of them, one chooses for the best accuracy, all of them operate under the framework X -----> Y, where X is the input and Y is the output. This assumes symmetrical behavior, i.e. X -----> Y implies Y -----> X. All solutions under this framework mathematically boil down to curve fitting: different SCA fit different curves. The algorithm does not know precisely why a SUC is being classified as a particular class; it blindly resorts to the decision surface, aka the curve that was captured/fit over the data, to answer this. The best of the current algorithms, deep learning (not covered here) in its various forms, is pretty data hungry. In dynamic environments, a reliable ATC requires constant updating of the training data, implying retraining with the same algorithm or a different one, thus repeating the same exercise - this is laborious, and an immature form of learning indeed. Should learning approaches instead have reasoning imbibed in them, so as to exhibit maturity in learning rather than a laborious exercise? See https://www.quantamagazine.org/to-build-truly-intelligent-machines-teach-them-cause-and-effect-20180515/


Recently there have been tremendous technological advancements, and modern cloud systems support efficient pipelines that automate the entire flow, from data extraction to prediction. These pipelines are, in a way, a step towards tackling changes in the decision surface of a problem: the model is periodically retrained on the latest data.
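Schematically, such a pipeline amounts to a scheduled retraining loop. In the sketch below, fetch_latest_data and deploy are hypothetical placeholders standing in for whatever data store and serving layer a given cloud stack provides.

```python
# Schematic of a periodic retraining pipeline. `fetch_latest_data` and
# `deploy` are hypothetical placeholders, not a real cloud API.
from sklearn.neighbors import KNeighborsClassifier

def retrain_and_deploy(fetch_latest_data, deploy):
    """One scheduled run: pull fresh labelled data, refit, redeploy."""
    X_new, y_new = fetch_latest_data()           # e.g. from a feature store
    model = KNeighborsClassifier(n_neighbors=5).fit(X_new, y_new)
    deploy(model)                                # swap in the new serving model
    return model

# A scheduler (cron, Airflow, a managed cloud pipeline, ...) would call
# retrain_and_deploy on a fixed cadence, re-tracing the decision surface
# as the underlying data drifts.
```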


Given the rate at which the world is progressing, changes are evident at various levels. Domains evolve, society changes, and things that once applied no longer apply; as a result, the training data that was once good enough for the SCA to give good results could start failing. In a way, as time progresses, the decision surfaces captured by SCA lose their value, and therefore the true decision surface is a mystery yet to be solved.


