Part-of-speech tagging using Maximum Entropy models
Maximum entropy modeling is a framework for integrating information from many heterogeneous sources for classification. The data for a classification problem is described as a large number of features. These features can be quite complex and allow the experimenter to make use of prior knowledge about what types of information are expected to be important for classification. Each feature corresponds to a constraint on the model. We then compute the maximum entropy (maxent) model: of all the models that satisfy the constraints, the one with the highest entropy.
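To make the selection criterion concrete, here is a minimal sketch in Python. The tag set, the single constraint (the tags NN and VB together receive probability 0.8), and both candidate distributions are made up for illustration; they do not come from any real corpus. Both candidates satisfy the constraint, and the one that spreads probability as evenly as the constraint allows has the higher entropy, so it is the one the maxent principle would prefer.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a discrete distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def satisfies_constraint(dist, tol=1e-9):
    """Hypothetical constraint: NN and VB together must get probability 0.8."""
    return abs(dist["NN"] + dist["VB"] - 0.8) < tol


# Two made-up candidate models over four tags; both meet the constraint.
candidate_even = {"NN": 0.4, "VB": 0.4, "JJ": 0.1, "RB": 0.1}
candidate_skewed = {"NN": 0.7, "VB": 0.1, "JJ": 0.15, "RB": 0.05}

for name, dist in [("even", candidate_even), ("skewed", candidate_skewed)]:
    assert satisfies_constraint(dist)
    print(f"{name}: entropy = {entropy(dist):.3f} bits")
```

A real maxent trainer searches over all distributions that satisfy many such constraints at once rather than comparing two hand-picked candidates; the sketch only shows what "maximum entropy among the models that satisfy the constraints" means.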
Programs using maxent models have reported accuracy, precision, and recall figures at or near the state of the art on tasks such as part-of-speech tagging, sentence detection, prepositional phrase attachment, and named entity recognition.
POS tagging is one such task: a maxent model combines several heterogeneous features in a single probabilistic model. In most NLP applications the features are not independent, and at the word level their interdependence is quite pronounced, since neighbouring words and tags strongly influence one another. A maximum entropy classifier suits this setting because it does not assume that its features are independent. In the figure below, all the features (preceding and succeeding words, preceding and succeeding POS tags) around the W4 word "vehicles" are used in the model.
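The context features described above can be sketched in Python as follows. The sentence, its tags, the window size of two, and the function name extract_features are all assumptions made for illustration; the sentence is not the one in the figure, although "vehicles" is again the fourth word.

```python
def extract_features(words, tags, i, window=2):
    """Collect the words and POS tags surrounding position i as named features.

    Positions that fall outside the sentence are filled with a "<PAD>" marker.
    """
    features = {"word": words[i].lower()}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        in_range = 0 <= j < len(words)
        features[f"word[{offset:+d}]"] = words[j].lower() if in_range else "<PAD>"
        features[f"tag[{offset:+d}]"] = tags[j] if in_range else "<PAD>"
    return features


# Hypothetical example sentence with illustrative Penn Treebank style tags.
words = ["Police", "stopped", "three", "vehicles", "near", "the", "bridge"]
tags = ["NNS", "VBD", "CD", "NNS", "IN", "DT", "NN"]

# "vehicles" is the fourth word, i.e. index 3 when counting from zero.
for name, value in extract_features(words, tags, 3).items():
    print(name, "=", value)
```

Each name/value pair becomes one binary feature of the model. Note that when tagging unseen text left to right, the succeeding tags are not yet known, so a practical tagger would either restrict itself to preceding tags or search over tag sequences; the sketch simply mirrors the feature set listed above.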
MaxEnt is a log-linear (exponential) model. It can be described as extracting features, multiplying each one by its respective weight, adding the products up, and then using this sum as an exponent: