The built-in AI algorithms of the OpenNLP categorizer
maxent – maximum entropy
The principle of maximum entropy states that the probability distribution which best represents the current state of knowledge about a system is the one with the largest entropy, in the context of precisely stated prior data (such as a proposition that expresses testable information).
In ordinary language, the principle of maximum entropy expresses a claim of epistemic modesty, or of maximum ignorance. The selected distribution is the one that makes the least claim to being informed beyond the stated prior data, that is to say, the one that admits the most ignorance beyond the stated prior data.
See an example of the maxent categorizer here.
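To make the idea concrete, here is a minimal sketch of a maxent categorizer in Python (multinomial logistic regression over bag-of-words features, trained by gradient ascent on the log-likelihood). This is a conceptual illustration, not the OpenNLP implementation, and the tiny training corpus is invented for demonstration.

```python
# Illustrative maximum-entropy (multinomial logistic regression) text
# categorizer. Conceptual sketch only, not OpenNLP code.
import math
from collections import defaultdict

def features(text):
    # Bag-of-words indicator features.
    return set(text.lower().split())

def train_maxent(samples, epochs=200, lr=0.5):
    labels = sorted({label for _, label in samples})
    weights = defaultdict(float)  # (label, feature) -> weight
    for _ in range(epochs):
        for text, gold in samples:
            feats = features(text)
            # p(label | text) under the current weights (softmax).
            scores = {l: sum(weights[(l, f)] for f in feats) for l in labels}
            z = sum(math.exp(s) for s in scores.values())
            probs = {l: math.exp(s) / z for l, s in scores.items()}
            # Gradient step: observed feature counts minus expected counts.
            for l in labels:
                target = 1.0 if l == gold else 0.0
                for f in feats:
                    weights[(l, f)] += lr * (target - probs[l])
    return labels, weights

def classify(labels, weights, text):
    feats = features(text)
    return max(labels, key=lambda l: sum(weights[(l, f)] for f in feats))

# Made-up training data for the sketch.
samples = [
    ("great fantastic movie", "pos"),
    ("terrible boring movie", "neg"),
    ("fantastic plot", "pos"),
    ("boring plot", "neg"),
]
labels, weights = train_maxent(samples)
print(classify(labels, weights, "a fantastic movie"))  # pos
```

The gradient update (observed minus expected counts) is exactly the maxent training objective: the model is pushed toward the distribution whose expected feature counts match the data while remaining maximally noncommittal otherwise.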
n-gram
An n-gram is a collection of n successive items in a text document, which may include words, numbers, symbols, and punctuation. N-gram models are useful in many text analytics applications where sequences of words are relevant, such as sentiment analysis, text classification, and text generation. N-gram modeling is one of the many techniques used to convert text from an unstructured format to a structured format. An alternative to n-grams is word embedding techniques, such as word2vec. (See the spaCy article for word vectors.)
See an example of the n-gram categorizer here.
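The sliding-window idea behind n-grams can be sketched in a few lines of Python. This is a hypothetical helper for illustration, not OpenNLP code:

```python
# Illustrative n-gram extraction: slide a window of n successive
# tokens over a text. Hypothetical helper, not OpenNLP code.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the quick brown fox".split()
print(ngrams(tokens, 2))
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```

In a categorizer, these n-gram tuples would typically replace (or supplement) single words as the features fed to the classifier, so that short word sequences like "not good" carry signal that individual tokens lose.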
Naive Bayes
Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. There is not a single algorithm for training such classifiers, but a family of algorithms based on a common principle: all naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 10 cm in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of any possible correlations between the color, roundness, and diameter features.
See an example of the naive Bayes categorizer here.
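The apple example above can be sketched as a multinomial naive Bayes text categorizer with Laplace (add-one) smoothing. This is a conceptual illustration, not OpenNLP's implementation, and the training data is invented for demonstration:

```python
# Illustrative multinomial naive Bayes with Laplace smoothing.
# Conceptual sketch only, not OpenNLP code; data is made up.
import math
from collections import Counter, defaultdict

def train_nb(samples):
    class_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def classify_nb(model, text):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label, n_docs in class_counts.items():
        # log p(label) plus a log p(word | label) term per word,
        # treating each word as independent given the class.
        lp = math.log(n_docs / total_docs)
        n_words = sum(word_counts[label].values())
        for w in text.lower().split():
            lp += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Made-up training data mirroring the apple example.
samples = [
    ("red round fruit", "apple"),
    ("red round ten cm", "apple"),
    ("yellow long fruit", "banana"),
    ("yellow curved long", "banana"),
]
model = train_nb(samples)
print(classify_nb(model, "red round"))  # apple
```

The key line is the per-word sum of log-probabilities: multiplying per-feature likelihoods (adding their logs) is exactly the conditional-independence assumption the paragraph describes, and the add-one smoothing keeps unseen words from zeroing out a class.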