Generative AI as a Foundational Model

What is the nature of generative AI, what are its basics and building blocks, and how could it become genuinely intelligent?

All generative AI models belong to a special type of statistical classifier, the generative classifier; a classifier based on a discriminative model, by contrast, is a discriminative classifier.

Trained to perform a wide range of tasks (such as text synthesis, image manipulation and audio generation), such a model is commercialized as a general-purpose AI (GPAI) system, the GPT-x series included, while remaining at bottom a statistical classification tool: a generative classifier.

Statistical classification, together with clustering and regression, is a type of pattern recognition, "the automated recognition of patterns and regularities in data". Pattern recognition systems are trained from labeled or unlabeled "training" data, which distinguishes them from the pattern-matching algorithms of web search engines. They are generally categorized by the type of learning procedure used to generate the output value, as supervised, unsupervised, semi-supervised or self-supervised, depending on whether the training data set D = {(x, y)} of input instances x ∈ X and output labels y ∈ Y is labeled, unlabeled or mixed. In every case, the aim is to minimize an expected loss/cost/error function for an incorrect label, or to maximize a reward/utility/profit/fitness function, as the sketch below illustrates.
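A minimal sketch of the supervised case: a classifier fit on a labeled training set D = {(x, y)} by minimizing a loss (here the log-loss). scikit-learn and the synthetic dataset are illustrative assumptions, not something the article prescribes:

```python
# Supervised pattern recognition sketch: labeled data D = {(x, y)} and a
# classifier whose fitting procedure minimizes an expected loss (log-loss).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labeled training data: input instances x in X with output labels y in Y
X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fitting minimizes the expected log-loss over the training set
clf = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```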

Its classifiers assign each input value to one of a given set of classes, identifying a set of categories (sub-populations) to which an observation belongs. Observations are analyzed as feature vectors in a multidimensional probability space: quantifiable properties or variable quantities, known as explanatory variables (independent variables, regressors, etc.). These properties/variables/data may be categorical, ordinal, integer-valued or real-valued.

A classification algorithm that implements the mathematical function mapping input data to a category is a classifier. It works by comparing new observations to previous observations by means of a similarity measure/function or a metric/distance function, making data-driven inferences, decisions or predictions and building mathematical models from input data; a minimal distance-based sketch follows.
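As a minimal sketch of such a similarity-driven classifier, a 1-nearest-neighbour rule in plain NumPy; the distance function, data and class names are illustrative assumptions:

```python
# A classifier that compares a new observation to previously seen feature
# vectors via a distance function and returns the closest one's label.
import numpy as np

def nn_classify(x_new, X_train, y_train):
    # Euclidean distance as the similarity/distance metric
    dists = np.linalg.norm(X_train - x_new, axis=1)
    return y_train[np.argmin(dists)]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array(["class_a", "class_a", "class_b"])

print(nn_classify(np.array([4.5, 4.8]), X_train, y_train))  # -> class_b
```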

In general, there are two main approaches, the generative approach and the discriminative approach, which differ in the degree of statistical modelling.

A generative model is a statistical model of the joint probability distribution P(X, Y) on a given observable variable X and target variable Y.

A discriminative model is a model of the conditional probability P(Y | X) of the target Y, given an observation X = x.

Or,

a generative model is a model of the conditional probability of the observable X, given a target y; symbolically, P(X | Y = y)

a discriminative model is a model of the conditional probability of the target Y, given an observation x; symbolically, P(Y | X = x). The two approaches are contrasted concretely in the sketch below.
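A hedged sketch of the contrast, assuming scikit-learn: GaussianNB is a standard generative classifier (it fits P(X | Y = y) and P(Y), then applies Bayes' rule), while LogisticRegression is a standard discriminative one (it fits P(Y | X = x) directly). These are merely the simplest instances of each family, not models the article singles out:

```python
# Generative vs discriminative classifiers fit on the same data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=4, random_state=0)

generative = GaussianNB().fit(X, y)              # models P(X | Y) and P(Y)
discriminative = LogisticRegression().fit(X, y)  # models P(Y | X) directly

x0 = X[:1]  # one observation
print("generative     P(Y | x):", generative.predict_proba(x0))
print("discriminative P(Y | x):", discriminative.predict_proba(x0))
```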

In probabilistic settings, x and y are samples of random variables X and Y; Y is the variable being predicted, with ŷ(x) being the predicted values.

In such cases, where a single value of x can correspond to several values of y, the best choice for ŷ(x) (in order to minimize the mean squared error) is the conditional expectation E[Y|X=x].

This means that if you train a very expressive neural network to predict y given x (with a sufficiently big dataset), then your network would converge to E[Y|X=x].

Similarly, the best choice for x̂(y) is E[X|Y=y]: if you train your very expressive network to predict x given y, then it converges to E[X|Y=y].

Hence, the question of how ŷ(x) relates to x̂(y) in probabilistic settings can be rephrased as the question of how the conditional expectations E[Y|X=x] and E[X|Y=y] relate to each other.

The nature of probabilistic relationships is rather counter-intuitive.

First, the fact that y can be estimated as a linear function of x does not imply that x can also be estimated as a linear function of y.

Or, the probabilistic linear relationship ŷ(x) = αx does not necessarily invert to x̂(y) = βy with β = 1/α, due to a 'noise' or 'error' term Z, an additional random variable: Y = αX + Z.
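A numerical sketch of this asymmetry in the linear-Gaussian setting Y = αX + Z, assuming NumPy; with independent zero-mean X and Z, the least-squares slope of y on x recovers α, but the slope of x on y is α·Var(X) / (α²·Var(X) + Var(Z)), not 1/α:

```python
# Regressing y on x recovers the slope alpha, but regressing x on y does
# not recover 1/alpha, because the noise Z breaks the symmetry.
import numpy as np

rng = np.random.default_rng(0)
alpha = 2.0
X = rng.normal(0.0, 1.0, 100_000)  # Var(X) = 1
Z = rng.normal(0.0, 1.0, 100_000)  # independent noise, Var(Z) = 1
Y = alpha * X + Z

slope_y_on_x = np.cov(X, Y)[0, 1] / np.var(X)  # ~ 2.0, i.e. alpha
slope_x_on_y = np.cov(X, Y)[0, 1] / np.var(Y)  # ~ 0.4, not 1/alpha = 0.5
print(slope_y_on_x, slope_x_on_y)
```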

Generalizing, the fact that y can be estimated as a linear or nonlinear function of a causal variable x does not imply that x can also be estimated as a linear or nonlinear function of y.

This is a fundamental input-output data interaction rule, formally reflected by Bayes' theorem (alternatively Bayes' law or Bayes' rule):

P(X | Y) P(Y) = P(Y | X) P(X),

where X and Y are interacting events, and the theorem interconnects the posterior conditional and marginal probabilities with the prior and likelihood probabilities, X being interpreted as the probability distribution of the input variables and Y as that of the outputs. Due to this reversibility, one can estimate a generative model given the discriminative model, and vice versa, as the numeric sketch below illustrates.
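A worked numeric instance of that reversibility, with made-up illustrative numbers: from a generative model's likelihood P(X | Y) and prior P(Y), Bayes' rule recovers the discriminative posterior P(Y | X):

```python
# Recovering the posterior P(Y | X = x) from a generative model's
# components P(X | Y) and P(Y) via Bayes' rule (numbers are illustrative).
p_y = {0: 0.7, 1: 0.3}          # prior P(Y)
p_x_given_y = {0: 0.1, 1: 0.8}  # likelihood P(X = x | Y) for one fixed x

# marginal P(X = x), summing over both classes
p_x = sum(p_x_given_y[c] * p_y[c] for c in p_y)

# posterior P(Y = 1 | X = x)
p_y1_given_x = p_x_given_y[1] * p_y[1] / p_x
print(p_y1_given_x)  # 0.24 / 0.31 ≈ 0.774
```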

Through the combination of generative models and deep neural networks, one obtains deep generative models (DGMs) with hundreds of millions or billions of parameters; scaled up, these become the very large deep generative models behind Large Language Models and other generative AI applications.

So, all generative AI models are essentially statistical classifiers marked by Infinite Data, Infinite Neural Networks and Infinite Compute Power, with no scientific world models, and no real learning or intelligence.

#GenerativeAI #AIFoundationModels #ModelSafety #AI
