Probabilistic graphical models for Deep Learning Part-1 (Restricted Boltzmann Machines)
Niraj Kumar, Ph.D.
AI/ML R&D Leader | Driving Innovation in Generative AI, LLMs & Explainable AI | Strategic Visionary & Patent Innovator | Bridging AI Research with Business Impact
RBM: Restricted Boltzmann machines are undirected graphical models that can also be interpreted as two-layer stochastic neural networks. They are useful for, among other things, (1) unsupervised learning and (2) feature extraction. Restricted Boltzmann machines have received a lot of attention after being proposed as building blocks of multi-layer learning architectures called deep belief networks (DBNs) [7, 8].
Relation with deterministic feed-forward neural networks: “The idea is that the hidden neurons extract relevant features from the observations. These features can serve as input to another RBM. By stacking RBMs in this way, one can learn features from features in the hope of arriving at a high-level representation” [10]. This is an important property: because of it, single as well as stacked RBMs can be reinterpreted as deterministic feed-forward neural networks.
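To make this reinterpretation concrete, here is a small sketch in Python/NumPy of two stacked RBMs used as a deterministic feed-forward network; the layer sizes and the randomly initialised weights merely stand in for two already-trained RBMs and are assumptions for illustration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical parameters standing in for two already-trained RBMs:
# (W1, c1) maps the visible layer to the first hidden layer,
# (W2, c2) maps the first hidden layer to the second.
rng = np.random.default_rng(0)
W1, c1 = rng.normal(scale=0.01, size=(784, 256)), np.zeros(256)
W2, c2 = rng.normal(scale=0.01, size=(256, 64)), np.zeros(64)

def deterministic_forward(v):
    # Use the hidden activation probabilities (not stochastic samples),
    # which turns the stacked RBMs into an ordinary feed-forward pass.
    h1 = sigmoid(v @ W1 + c1)   # features from the first RBM
    h2 = sigmoid(h1 @ W2 + c2)  # "features of features" from the second RBM
    return h2

v = rng.integers(0, 2, size=(1, 784)).astype(float)  # toy binary input
print(deterministic_forward(v).shape)                 # -> (1, 64)
```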
A summary of some other basic characteristics of RBMs:
1. Architectural features: An RBM contains one layer of hidden units and one layer of visible units. There are no connections between hidden units or between visible units (this is the restriction applied to the Boltzmann machine). In this representation, edges can be drawn as undirected or bi-directed; the model forms a symmetric bipartite graph.
2. RBMs as generative models: When an RBM is used as a generative model, it is used for drawing samples from the learned distribution.
3. The training of an RBM: The contrastive divergence (CD) algorithm is used to train the RBM [9]. The algorithm performs Gibbs sampling and is used inside a gradient descent procedure (similar to the way backpropagation is used inside such a procedure when training feed-forward neural nets) to compute the weight updates; a minimal sketch follows this list.
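The following is a minimal NumPy sketch of one CD-1 update for a binary RBM; the parameter names (W, b, c), the learning rate and the mini-batch interface are illustrative assumptions rather than a tuned implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.05, rng=None):
    # One CD-1 update for a binary RBM on a mini-batch v0 (batch x n_visible).
    # W: n_visible x n_hidden, b: visible biases, c: hidden biases.
    if rng is None:
        rng = np.random.default_rng(0)
    # Positive phase: hidden probabilities and samples given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # One Gibbs step: reconstruct the visibles, then recompute the hiddens.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Approximate gradient: <v h>_data - <v h>_reconstruction.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

In practice the learning rate, mini-batch size and number of Gibbs steps (CD-k) all influence the quality of the learned model; see [4] for practical recommendations.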
Video links for more information about architecture and training of RBMs
Basic facts behind the usefulness of RBMs in feature extraction, classification, etc.: An RBM comprises two types of variables: (1) a layer of visible variables, which correspond to the components of the inputs (the ‘visible layer’), and (2) a layer of hidden (or latent) variables, which capture dependencies between the visible neurons (the ‘hidden layer’). After training, the expected states of the hidden variables given an input can be interpreted as the (learned) features extracted from this input pattern. The dimensionality of the learned features depends on the number of hidden units. Most applications of RBMs use these extracted features and the relations between the variables of the two layers. For example, let us consider a few applications given below. [See Ref. 1–10]
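As a small illustration of this feature-extraction step, assuming already-trained parameters W (visible-to-hidden weights) and c (hidden biases):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def extract_features(V, W, c):
    # Expected hidden states E[h | v] = sigmoid(v W + c), used as the
    # learned features; the feature dimensionality equals the number of
    # hidden units (the number of columns of W).
    return sigmoid(V @ W + c)

# The extracted features can then be fed to any downstream model
# (a classifier, a clustering method, or another RBM stacked on top).
```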
Classification: RBMs can be useful for the classification of images, XML data, text, etc. If labelled training data is given and the RBM is trained on the joint distribution of inputs and labels, then there are two possibilities: (1) we can sample the missing label for a presented input from the distribution, or (2) we can assign a new input to the class with the highest probability under the model [4].
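A rough sketch of option (2), assuming a binary RBM trained on the joint distribution of inputs and one-hot labels; the parameter names (W for input-to-hidden weights, U for label-to-hidden weights, b_y for label biases, c for hidden biases) are assumptions for illustration. Since p(y | x) is proportional to exp(-F(x, y)), the predicted class is the one with the lowest free energy:

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)), computed stably
    return np.logaddexp(0.0, x)

def predict_label(x, W, U, b_y, c, n_classes):
    # Free energy of (x, one-hot y); the input-bias term is constant in y
    # and can therefore be dropped for the argmax.
    free_energies = [
        -b_y[y] - softplus(c + x @ W + U[y]).sum()
        for y in range(n_classes)
    ]
    return int(np.argmin(free_energies))
```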
Imbalanced data problem. In the imbalanced data problem, one class dominates the other. To address this, artificial examples are generated for the under-represented (minority) class, generally using the Synthetic Minority Oversampling Technique (SMOTE). Gibbs sampling is then applied to each newly created example, which is finally labelled and added to the training data [5, 6].
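A rough sketch of this oversampling idea, assuming an RBM with parameters W, b, c that has already been trained on the minority class (the function and variable names are illustrative, not the exact procedure of [5]):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def oversample_minority(X_min, W, b, c, n_new, rng=None):
    # X_min: minority-class examples (n x n_visible); W, b, c: parameters
    # of an RBM assumed to be trained on the minority class.
    if rng is None:
        rng = np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        # SMOTE-style interpolation between two random minority examples.
        i, j = rng.choice(len(X_min), size=2, replace=False)
        x = X_min[i] + rng.random() * (X_min[j] - X_min[i])
        # One Gibbs step through the RBM pulls the synthetic point toward
        # the learned minority-class distribution.
        h = (rng.random(c.shape) < sigmoid(x @ W + c)).astype(float)
        synthetic.append(sigmoid(h @ W.T + b))
    # These examples are labelled as the minority class and added to the data.
    return np.stack(synthetic)
```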
Noisy labels problem. This is one of the most common problems in classification: some of the examples in the training data carry incorrectly assigned labels. To correct those labels, an RBM is trained for each class separately, and each trained model is used as an oracle to detect incorrectly labelled data; the reconstruction error is used to identify the suspect examples [5, 6].
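A minimal sketch of this filtering step, assuming one trained RBM per class stored as a dict of parameters (all names and the thresholding rule are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_error(x, W, b, c):
    # Mean-field one-step reconstruction error of x under one class's RBM.
    h = sigmoid(x @ W + c)
    x_rec = sigmoid(h @ W.T + b)
    return np.mean((x - x_rec) ** 2)

def flag_suspicious_labels(X, y, rbms, threshold):
    # rbms: dict mapping class id -> (W, b, c) of the RBM trained on that
    # class. Examples whose reconstruction error under their own class's
    # RBM exceeds the threshold are flagged as possibly mislabelled.
    flags = []
    for x, label in zip(X, y):
        W, b, c = rbms[label]
        flags.append(reconstruction_error(x, W, b, c) > threshold)
    return np.array(flags)
```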
Unstructured data. The data is given in unprocessed form: images, videos, documents, XML structures. In such cases, an RBM is used as a domain-independent feature extractor that transforms raw data into hidden-unit activations.
RBMs vs. Autoencoders
Which is better, an RBM or an autoencoder, and why? Read Yoshua Bengio's answer to roughly the same question below.
References:
- David H. Ackley, Geoffrey E. Hinton, and Terrence J. Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169, 1985.
- S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
- G. Hinton and S. Osindero. A fast learning algorithm for deep belief nets. Neural Computation, 2006.
- Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines. UTML TR 2010–003, University of Toronto.
- Maciej Zieba, Jakub M. Tomczak, Adam Gonczarek: RBM-SMOTE: Restricted Boltzmann Machines for Synthetic Minority Oversampling Technique. ACIIDS (1) 2015: 377–386.
- Jakub M. Tomczak, Maciej Zieba: Classification Restricted Boltzmann Machine for comprehensible credit scoring model. Expert Syst. Appl. 42(4): 1789–1796 (2015).
- Hinton, G.E.: Learning multiple layers of representation. Trends in Cognitive Sciences 11(10), 428–434 (2007).
- Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006).
- Hinton, G.E. (2002). Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 14(8): 1771–1800.
- Fischer, Asja, and Christian Igel. An introduction to restricted Boltzmann machines. Iberoamerican Congress on Pattern Recognition. Springer, Berlin, Heidelberg, 2012.