Probabilistic graphical models for Deep Learning Part-1 (Restricted Boltzmann Machines)

RBM: Restricted Boltzmann machines (RBMs) are undirected graphical models that can also be interpreted as two-layered stochastic neural networks. They are very useful for tasks such as (1) unsupervised learning and (2) feature extraction. RBMs have received a lot of attention after being proposed as building blocks of multi-layer learning architectures called deep belief networks (DBNs) [7, 8].

Relation with deterministic feed-forward neural networks: “The idea is that the hidden neurons extract relevant features from the observations. These features can serve as input to another RBM. By stacking RBMs in this way, one can learn features from features in the hope of arriving at a high-level representation” [10]. This is an important property: because of it, single as well as stacked RBMs can be reinterpreted as deterministic feed-forward neural networks.

A summary of some other basic characteristics of RBMs:

1.    Architectural features: An RBM contains one layer of hidden units and one layer of visible units. There are no connections between hidden units or between visible units (this is the restriction applied to the Boltzmann machine). The edges can be viewed as undirected or bi-directed, and the model forms a symmetric bipartite graph.

2.    RBMs as generative models: When used as a generative model, an RBM is used to draw samples from the distribution it has learned.

3.    Training of RBMs: The contrastive divergence (CD) algorithm is used to train an RBM [9]. The algorithm performs Gibbs sampling and is used inside a gradient descent procedure (similar to the way backpropagation is used inside such a procedure when training feed-forward neural nets) to compute the weight updates.
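To make this concrete, here is a minimal NumPy sketch of a single CD-1 parameter update for a binary-binary RBM. The parameter names (W, b, c) and the learning rate are illustrative assumptions, not taken from any particular library; this is a sketch rather than a production implementation.

```python
# Minimal CD-1 sketch for a binary-binary RBM (NumPy only); names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.01):
    """One contrastive-divergence (CD-1) step on a batch of binary visible vectors v0.

    W: (n_visible, n_hidden) weights, b: visible biases, c: hidden biases.
    """
    # Positive phase: hidden probabilities and samples given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # One Gibbs step: reconstruct the visibles, then recompute hidden probabilities.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)

    # Approximate gradient: <v h>_data - <v h>_model, averaged over the batch.
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / batch
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

In practice this update is applied repeatedly over mini-batches, exactly like a weight update inside an ordinary gradient-descent training loop.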

Basic facts behind the usefulness of RBMs in feature extraction, classification, etc.: An RBM comprises two types of variables: (1) a layer of visible variables, which correspond to the components of the input (the ‘visible layer’), and (2) a layer of hidden (or latent) variables, which capture dependencies between the visible neurons (the ‘hidden layer’). After training, the expected states of the hidden variables given an input can be interpreted as the (learned) features extracted from that input pattern. The dimensionality of the learned features equals the number of hidden units. Most RBM applications use these extracted features and the relation between the variables of the two layers. For example, let us consider a few applications given below. [See Refs 1-10]
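As a concrete illustration of this feature-extraction view, the sketch below computes the expected hidden states p(h_j = 1 | v) of an already trained RBM and treats them as features. The parameters W and c are assumed to come from a previously trained model (for example, the CD-1 sketch above); the names are illustrative.

```python
# Sketch: hidden-unit probabilities of a trained RBM used as extracted features.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def extract_features(V, W, c):
    """Expected hidden states p(h_j = 1 | v) for each row of V.

    V: (n_samples, n_visible) inputs; the output has shape (n_samples, n_hidden),
    so the dimensionality of the features equals the number of hidden units.
    """
    return sigmoid(V @ W + c)

# The resulting features can feed a downstream classifier or another stacked RBM.
```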

Classification: RBMs can be useful for classifying images, XML data, text, etc. If labeled training data is given and the RBM is trained on the joint distribution of inputs and labels, then there are two possibilities: (1) we can sample the missing label for a presented data point from the model distribution, or (2) we can assign a new data point to the class with the highest probability under the model [4].
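A hedged sketch of possibility (2): assuming the RBM was trained on the concatenation of an input vector and a one-hot label vector, a new input can be assigned to the class whose clamped label yields the lowest free energy, i.e. the highest unnormalised probability under the model. The function and parameter names below are illustrative, not from a specific implementation.

```python
# Sketch: classifying with an RBM trained on the joint vector [x, one_hot(y)].
import numpy as np

def free_energy(v, W, b, c):
    """F(v) = -b.v - sum_j log(1 + exp(c_j + v.W_j)) for a binary RBM."""
    return -(v @ b) - np.log1p(np.exp(v @ W + c)).sum(axis=-1)

def predict(x, W, b, c, n_classes):
    """Clamp each possible one-hot label next to the input and keep the label
    whose joint configuration has the lowest free energy."""
    energies = []
    for k in range(n_classes):
        y = np.zeros(n_classes)
        y[k] = 1.0
        v = np.concatenate([x, y])
        energies.append(free_energy(v, W, b, c))
    return int(np.argmin(energies))
```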

Imbalanced data problem: In an imbalanced data problem, one class dominates another. To address this, artificial examples are generated for the minority class using the Synthetic Minority Oversampling Technique (SMOTE). Gibbs sampling is then applied to each newly created example, and the result is labeled as the minority class and added to the training data [5, 6].
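A rough sketch of this idea, in the spirit of RBM-SMOTE [5]: a SMOTE-style interpolated minority example is refined with a few Gibbs steps through an RBM trained on the minority class, so that the synthetic point is pulled towards the learned distribution. The parameters, the number of Gibbs steps, and the function names are assumptions for illustration only.

```python
# Sketch: SMOTE-style interpolation followed by Gibbs refinement in a trained RBM.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_refine(v, W, b, c, k_steps=3):
    """Run k Gibbs steps starting from v; return the final visible probabilities."""
    for _ in range(k_steps):
        ph = sigmoid(v @ W + c)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W.T + b)
        v = (rng.random(pv.shape) < pv).astype(float)
    return pv

def synthetic_minority_example(x, x_neighbour, W, b, c):
    """Interpolate between a minority example and one of its neighbours (SMOTE),
    then refine the result with Gibbs sampling; the returned example would be
    labelled as the minority class and added to the training data."""
    alpha = rng.random()
    synthetic = x + alpha * (x_neighbour - x)
    return gibbs_refine(synthetic, W, b, c)
```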

Noisy labels problem: This is one of the most common problems in classification: some of the examples in the training data carry incorrectly assigned labels. To correct those labels, an RBM is trained for each class separately. Each trained model is then used as an oracle to detect incorrectly labelled data, with the reconstruction error used to flag potentially mislabelled examples [5, 6].
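A hedged sketch of that label-cleaning step: one RBM per class acts as an oracle, and a training example whose reconstruction error under its own class's RBM is unusually high is flagged as potentially mislabelled. The per-class parameters and the threshold below are illustrative assumptions.

```python
# Sketch: flagging suspicious labels via per-class RBM reconstruction error.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_error(v, W, b, c):
    """Mean-field reconstruction error ||v - v_reconstructed||^2 for one example."""
    ph = sigmoid(v @ W + c)      # expected hidden states
    pv = sigmoid(ph @ W.T + b)   # expected reconstruction of the visibles
    return float(np.sum((v - pv) ** 2))

def flag_suspicious(v, label, rbms_per_class, threshold=1.0):
    """rbms_per_class[label] holds the (W, b, c) of the RBM trained on that class;
    the threshold is an arbitrary illustrative value."""
    W, b, c = rbms_per_class[label]
    return reconstruction_error(v, W, b, c) > threshold
```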

Unstructured data: Here the data is available only in unprocessed form: images, videos, documents, XML structures, etc. In such cases, an RBM is used as a domain-independent feature extractor that transforms the raw data into hidden-unit activations.

RBMs vs. Autoencoders

Which is better, an RBM or an autoencoder, and why? See Yoshua Bengio’s answer to roughly the same question below.

Does Yoshua Bengio prefer to use Restricted Boltzmann Machines or (denoising) Autoencoders as building blocks for deep networks? And why?

References:

  1. David H. Ackley, Geoffrey E. Hinton, and Terrence J. Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169, 1985.
  2. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
  3. G. Hinton and S. Osindero. A fast learning algorithm for deep belief nets. Neural Computation, 2006.
  4. Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines. UTML TR 2010–003, University of Toronto.
  5. Maciej Zieba, Jakub M. Tomczak, and Adam Gonczarek. RBM-SMOTE: Restricted Boltzmann Machines for Synthetic Minority Oversampling Technique. ACIIDS (1) 2015: 377–386.
  6. Jakub M. Tomczak and Maciej Zieba. Classification Restricted Boltzmann Machine for comprehensible credit scoring model. Expert Systems with Applications, 42(4): 1789–1796 (2015).
  7. Hinton, G. E. Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10): 428–434 (2007).
  8. Hinton, G. E., and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, 313(5786): 504–507 (2006).
  9. Hinton, G. E. (2002). Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 14(8): 1771–1800.
  10. Fischer, Asja, and Christian Igel. An introduction to restricted Boltzmann machines. Iberoamerican Congress on Pattern Recognition. Springer, Berlin, Heidelberg, 2012.