What is deep learning?
Deep learning is a subset of machine learning that uses multilayered neural networks, called deep neural networks, to simulate the complex decision-making power of the human brain. Some form of deep learning powers most of the artificial intelligence (AI) applications in our lives today.
The chief difference between deep learning and machine learning is the structure of the underlying neural network architecture. "Nondeep," traditional machine learning models use simple neural networks with one or two computational layers. Deep learning models use three or more layers, and often hundreds or thousands, to train the model.
While supervised learning models require structured, labeled input data to produce accurate outputs, deep learning models can use unsupervised learning. With unsupervised learning, deep learning models can extract the characteristics, features and relationships they need to produce accurate outputs from raw, unstructured data. These models can even evaluate and refine their outputs for increased precision.
Deep learning is an aspect of data science that drives many applications and services that improve automation, performing analytical and physical tasks without human intervention. This enables many everyday products and services—such as digital assistants, voice-enabled TV remotes, credit card fraud detection, self-driving cars and generative AI.
How deep learning works
Neural networks, or artificial neural networks, attempt to mimic the human brain through a combination of data inputs, weights and bias—all acting as silicon neurons. These elements work together to accurately recognize, classify and describe objects within the data.
Deep neural networks consist of multiple layers of interconnected nodes, each building on the previous layer to refine and optimize the prediction or categorization. This progression of computations through the network is called forward propagation. The input and output layers of a deep neural network are called visible layers. The input layer is where the deep learning model ingests the data for processing, and the output layer is where the final prediction or classification is made.
Another process called backpropagation uses algorithms, such as gradient descent, to calculate errors in predictions, and then adjusts the weights and biases of the function by moving backwards through the layers to train the model. Together, forward propagation and backpropagation enable a neural network to make predictions and correct for errors. Over time, the algorithm becomes gradually more accurate.

Deep learning requires a tremendous amount of computing power. High-performance graphics processing units (GPUs) are ideal because they can handle a large volume of calculations across multiple cores with copious memory available. Distributed cloud computing can also help. This level of computing power is necessary to train deep neural networks. However, managing multiple GPUs on premises can create a large demand on internal resources and be incredibly costly to scale. On the software side, most deep learning applications are coded with one of three frameworks: JAX, PyTorch or TensorFlow.
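Here is a minimal sketch of forward propagation and backpropagation, using PyTorch (one of the frameworks named above). The layer sizes, learning rate and synthetic data are illustrative assumptions, not a recipe for any particular application:

```python
# A minimal sketch of the forward propagation / backpropagation training loop.
import torch
import torch.nn as nn

# A small deep network: input layer -> two hidden layers -> output layer.
model = nn.Sequential(
    nn.Linear(10, 64),  # input layer ingests 10 features
    nn.ReLU(),
    nn.Linear(64, 64),  # hidden layer refines the representation
    nn.ReLU(),
    nn.Linear(64, 1),   # output layer produces the final prediction
)

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # gradient descent

# Synthetic batch of inputs and targets, purely for illustration.
x = torch.randn(32, 10)
y = torch.randn(32, 1)

for step in range(100):
    prediction = model(x)          # forward propagation through the layers
    loss = loss_fn(prediction, y)  # measure the error of the prediction

    optimizer.zero_grad()
    loss.backward()                # backpropagation: gradients flow backwards through the layers
    optimizer.step()               # adjust weights and biases to reduce the error
```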
Types of deep learning models
Deep learning algorithms are incredibly complex, and there are different types of neural networks to address specific problems or datasets. Here are six. Each has its own advantages, and they are presented here roughly in the order of their development, with each successive model adjusting to overcome a weakness in a previous one.
One potential weakness across them all is that deep learning models are often “black boxes,” making it difficult to understand their inner workings and posing interpretability challenges. But this can be balanced against the overall benefits of high accuracy and scalability.
CNNs
Convolutional neural networks (CNNs or ConvNets) are used primarily in computer vision and image classification applications. They can detect features and patterns within images and videos, enabling tasks such as object detection, image recognition, pattern recognition and face recognition. These networks harness principles from linear algebra, particularly matrix multiplication, to identify patterns within an image.
CNNs are a specific type of neural network composed of node layers: an input layer, one or more hidden layers and an output layer. Each node connects to others and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.
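As a rough illustration of a single node, the sketch below computes a weighted sum of inputs plus a bias and passes the result along only if it clears a threshold. The numbers are arbitrary assumptions, not values from a trained network:

```python
# A minimal sketch of one node: weighted sum of inputs plus bias, gated by a threshold.
import torch

inputs = torch.tensor([0.5, -1.2, 3.0])   # data arriving from the previous layer
weights = torch.tensor([0.8, 0.1, 0.4])   # one weight per connection
bias = 0.2
threshold = 0.0

weighted_sum = torch.dot(inputs, weights) + bias
if weighted_sum > threshold:
    output = weighted_sum        # node activates and passes data to the next layer
else:
    output = torch.tensor(0.0)   # node stays inactive; nothing is passed along
print(output)
```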
At least three main types of layers make up a CNN: a convolutional layer, pooling layer and fully connected (FC) layer. For complex uses, a CNN might contain up to thousands of layers, each layer building on the previous layers. By “convolution”—working and reworking the original input—detailed patterns can be discovered. With each layer, the CNN increases in its complexity, identifying greater portions of the image. Earlier layers focus on simple features, such as colors and edges. As the image data progresses through the layers of the CNN, it starts to recognize larger elements or shapes of the object until it finally identifies the intended object.
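The sketch below stacks the three main layer types named above in PyTorch. The channel counts, kernel sizes and the 28x28 single-channel input are assumptions chosen for illustration:

```python
# A minimal sketch of a CNN with convolutional, pooling and fully connected layers.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # convolutional layer: detects simple features such as edges
    nn.ReLU(),
    nn.MaxPool2d(2),                              # pooling layer: downsamples, keeping the strongest responses
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper convolution: combines features into larger shapes
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                    # fully connected layer: maps features to 10 class scores
)

image_batch = torch.randn(8, 1, 28, 28)  # 8 synthetic grayscale "images"
class_scores = cnn(image_batch)          # shape: (8, 10), one score per class per image
```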
CNNs are distinguished from other neural networks by their superior performance with image, speech or audio signal inputs. Before CNNs, manual and time-consuming feature extraction methods were used to identify objects in images. CNNs now provide a more scalable approach to image classification and object recognition tasks and can process high-dimensional data. CNNs can also exchange data between layers, delivering more efficient processing. While information might be lost in the pooling layer, this can be outweighed by its benefits: pooling helps to reduce complexity, improve efficiency and limit the risk of overfitting.
CNNs have other disadvantages. They are computationally demanding, costing time and budget and requiring many GPUs. They also require highly trained experts with cross-domain knowledge, as well as careful testing of configurations and hyperparameters.
RNNs
Recurrent neural networks (RNNs) are typically used in natural language and speech recognition applications, as they work with sequential or time-series data. RNNs can be identified by their feedback loops. These learning algorithms are primarily used with time-series data to make predictions about future outcomes. Use cases include stock market predictions or sales forecasting, or ordinal or temporal problems such as language translation, natural language processing (NLP), speech recognition and image captioning. These functions are often incorporated into popular applications such as Siri, voice search and Google Translate.
RNNs use their “memory” as they take information from prior inputs to influence the current input and output. While traditional deep neural networks assume that inputs and outputs are independent of each other, the output of RNNs depends on the prior elements within the sequence. While future events would also be helpful in determining the output of a given sequence, unidirectional recurrent neural networks cannot account for these events in their predictions.
RNNs share parameters across each layer of the network, using the same weight parameters within each layer, with the weights adjusted through backpropagation and gradient descent as the model trains.
RNNs use a backpropagation through time (BPTT) algorithm to determine the gradients, which is slightly different from traditional backpropagation as it is specific to sequence data. The principles of BPTT are the same as traditional backpropagation, where the model trains itself by calculating errors from its output layer to its input layer. BPTT differs from the traditional approach in that BPTT sums errors at each time step, whereas feedforward networks do not need to sum errors as they do not share parameters across each layer.
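A minimal sketch of training on sequence data with PyTorch follows; calling backward() on a loss computed over the whole sequence performs backpropagation through time automatically. The sequence length, layer sizes and synthetic data are assumptions for illustration:

```python
# A minimal sketch of an RNN trained on sequence data (BPTT happens inside backward()).
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=32, batch_first=True)  # the same weights are reused at every time step
readout = nn.Linear(32, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=0.01)

sequences = torch.randn(16, 20, 4)   # 16 sequences, 20 time steps, 4 features each
targets = torch.randn(16, 20, 1)     # a target at every time step

outputs, _ = rnn(sequences)          # the hidden state carries "memory" of prior inputs
predictions = readout(outputs)
loss = loss_fn(predictions, targets) # errors accumulated over all time steps

optimizer.zero_grad()
loss.backward()                      # backpropagation through time
optimizer.step()
```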
An advantage over other neural network types is that RNNs use both binary data processing and memory. RNNs can map multiple inputs to multiple outputs, so that rather than delivering only one result for a single input, they can produce one-to-many, many-to-one or many-to-many outputs. There are also variations within RNNs. For example, the long short-term memory (LSTM) network improves on simple RNNs by learning and acting on longer-term dependencies.
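As a rough example of a many-to-one setup, the sketch below feeds whole sequences through an LSTM and keeps only the final time step to produce a single prediction per sequence. The sizes and the three hypothetical classes are assumptions:

```python
# A minimal sketch of a many-to-one LSTM: one prediction per input sequence.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=64, batch_first=True)
classifier = nn.Linear(64, 3)          # three hypothetical classes, purely illustrative

sequences = torch.randn(4, 50, 8)      # 4 sequences of 50 time steps, 8 features each
outputs, (hidden, cell) = lstm(sequences)
last_step = outputs[:, -1, :]          # keep only the final time step (many-to-one)
logits = classifier(last_step)         # shape: (4, 3), one prediction per sequence
```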
However, RNNs tend to run into two basic problems, known as exploding gradients and vanishing gradients. These issues are defined by the size of the gradient, which is the slope of the loss function along the error curve.
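One way to observe these effects, sketched below under the same illustrative assumptions as the earlier RNN example, is to inspect the gradient norm of the recurrent weights after a backward pass over a long sequence:

```python
# A minimal sketch for observing gradient size in a recurrent network.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=32, batch_first=True)
sequences = torch.randn(2, 200, 4)          # a long sequence makes the effect easier to see
outputs, _ = rnn(sequences)
loss = outputs[:, -1, :].sum()              # a loss that depends only on the last time step
loss.backward()

# A very small norm suggests vanishing gradients; a very large norm suggests exploding gradients.
grad_norm = rnn.weight_hh_l0.grad.norm().item()
print(f"recurrent weight gradient norm: {grad_norm:.2e}")
```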
Some final disadvantages: RNNs can also require long training times and be difficult to use on large datasets. Optimizing RNNs adds complexity when they have many layers and parameters.