AI Atlas #25: Long Short-Term Memory Networks
What are Long Short-Term Memory (LSTM) networks?
Long Short-Term Memory (LSTM) networks are specialized types of recurrent neural networks (RNN) designed to overcome certain limitations commonly found in traditional RNNs. The introduction of LSTMs has proven invaluable for machine learning and the architecture now forms the backbone of many transformative technologies, from video recognition and industrial monitoring to digital assistants such as Apple’s Siri and Amazon Alexa.
As I discussed in an earlier AI Atlas, RNN architectures excel at handling sequences and are particularly valuable in tasks involving time series data, such as speech recognition, handwriting analysis, and machine translation. They do this by preserving information as a form of “memory” within their internal structure. It is as if the model leaves breadcrumbs along a path to assist in navigation, indicating where it has gone before and the directions it took to get there. However, RNNs’ memories are short and these breadcrumbs are swept away, making it difficult to stay on course the longer the path goes on. This phenomenon is known as the “vanishing gradient problem,” and it is a major obstacle to applying machine learning to long sequences.
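To make the “memory” idea concrete, here is a minimal sketch of the basic RNN recurrence in Python with NumPy. It is an illustration I have added, not code from the article; the layer sizes and the rnn_forward helper are hypothetical. A hidden state is updated at every step of the sequence and carried forward, acting as the model’s breadcrumb trail.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 8, 16                          # illustrative sizes

W_xh = rng.normal(0, 0.1, (hidden_size, input_size))     # input -> hidden weights
W_hh = rng.normal(0, 0.1, (hidden_size, hidden_size))    # hidden -> hidden weights
b_h = np.zeros(hidden_size)

def rnn_forward(inputs):
    """Vanilla RNN recurrence: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b)."""
    h = np.zeros(hidden_size)            # the "memory" starts empty
    for x_t in inputs:                   # one update per element of the sequence
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h                             # final hidden state summarizes everything seen

sequence = rng.normal(size=(20, input_size))  # a toy sequence of 20 steps
print(rnn_forward(sequence).shape)            # (16,)
```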
This is where LSTMs step in to mitigate the issue. By incorporating memory cells and strategically positioned gates that sift out irrelevant inputs, LSTMs mimic more closely the recall ability of human brains. Returning to the breadcrumb example, this improved memory allows the model to identify important path markers and prevent them from fading. It is thus able to follow much longer trails. The name “Long Short-Term Memory” refers to this transformative enhancement of prioritizing key contextual information and retaining it for an extended period of time.
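For readers who want to see the gating machinery itself, the sketch below shows a single LSTM step in NumPy, following the standard textbook formulation rather than any code from this article; all sizes and variable names are my own illustrative choices. The forget gate decides which memories fade, the input gate decides what new information is stored in the memory cell, and the output gate decides what is exposed to the next step.

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size = 8, 16

def init(shape):
    return rng.normal(0, 0.1, shape)

# One weight matrix per gate, each acting on [h_{t-1}, x_t] concatenated.
W_f, W_i, W_o, W_c = (init((hidden_size, hidden_size + input_size)) for _ in range(4))
b_f, b_i, b_o, b_c = (np.zeros(hidden_size) for _ in range(4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM step: gates decide what to forget, what to store, and what to output."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z + b_f)            # forget gate: which memories fade
    i = sigmoid(W_i @ z + b_i)            # input gate: which new information is stored
    o = sigmoid(W_o @ z + b_o)            # output gate: what is passed onward
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate memory content
    c = f * c_prev + i * c_tilde          # memory cell: old content kept or overwritten
    h = o * np.tanh(c)                    # hidden state handed to the next step
    return h, c

h = c = np.zeros(hidden_size)
for x_t in rng.normal(size=(20, input_size)):   # toy sequence of 20 steps
    h, c = lstm_step(x_t, h, c)
print(h.shape, c.shape)                         # (16,) (16,)
```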
Why LSTM networks matter and their shortcomings
LSTMs have shown success in diverse applications and have outperformed conventional RNNs in situations where complexity is high, such as processing long paragraphs or summarizing business data. They excel at breaking problems into smaller components and handling those components individually. Furthermore, LSTMs overcome two major hurdles faced by traditional RNNs: vanishing gradients, where the model loses the breadcrumbs used to mark its trail; and exploding gradients, where the error signal grows uncontrollably during training, as if the model scattered so many breadcrumbs that it can no longer tell which trail to follow.
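The two gradient problems can be illustrated with a few lines of arithmetic: backpropagating through many time steps multiplies the error signal by roughly the same factor over and over, so it either shrinks toward zero or blows up. The per-step factors below are invented purely for illustration.

```python
# Toy demonstration of vanishing vs. exploding gradients over a long sequence.
steps = 100
for factor in (0.9, 1.1):                  # hypothetical per-step gradient factors
    gradient = 1.0
    for _ in range(steps):
        gradient *= factor                 # repeated multiplication across time steps
    label = "vanishing" if factor < 1 else "exploding"
    print(f"factor {factor}: gradient after {steps} steps ~ {gradient:.2e} ({label})")
# factor 0.9 ends up around 2.7e-05 (vanishing); factor 1.1 around 1.4e+04 (exploding)
```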
However, while LSTMs enhance the benefits of RNNs and address major obstacles in remembering long-term information, they still suffer from many of the same shortcomings as their less specialized counterparts. Such limitations include slow, strictly sequential training that is difficult to parallelize, high computational and memory costs, and a memory that, while much longer than a standard RNN’s, still degrades over very long sequences.
Applications of Long Short-Term Memory networks
Just like traditional RNNs, LSTMs excel at processing sequential data, such as stock market behavior or language models. The longer memory of LSTMs is also particularly useful in areas such as speech recognition, machine translation, video analysis, and industrial monitoring, where the relevant context may sit far back in the sequence; a brief sketch of one such sequence-prediction setup follows below.
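As one concrete illustration of the sequence-prediction use case, here is a minimal sketch assuming PyTorch is available: an nn.LSTM layer consumes a batch of time series and a small linear head turns the final hidden state into a single prediction. The SequencePredictor class and all sizes are hypothetical choices made for this example, not from the article.

```python
import torch
import torch.nn as nn

class SequencePredictor(nn.Module):
    """Toy LSTM-based model that predicts one value from a sequence."""
    def __init__(self, input_size=1, hidden_size=32, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)   # map final hidden state to a prediction

    def forward(self, x):                        # x: (batch, time, features)
        output, (h_n, c_n) = self.lstm(x)        # h_n: (layers, batch, hidden)
        return self.head(h_n[-1])                # predict from the last layer's final state

model = SequencePredictor()
batch = torch.randn(4, 50, 1)                    # 4 toy sequences, 50 time steps each
print(model(batch).shape)                        # torch.Size([4, 1])
```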
In essence, Long Short-Term Memory networks represent a sophisticated advance in neural network architecture. By addressing the challenge of preserving context over time, they have found applications in a wide array of fields involving time series analysis and sequence prediction.