New Architectures are Driving Progress in Natural Language Processing

J. R. Firth, the famous English linguist, is known for a succinct quote describing one of the fundamental properties of language, context-dependence: "You shall know a word by the company it keeps." His point is that the literal meaning of an individual word cannot be interpreted in isolation; it depends heavily on the other words in a given sentence. Context-dependence gives language its complexity, nuance, and richness. It also creates some of the most difficult problems in natural language processing. The following two sentences illustrate the importance of context-dependence in communicating accurately:

a) I swam across the river to get to the other bank.

b) I drove across the street to the bank.

It is obvious that the word "bank" has a different meaning in each example. In sentence a), "bank" refers to the land alongside a river; its meaning is tied to the more distant words "swam" and "river" and depends less on words in closer proximity such as "other". In sentence b), "bank" refers to the financial institution and is associated with the preceding words "drove" and "street".

These examples illustrate two of the most difficult problems in natural language processing: 1) capturing the influence of neighboring words, and 2) long-term memorization of neighboring words in both directions (before and after a particular word). In practice the number of words in a sentence varies widely, but on average a sentence contains roughly 20-25 words. As a result, building a context-aware mechanism for correctly deciphering the meanings and nuances of words in a sentence requires long-term memorization of the neighboring words, which, as mentioned, poses significant technical challenges.

Extensive research has been conducted on context-awareness, including recurrent neural networks (RNNs), a well-known neural network architecture for natural language processing. However, RNNs suffer from two disadvantages: 1) a lack of long-term memorization, which impairs language processing effectiveness, and 2) sequential processing, which precludes parallel computation during model training, as illustrated in the sketch below.
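As a rough illustration (a minimal sketch, not tied to any particular RNN library; all dimensions and weights below are hypothetical), the following NumPy code shows why RNN processing is inherently sequential: each hidden state depends on the previous one, so time steps cannot be computed in parallel, and information from early words has to survive many repeated updates.

```python
# Minimal sketch: a vanilla RNN cell, processed one token at a time.
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """x_seq: (seq_len, input_dim) sequence of word embeddings."""
    h = np.zeros(W_hh.shape[0])          # initial hidden state
    states = []
    for x_t in x_seq:                    # strictly one step at a time
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)                 # context from early words must survive
                                         # many repeated updates, so it fades
    return np.stack(states)

# Toy usage with hypothetical dimensions (~25 words, 16-dim embeddings)
rng = np.random.default_rng(0)
seq = rng.normal(size=(25, 16))
out = rnn_forward(seq,
                  rng.normal(size=(32, 16)),
                  rng.normal(size=(32, 32)) * 0.1,
                  np.zeros(32))
print(out.shape)                         # (25, 32)
```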

In 2017, Vaswani et al. [1] introduced a novel neural network architecture called the "Transformer", shown in Figure 1. This architecture is revolutionizing natural language processing. The three key innovations of the Transformer architecture are:

  1. Positional encoding and attention mechanisms, which weigh the significance of each word based on its surrounding words regardless of distance, enhancing context-awareness and long-term memorization (sketched in the code after this list),
  2. Encoder-decoder mechanism that enables substantial improvement in next-word prediction,
  3. Adoption of a parallel-friendly feedforward neural network architecture that leads to a substantial reduction in training time.
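To make the first innovation concrete, here is a minimal sketch, assuming a NumPy-only toy setup rather than the reference implementation, of the sinusoidal positional encoding and scaled dot-product self-attention described by Vaswani et al. [1]. It omits multi-head projections, masking, and the encoder-decoder stack, and the token count and dimensions are hypothetical.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sine/cosine position encodings as in Vaswani et al. [1]."""
    pos = np.arange(seq_len)[:, None]                       # (seq_len, 1)
    i = np.arange(d_model)[None, :]                         # (1, d_model)
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def scaled_dot_product_attention(Q, K, V):
    """Every query attends to every key; no sequential dependency,
    so all positions are processed in parallel."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over keys
    return weights @ V, weights

# Toy usage: 6 tokens, 8-dimensional model
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8)) + positional_encoding(6, 8)      # add position info
out, attn = scaled_dot_product_attention(x, x, x)            # self-attention
print(out.shape, attn.shape)                                 # (6, 8) (6, 6)
```

Note that the attention weights connect every position to every other position in a single matrix multiplication, which is what enables both long-range context and parallel training.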

The Transformer architecture has seen great success in a variety of machine learning language tasks, including machine translation, question answering, text generation, and chatbots, and has outperformed many previously reported language models on industry benchmarks. As a result, the once-dominant RNN architecture in natural language processing is gradually giving way to the Transformer architecture.


Figure 1: High-level Transformer Neural Network Architecture

The Transformer architecture has proved invaluable not only in natural language processing but also in computer vision. Dosovitskiy et al. [2] developed a Transformer-based vision network called ViT, which splits an image into 16x16 patches and generates a sequence of linear patch embeddings as input to a Transformer. Image patches are treated the same way as word tokens in natural language processing, as shown in Figure 2 and in the sketch below.
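Here is a minimal sketch of that patch-embedding step, assuming a 224x224 RGB image and a toy 512-dimensional projection (both hypothetical; ViT also prepends a class token and adds position embeddings before the Transformer encoder):

```python
import numpy as np

def patchify(image, patch_size=16):
    """image: (H, W, C) -> (num_patches, patch_size * patch_size * C)."""
    H, W, C = image.shape
    patches = []
    for top in range(0, H, patch_size):
        for left in range(0, W, patch_size):
            patch = image[top:top + patch_size, left:left + patch_size, :]
            patches.append(patch.reshape(-1))   # flatten each 16x16 patch
    return np.stack(patches)

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))                 # a 224x224 RGB image
patches = patchify(img)                         # (196, 768): 14x14 patches
E = rng.normal(size=(768, 512)) * 0.02          # toy stand-in for the learned projection
tokens = patches @ E                            # (196, 512) "visual words" fed to the Transformer
print(patches.shape, tokens.shape)
```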

When pre-trained on a large dataset, ViT approaches or beats the performance of state-of-the-art convolutional neural networks (CNNs) on multiple image recognition benchmarks. Just as with RNNs in natural language processing, the CNN architecture in computer vision is likely to be gradually replaced by the Transformer architecture. As unlikely as it may seem, these two fields, natural language processing and computer vision, have a fascinating relationship and certainly seem to underscore the prophetic quote by J. R. Firth.


Figure 2: Image Patch and Position Embedding of Transformer

TCS Digital Software & Solutions (DS&S) is the team behind TCS Customer Intelligence & Insights™ and TCS Intelligent Urban Exchange™, AI-driven CX and sustainability analytics software. Its customers are increasingly finding value in both computer vision and language modeling. Transformers increase this value and open new possibilities for enterprises. The R&D team is pushing cutting-edge developments in Transformer technology to supply vision and language modeling offerings that anticipate, meet, and exceed customer needs. The team is always interested to hear how its verticalized analytics solutions in retail, finance, sustainability, smart cities, and more can better meet your needs. Schedule a meeting with our product teams and help drive the roadmap.

References:

1. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, "Attention is all you need," Proceedings of the 31st International Conference on Neural Information Processing Systems, December 2017, pp. 4768-4777.

2. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," the 9th International Conference on Learning Representations, 2021.

Learn More:

Visit the https://www.tcs.com/what-we-do/products-platforms/tcs-intelligent-urban-exchange page on https://www.tcs.com

Email Us: [email protected]

About the Authors:

Dr. Arup Acharya, PhD

Arup is the Head of Research & Innovation at TCS Digital Software and Solutions and leads both the Architecture & Design and Research teams in DS&S. He received his PhD from Rutgers University and his B.Tech from IIT Kharagpur in Computer Science. He has 40+ patents issued and is well-published in conferences and journals on leading-edge technology topics. Prior to TCS, Arup worked at IBM Research and NEC Research.


Dr. Yibei Ling, PhD

Yibei Ling is a Senior Data Scientist at TCS Digital Software and Solutions and works on energy-aware, no-code AutoML frameworks and machine learning models for sentiment analysis, face recognition, and time-series analysis. Prior to TCS, Yibei was with the research labs at Bellcore (Telcordia), working on DARPA projects including sensor and distributed networks. He has published more than 30 papers in IEEE and ACM Transactions covering fault-tolerant and distributed computing and network security. Yibei has been granted 21 US patents and is a senior IEEE member and a reviewer for Mathematical Reviews and IEEE Transactions publications.


Dr. Guillermo Rangel, PhD

Guillermo is a Senior Data Scientist at TCS Digital Software and Solutions with expertise in text analytics built over a decade of work on natural language modeling (NLP/G/U). He has previously worked in verticals such as banking, retail, telecom, and gaming, serving as a solution advisor and data science consultant for companies including Bloomberg LP, Vodafone, Blizzard Entertainment, and The Home Depot. Guillermo holds a PhD in Physics from the University of California, Davis.
