The Next AI Revolution?
Stefan Huyghe
LangOps Pioneer | AI Enterprise Strategist | LinkedIn B2B Growth | Globalization Consultant | Localization VP | Content Creator | Social Media Evangelist | Podcast Host | LocDiscussion Brainparent
Vector Spaces: Powering Next-Gen Multilingual AI
One of the most promising avenues for advancement in multilingual artificial intelligence (AI) is the use of vector spaces and vector search, which can exploit similarities between words across different languages.
In this edition, we will explore what vector spaces are and how they work, and discuss their relevance to automated translation and large language model (LLM) problems. We will also highlight some promising applications of vector spaces in the localization field. So, let's dive in!
The Concept
Vector spaces are originally a mathematical concept used in many areas of computer science, and they are now applied in AI as well. Vectors can represent many different things, such as documents and even images.
Vector spaces can also be used to represent words and phrases in a way that captures their meaning. This can be helpful for multilingual AI systems, as it allows them to compare words and phrases from different languages even if they do not have a direct translation.
Although the English word "dog" and the French word "chien" are both nouns that refer to the same animal, they fall under different grammatical rules in each language. For example, in French, the word "chien" is masculine, while the word "dog" is gender-neutral in English. This means that if you are talking about a female dog in French, you would need to use the word "chienne".
Another difference between the two words is that they have different connotations. In English, the word "dog" can sometimes be used in a negative way, to refer to someone who is unruly or aggressive. The word "chien" does not carry exactly the same negative connotations in French.
Although the English word "dog" and its French translation "chien" differ in these ways, they can be represented in the same vector space, which allows a multilingual AI system to understand that they are related. This is helpful for tasks such as machine translation, text summarization, and question answering.
Multilingual AI systems use vector spaces to represent such words in a way that captures their meaning, even when there is no exact one-to-one translation between them.
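To make the idea concrete, here is a minimal Python sketch of comparing words across languages in a shared vector space. The 4-dimensional embeddings and their values are invented purely for illustration; real multilingual models learn vectors with hundreds of dimensions from large amounts of data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1.0
    mean the vectors point in almost the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional embeddings in a shared multilingual space.
# These numbers are invented for illustration only.
embeddings = {
    "dog":    [0.80, 0.10, 0.05, 0.30],  # English
    "chien":  [0.78, 0.12, 0.07, 0.28],  # French, same concept
    "banana": [0.05, 0.90, 0.40, 0.02],  # unrelated concept
}

print(cosine_similarity(embeddings["dog"], embeddings["chien"]))   # high (close to 1)
print(cosine_similarity(embeddings["dog"], embeddings["banana"]))  # much lower
```

Because "dog" and "chien" land close together in the shared space, a system can recognize them as related without any explicit dictionary entry linking them.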
Operations: What you can do with it
Imagine you have a collection of different types of toys, like toy cars, toy planes, and toy animals. Each toy has its own special features and abilities. We can think of each toy as a vector, and the collection of toys represents our vector space.
In this vector space, we can perform two main operations: addition and scalar multiplication. Addition combines two vectors component by component, like merging the features of a toy car and a toy plane into one combined profile. Scalar multiplication stretches or shrinks a vector by a number, like doubling every feature of a toy car.
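Here is a tiny Python sketch of the two operations, using invented three-number "toy" vectors (speed, size, fun factor):

```python
# Two "toy" feature vectors: [speed, size, fun_factor].
# The features and values are invented for illustration.
toy_car   = [3.0, 1.0, 5.0]
toy_plane = [8.0, 2.0, 4.0]

def add(u, v):
    """Vector addition: combine two vectors component by component."""
    return [x + y for x, y in zip(u, v)]

def scale(k, v):
    """Scalar multiplication: stretch or shrink a vector by the number k."""
    return [k * x for x in v]

combined = add(toy_car, toy_plane)  # [11.0, 3.0, 9.0]
doubled  = scale(2, toy_car)        # [6.0, 2.0, 10.0]
print(combined, doubled)
```

These two simple operations are all a vector space requires, and they are what allow word vectors to be combined and compared mathematically.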
Putting it all together: Why is this so important?
Currently, there are limits on how much text can be provided to a large language model in a single prompt. Furthermore, adding more text to a prompt slows the model's response, since processing the additional information and generating an answer takes longer. There is also the challenge of keeping the model's knowledge current: if a database is regularly updated, the language model cannot keep pace with those updates, because retraining it on new information requires extensive GPU hours, which are both time-consuming and expensive.
Vector databases can provide a managed way of controlling the information presented to a large language model. Unlike the internal memory of the language model, a vector database allows for the addition, deletion, and updating of records, like a normal database. This enables us to keep the information up to date, overcoming the frozen state of the world that the language model was trained on.
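As a sketch of the idea, here is a minimal in-memory vector store in Python. The class name, the brute-force search, and the toy records are all invented for illustration; real vector databases (such as FAISS, Pinecone, or Milvus) use approximate indexes to search millions of vectors quickly.

```python
import math

class TinyVectorDB:
    """A minimal in-memory vector store: records can be added, updated,
    and deleted like rows in a normal database, and queried by similarity.
    Purely illustrative; not how production systems are implemented."""

    def __init__(self):
        self.records = {}  # record id -> (vector, text)

    def upsert(self, rec_id, vector, text):
        """Add a new record, or overwrite an existing one with fresh data."""
        self.records[rec_id] = (vector, text)

    def delete(self, rec_id):
        """Remove a record that is no longer relevant."""
        self.records.pop(rec_id, None)

    def search(self, query, top_k=1):
        """Return the top_k records most similar to the query vector."""
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)
        ranked = sorted(self.records.items(),
                        key=lambda kv: cos(query, kv[1][0]),
                        reverse=True)
        return [(rid, text) for rid, (vec, text) in ranked[:top_k]]

db = TinyVectorDB()
db.upsert("doc1", [0.9, 0.1], "Release notes, June edition")
db.upsert("doc2", [0.1, 0.9], "Pricing policy")
db.upsert("doc1", [0.8, 0.2], "Release notes, July edition")  # update in place

print(db.search([1.0, 0.0]))  # the record closest in meaning to the query
```

The `upsert` and `delete` calls show why this overcomes the frozen-knowledge problem: the store's contents can change at any time, and the most relevant, current records are retrieved and handed to the language model at query time.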
Many of the world's most prominent AI and machine-learning products are already powered by vector space technology. Search engines like Google use vector search; Amazon relies on it for product recommendations and search; Facebook uses it for personalized feed ranking; and TikTok recently announced that it employs a similar method for its recommendations. The approach has proven highly effective and is widely used in the most successful recommendation engines, search engines, and feed-ranking systems.
Vector spaces have become such an essential concept in AI that they are likely to play a crucial role in solving language and data-analysis problems in future automated translation applications, improving multilingual AI performance. As AI systems grow more complex, they will need to handle more data and more tasks; vector spaces help by providing a better way to represent and compare data across languages.
Avant Garde: Vector Technology in CAT Tools
Vector search is proving to be a versatile tool that finds applications in various fields, including the newest computer-assisted translation (CAT) tools. In CAT tools, vector spaces are employed to represent the meaning of words and phrases, as well as the connections between them. This information can enhance translation quality and facilitate other helpful features, such as suggesting alternative translations and creating glossaries.
One notable example of a CAT tool application utilizing vector spaces is NFA (Neural Fuzzy Adaptation), originally developed by Jean Senellart at SYSTRAN. NFA is a system that is part of the neural translation engine, specifically the inference model, and it employs a neural network to learn the relationships between words and phrases in different languages. The neural network is trained on a large collection of parallel texts, which can be texts translated into multiple languages (such as one source with multiple targets, or the same language for different regions).
Here is how it operates:
Once the neural network is trained, it can be used to translate new texts. It starts by generating a vector representation of the source text, capturing its meaning as a mathematical object. The neural network then employs this vector representation to search for the most probable translation in the target language.
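The article does not publish NFA's internals, so the following Python sketch only illustrates the general pattern described above: encode the source text as a vector, then pick the most similar candidate in the target language. All embeddings, phrases, and function names here are invented for illustration.

```python
import math

def cos(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Invented shared-space embeddings for a handful of candidate
# target-language phrases.
target_candidates = {
    "le chien dort":    [0.85, 0.10, 0.20],
    "le chat mange":    [0.10, 0.90, 0.15],
    "la voiture roule": [0.05, 0.15, 0.95],
}

def translate(source_vector):
    """Pick the target-language candidate whose vector lies closest
    to the source sentence's vector in the shared space."""
    return max(target_candidates,
               key=lambda t: cos(source_vector, target_candidates[t]))

# "the dog sleeps" encoded (hypothetically) into the shared space:
print(translate([0.88, 0.08, 0.18]))
```

In a real engine the candidate space is not a fixed list but is generated by the neural decoder; this sketch only shows the "embed, then search by similarity" step the paragraph describes.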
NFA has demonstrated a promising ability to produce higher-quality translations. According to SYSTRAN, NFA was compared with other neural machine translation systems and delivered translations that were equal to or better than theirs.
Vector spaces are being used to train the newest AI-powered machine translation models, and now you know why! Some of the most advanced CAT tools in the industry, such as XTM Cloud, Across, and MemoQ, have already implemented NFA technology, connecting to SYSTRAN servers for the pre-translation phase.