The Next AI Revolution?

Vector Spaces: Powering Next-Gen Multilingual AI

One of the most promising avenues for advancement in multilingual artificial intelligence (AI) is the use of vector spaces and vector search, which can exploit the similarities between words across different languages.

In this edition, we will explore what vector spaces are, how they work, and why they matter for automated translation and large language model (LLM) problems. We will also highlight some promising applications of vector spaces in the localization field. So, let's dive in!

The Concept

Vector spaces originated as a mathematical concept, were adopted across many areas of computer science, and are now applied in AI as well. Vectors can represent many different things, such as documents and even images.

Vector spaces can also be used to represent words and phrases in a way that captures their meaning. This can be helpful for multilingual AI systems, as it allows them to compare words and phrases from different languages even if they do not have a direct translation.

Although the English word "dog" and the French word "chien" are both nouns that refer to the same animal, they fall under different grammatical rules in each language. For example, in French, the word "chien" is masculine, while the word "dog" is gender-neutral in English. This means that if you are talking about a female dog in French, you would need to use the word "chienne".

Another difference between the two words is that they have different connotations. In English, the word "dog" can sometimes be used in a negative way, to refer to someone who is unruly or aggressive. The word "chien" does not have the same negative connotations in French.

The English word "dog" and its French translation "chien", while similar, hold considerable differences. However, they can be represented in the same vector space, which allows a multilingual AI system to understand that they are related. This is helpful for tasks such as machine translation, text summarization, and question answering.

Multilingual AI systems can use vector spaces to represent these words in a way that captures their meaning, even though they are not always direct translations of each other.
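To make this concrete, here is a minimal sketch of how cross-lingual closeness can be measured. The four-dimensional vectors below are made up purely for illustration (real embedding models use hundreds of dimensions, and the values for "dog", "chien", and "car" are hypothetical); the point is only that cosine similarity scores near-synonyms across languages much higher than unrelated words.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings; real models use hundreds of dimensions.
embeddings = {
    "dog":   [0.81, 0.12, 0.40, 0.05],   # English
    "chien": [0.78, 0.15, 0.43, 0.07],   # French: close, but not identical
    "car":   [0.10, 0.92, 0.05, 0.30],   # an unrelated concept
}

print(cosine_similarity(embeddings["dog"], embeddings["chien"]))  # close to 1.0
print(cosine_similarity(embeddings["dog"], embeddings["car"]))    # much lower
```

The small gap between "dog" and "chien" is the geometric counterpart of the grammatical and connotational differences described above: related, but not the same point in space.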

Operations: What you can do with it

Imagine you have a collection of different types of toys, like toy cars, toy planes, and toy animals. Each toy has its own special features and abilities. We can think of each toy as a vector, and the collection of toys represents our vector space.

In this vector space, we can perform two main operations: addition and scalar multiplication. Let's explore these operations using our toy collection:

  • Addition of Toys: When we add two toys together, we combine their features to create a new toy with a blend of the original characteristics. For example, if we combine a toy car and a toy plane, we can create a new toy called a "flying car." This new toy possesses both the ability to fly like a plane and drive like a car. We can think of this addition operation as merging the qualities of different toys.
  • Scalar Multiplication: Scalar multiplication involves scaling a toy by a number. Suppose we have a toy animal, let's say a tiger. If we multiply the tiger's characteristics by a factor of 2, we get a new toy called a "giant tiger." This giant tiger has double the size, strength, and all the features of the original tiger. Scalar multiplication allows us to change the scale or magnitude of a toy's attributes.

Putting it all together: Why is this so important?

Currently, there are limits on the amount of text that can be provided to a large language model. Furthermore, adding more text to a prompt slows the model's response: processing the additional information and generating an answer takes more time. There is also the challenge of keeping the model's knowledge current. If a database is updated regularly, the language model cannot keep pace with those updates, because training it on new information requires extensive GPU hours, which are both time-consuming and expensive.

Vector databases can provide a managed way of controlling the information presented to a large language model. Unlike the internal memory of the language model, a vector database allows for the addition, deletion, and updating of records, like a normal database. This enables us to keep the information up to date, overcoming the frozen state of the world that the language model was trained on.
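The add/delete/update behavior described above can be sketched in a few lines. This is a deliberately tiny in-memory stand-in for a real vector database (the class name, record IDs, and two-dimensional vectors are all invented for illustration; production systems like dedicated vector databases use approximate nearest-neighbor indexes instead of a full scan):

```python
import math

class TinyVectorStore:
    """Minimal in-memory vector store: records can be added, updated,
    and deleted like rows in a normal database, then queried by similarity."""

    def __init__(self):
        self.records = {}  # id -> (vector, text)

    def upsert(self, rec_id, vector, text):
        # Inserting with an existing id overwrites it -- this is the
        # "keeping information up to date" property the article describes.
        self.records[rec_id] = (vector, text)

    def delete(self, rec_id):
        self.records.pop(rec_id, None)

    def search(self, query, k=1):
        """Return the k stored texts whose vectors are closest to the query."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a))
                          * math.sqrt(sum(x * x for x in b)))
        ranked = sorted(self.records.values(),
                        key=lambda rec: cosine(query, rec[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]

store = TinyVectorStore()
store.upsert("doc1", [0.9, 0.1], "Refund policy: 30 days")
store.upsert("doc2", [0.1, 0.9], "Shipping times: 3-5 business days")
store.upsert("doc1", [0.9, 0.2], "Refund policy: 60 days")  # update in place

print(store.search([1.0, 0.2]))  # -> ['Refund policy: 60 days']
```

The retrieved text can then be placed into an LLM prompt, which is how vector databases work around both the context-size limit and the frozen training data.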

The most prominent AI and machine learning-powered products globally are already powered through the application of Vector Space technology. For instance, search engines like Google utilize Vector Search. Similarly, platforms like Amazon rely on it for product recommendations and searches. Even social media giants like Facebook implement personalized feed ranking using the same methodology. TikTok recently announced that they also employ a similar method for their recommendations. It is a method that has proven to be highly effective and is widely used in the most successful recommendation engines, search engines, feed ranking solutions, and similar systems.

Vector spaces have become such an essential concept in AI that they are likely to play a crucial role in solving language and data-analysis problems in future automated translation applications, improving multilingual AI performance. As AI systems grow more complex, they will need to handle more data and more tasks; vector spaces help by providing a better way to represent and compare data across languages.

Avant Garde: Vector Technology in CAT Tools

Vector search is proving to be a versatile tool that finds applications in various fields, including the newest computer-assisted translation (CAT) tools. In CAT tools, vector spaces are employed to represent the meaning of words and phrases, as well as the connections between them. This information can enhance translation quality and facilitate other helpful features, such as suggesting alternative translations and creating glossaries.

One notable example of a CAT tool application utilizing vector spaces is NFA (Neural Fuzzy Adaptation), originally developed by Jean Senellart at SYSTRAN. NFA is a system that is part of the neural translation engine, specifically the inference model, and it employs a neural network to learn the relationships between words and phrases in different languages. The neural network is trained on a large collection of parallel texts, which can be texts translated into multiple languages (such as one source with multiple targets, or the same language for different regions).

Here is how it operates:

  • The neural network is trained using a large corpus of parallel text, which comprises texts translated into multiple languages.
  • When translating a new text, the neural network first generates a vector representation of the source text, capturing its meaning as a mathematical object.
  • This vector representation is then utilized by the neural network to search for the most likely translation in the target language.
  • The neural network produces a probability distribution of possible translations, enabling the ranking and selection of the most probable translation.

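The ranking step in the list above can be sketched as follows. This is not SYSTRAN's actual NFA implementation; it is a toy illustration under stated assumptions: the source-sentence vector and candidate-translation vectors are invented, and a softmax over dot-product scores stands in for the real model's decoder probabilities.

```python
import math

def softmax(scores):
    """Turn raw similarity scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical vector for an English source sentence ("The dog sleeps.")
source_vec = [0.7, 0.2, 0.5]

# Hypothetical vectors for candidate French translations.
candidates = {
    "Le chien dort.": [0.68, 0.22, 0.51],
    "Le chat dort.":  [0.40, 0.55, 0.30],
    "Il pleut.":      [0.05, 0.10, 0.90],
}

names = list(candidates)
probs = softmax([dot(source_vec, v) for v in candidates.values()])
best = max(zip(names, probs), key=lambda pair: pair[1])
print(best[0])  # the highest-probability candidate translation
```

The candidate whose vector lies closest to the source vector receives the highest probability, which is the "rank and select the most probable translation" step described above.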

NFA has demonstrated a promising ability to produce higher-quality translations. According to SYSTRAN, NFA was compared to other neural machine translation systems and delivered translations that were equal to or better than those produced by the other systems.

Vector spaces are being used to train the newest AI-powered machine translation models, and now you know why! Some of the most advanced CAT tools in the industry, such as XTM Cloud, Across, and memoQ, have already implemented NFA technology, connecting to SYSTRAN servers for the pre-translation phase.



Leonard Arambam

Co-Founder, CEO @Tribeshare


I haven't heard of Neural Fuzzy Adaptation. Intriguing article Stefan Huyghe

Mattia Fioravanti

Business Development Manager


Interesting. I'd like to underline however that chien can have a negative connotation when speaking about a person both in French and Italian.

Loie Favre

Content Marketing Leader


Stefan Huyghe I read your article and found it quite insightful! Spreading knowledge about NFA (Neural Fuzzy Adaptation) will help companies understand how they can utilize AI with (near) peace of mind when it comes to contextuality.

John Hayato Branderhorst

Business Development, Strategy, & Marketing Consultant with 15+ years helping clients go global


As always, some good insights shared. Thanks Stefan Huyghe.

Marina Gracen-Farrell

Global Human Advocate | Reinvention & Age Diversity Champion, Professional Coach in development | ex Pearson Education | LocLunch Ambassador


Interesting findings Stefan Huyghe! The challenges are met in many different ways... here's a quote from Gabriel Fairman about how Bureau Works has a new AI feature "Translation Smells", which "uses a variety of Natural Language Processing techniques related to the tokenization and indexation of words, allied with prompt engineering we can feed LLMs the original content and the translated content as well as multiple contextual sources to sniff out potential problems with the translation..." Potential errors that can be "sniffed out" include Unnatural Speech Patterns, Inaccuracies, Potential awkward word choices, Wrong translations (Incorrect meaning), even Inappropriate or offensive content. All of this brings "less focus on error avoidance" and allows the translator to work with an assistant who is always there. I'm not sure of the comparison to Vector Spaces but the tokenization is a similar concept. Boryana Nenova what's your take on this, and Gabriel's article "Something smells fishy in this translation..." https://mergingminds.substack.com/p/something-smells-fishy-in-this-translation
