What is text embedding?

Did you know the Cambridge Dictionary works a bit like a language model? Each time you look up a word, it is as if you are obtaining a vector that represents the essence of that word. Let us dive into the fascinating world of semantic search.

Computers and programming languages are deterministic, which is why we store known values in databases: the IF statement relies on known values to determine which logic to execute next. In semantic search and natural language processing, however, this deterministic nature is absent.
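To make the contrast concrete, here is a minimal sketch (the question strings are invented for illustration): a deterministic IF statement only succeeds on an exact, known value, so a paraphrase with the same meaning falls straight through.

```python
# Deterministic lookup: the IF statement matches only the exact string.
def answer(question: str) -> str:
    if question == "What is the capital of France?":
        return "Paris"
    return "I don't know"

print(answer("What is the capital of France?"))  # exact match succeeds
print(answer("France's capital city is?"))       # same meaning, but no match
```

Semantic search sidesteps this limitation by comparing meanings (as vectors) instead of comparing raw strings.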

Words do not exist in isolation. The meaning of "I buy an Apple" depends on the context established by previous statements. To comprehend the meaning of words, it is crucial to understand their relationships with one another. To capture these relationships, we use word embeddings, which are multi-dimensional vector representations of words and their associations with other words.

Text embeddings (or text vectors) are generated by trained language models, which are neural networks developed from extensive text corpora. To obtain a text vector, a sentence or text is input into a neural network model, which then outputs the text vector. This vector captures the relationships of words in the corpus. Different models produce distinct text vectors because the training corpora and mechanisms vary.
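A common way to compare two text vectors is cosine similarity: vectors pointing in nearly the same direction score close to 1, unrelated ones score much lower. The sketch below uses tiny hand-made 4-dimensional vectors purely for illustration; a real model would output hundreds or thousands of dimensions.

```python
import math

# Toy "embeddings", invented for illustration only.
vectors = {
    "cat":    [0.90, 0.80, 0.10, 0.00],
    "kitten": [0.85, 0.75, 0.15, 0.05],
    "car":    [0.10, 0.00, 0.90, 0.80],
}

def cosine(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(vectors["cat"], vectors["kitten"]))  # close to 1.0: similar meaning
print(cosine(vectors["cat"], vectors["car"]))     # much lower: unrelated
```

The same calculation works unchanged on real model output; only the vectors themselves come from the trained network.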

When using text embeddings, we aim to understand the meaning of words, which is expressed through the use of other English words. This process is similar to looking up a word in a dictionary; when you search for "cat" in the Cambridge Dictionary, it uses other English words to define "cat."

Cat (noun)
a small animal with fur, four legs, a tail, and claws,
usually kept as a pet or for catching mice
(from Cambridge online)

If you consult the Collins Dictionary, the definition is similar, but it uses a different sequence of English words. In this example, the models are Cambridge and Collins, and the vocabulary definitions are the text vectors, demonstrating the relationship of "cat" with other English words.

Kids learn about the world through the language (i.e., word relationships) their parents use, and computer scientists train AI models to understand our world by letting neural networks discover those relationships across billions of words. That may, in some way, explain the reasoning power of AI models. My personal view is that the secret of AI will likely lie in combining linguistics with mathematics: when we can use mathematical symbols to describe all linguistic theories, AI models will be able to comprehend human-centric reality.

OpenAI offers several models and charges a fee for generating text embeddings. There are also open-source models available, with some providing more accurate representations of word meanings and relationships than OpenAI's Ada model.

https://iamnotarobot.substack.com/p/should-you-use-openais-embeddings
