postgres.new: a vector search conversation

Simon Epstein

Principal, Emerging Technologies at 67 Bricks

Published: 13 August 2024

I’ve just had an informative conversation with postgres.new, exploring PostgreSQL’s support for word embeddings and vector search. I’ve been meaning to look at this for a while; we use PostgreSQL extensively at 67 Bricks, and we are increasingly building products that leverage embeddings as a search mechanism. In some scenarios it would be good to be able to use a tool we already understand well to do this, as an alternative to adopting an entirely new technology.

Postgres.new is PostgreSQL in your browser, running through the magic of PGlite and WASM, with visualisations and an LLM-powered chat interface supplied by Supabase. I began the conversation with my tried and tested opening gambit:

I am building a system to ingest and tag documents. I would like to store the following information: title, id, content, tags, embeddings

It helpfully created a documents table with the following schema:

CREATE TABLE documents (
    id bigint primary key generated always as identity,
    title text not null,
    content text not null,
    tags text[],
    embedding vector(384)
);

That's great. Can you create some sample data; please make some of the articles about machine learning, and the others about the tradeoffs in using auto incrementing ids vs uuids for database identifiers.

Title: Introduction to Machine Learning
Content: Machine Learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to learn from data, without being explicitly programmed.
Tags: ['machine learning', 'AI']
...
Title: Auto Incrementing IDs in Databases
Content: Auto incrementing IDs are simple to use and understand, but they can lead to contention in distributed systems.
Tags: ['database', 'auto increment']

Title: UUIDs vs Auto Incrementing IDs
Content: UUIDs (Universally Unique Identifiers) provide a unique identifier across systems, but they are larger and can be less performant.
Tags: ['database', 'UUID']

Neat! It generated some plausible tags for me as well.

Please can you show me the top 2 items which are similar to "artificial intelligence". Include a similarity score in the output.

Title: Introduction to Machine Learning
Content: Machine Learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to learn from data, without being explicitly programmed.
Similarity Score: 0.1036

Title: Supervised Learning Explained
Content: Supervised learning is a type of machine learning where the model is trained on labeled data.
Similarity Score: 0.1498

The generated SQL was as follows:

select id, title, content, tags, embedding <=> (
    select embedding
    from meta.embeddings
    where id = 6
  ) as similarity_score
from
  documents
order by
  similarity_score
limit
  2;

As an aside, Postgres.new creates embeddings for queries and stores them in the meta.embeddings table, because they can be quite large and expensive to send back and forth to the database (we went off on a side quest to find that out). Back to the main story:

What does the <=> operator do?

The <=> operator is used to calculate the cosine distance between two vectors in the pgvector extension for PostgreSQL…

It then goes on to define cosine distance, and explains why a lower score means more similar content, which was going to be my next question.
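
To make the "lower means more similar" point concrete, here is a minimal Python sketch of cosine distance, defined as one minus the cosine similarity of two vectors. This is not pgvector's implementation; the function name, the toy three-dimensional vectors and the labels are all invented for illustration (the real embeddings in this conversation have 384 dimensions).

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy vectors; real embeddings would come from a model.
query = [1.0, 0.0, 0.0]
docs = {
    "same direction": [2.0, 0.0, 0.0],
    "orthogonal": [0.0, 1.0, 0.0],
    "opposite": [-1.0, 0.0, 0.0],
}

# Sorting ascending by distance puts the most similar item first,
# which mirrors the ORDER BY similarity_score in the generated SQL.
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
for name in ranked:
    print(name, round(cosine_distance(query, docs[name]), 4))
```

Vectors pointing the same way have a distance of 0 regardless of their length, unrelated (orthogonal) vectors sit at 1, and opposite vectors at 2, so sorting in ascending order returns the closest matches first.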

Can you create a chart showing the similarity score vs title for the query “artificial intelligence” ?

So I now have an idea of how to create and query a schema using sentence embeddings in PostgreSQL. This was all possible without installing any software, reading any documentation or indeed writing any SQL. Well, I didn’t write any SQL. But before we go:

Please can you change the id column type to uuid.
