Vector Databases Demystified: Part 3 - Build a colour matching app with Pinecone

Introduction

In the first two parts of this series, we introduced the concept of vector databases and guided you through building a simple vector database in Python.

Now, let's get into Pinecone, a managed, cloud-native vector database that can efficiently store and search for similar vectors at scale. We'll walk you through setting up Pinecone, indexing data, and performing similarity searches with a practical example.

Here's what we'll be building - an app where the user provides a colour (for example, as a CSS-style hex code) and the database returns the most similar colours it has stored:

Some examples of the output from this app: in each strip, the top colour is the search query, and the bottom five are the most similar colours found in the database.

Why colours?

We're going to use colours as an example because they can be encoded as a vector - in our case, using something called "Red-Green-Blue" (aka RGB) space. Think of it like this: every colour possible on a computer screen can be made by mixing different amounts of red, green, and blue light. This mix can be represented as a list of three numbers (one for red, one for green, and one for blue). This list is what we call a vector in RGB space. If you're familiar with colour hex codes from HTML or CSS, these are just another way of writing RGB vectors. So, when we talk about finding similar colours, we'll be using our vector database to look for vectors that point in a similar direction (that is, have a small angle between them) in this RGB space.
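To make "similar angle" concrete, here's a minimal sketch in plain Python. It assumes cosine similarity is the measure being used (which is what the "similar angle" wording suggests); the function name and the example colours are mine, not something from the repo.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction).

    Note: a zero vector such as black [0, 0, 0] has no direction,
    so this sketch would raise ZeroDivisionError for it.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

light_coral = [240, 128, 128]
dark_coral = [120, 64, 64]    # same mix of red, green and blue, half as bright
royal_blue = [65, 105, 225]

print(cosine_similarity(light_coral, dark_coral))  # very close to 1.0
print(cosine_similarity(light_coral, royal_blue))  # noticeably lower
```

One thing this shows: an angle-based measure cares about the mix of red, green, and blue rather than the overall brightness, so a darker shade of the same colour counts as a near-perfect match.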

In practice, Pinecone is more useful for much higher-dimensional spaces - if RGB is in three dimensions (red, green, and blue), more common use cases for Pinecone might use 768-dimensional vectors (we'll get into this in future posts...). But to understand what's going on under the hood, let's start with 3...

Setting Up Pinecone

Before diving into the code, let's make sure you've got Pinecone all set up. You can do this by signing up for a free account at Pinecone's website and then installing the Pinecone Python library. This is how Pinecone and your Python code will talk to each other. Think of it like setting up a phone line between your code and Pinecone's servers.

Next, clone the repo from GitHub so you can follow along.

Now, let's dive into the Python code...

First, we're importing the necessary libraries. "pinecone" is the library to interact with Pinecone, "colour_data" is the data we'll be indexing and searching, and "Colour" and "plot_colours" are helpers I hacked together to help us handle and visualise what we're doing with the colours.

import pinecone
from colour_data import colour_data
from colour import Colour, plot_colours

# Replace this with your Pinecone API key
API_KEY = "<YOUR API KEY HERE>"

new_index_name = "colour-index"

print("initialising Pinecone connection...")
pinecone.init(api_key=API_KEY, environment="us-west1-gcp-free")
print("Pinecone initialised")

Here, we're starting the Pinecone connection with our unique API_KEY and specifying the server environment.

Next, we're creating a new 3-dimensional index (or database) called "colour-index". To recap, when we say "3-dimensional", we're referring to the fact that each of our colour vectors has three values (one for red, one for green, one for blue). It's like plotting points on a 3D graph.

print(f"Creating 3-dimensional index called '{new_index_name}'...")
pinecone.create_index(new_index_name, dimension=3)
print("Success")

The next chunk of code simply retrieves and prints all active indexes (or databases) in Pinecone. It's a good way to make sure our "colour-index" was created successfully:

print("Checking Pinecone for active indexes...")
active_indexes = pinecone.list_indexes()
print("Active indexes:")
print(active_indexes)
        


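Next, we connect to the index we just created and upsert our colour vectors into it:

# Connect to the index created above so we can write to it and query it
pinecone_index = pinecone.Index(new_index_name)

print("Upserting vectors...")
upsert_response = pinecone_index.upsert(
    vectors=colour_data,
    namespace="colour-namespace"
)
print("Success")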

"Upserting" is a fancy way of saying "insert this data, but if it already exists, then update it". Here, we're inserting our `colour_data` into the index. We're also specifying a "namespace", which is like a sub-folder within our database to help us better organise our data. In this case, we're calling it "colour-namespace".
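If the insert-or-update idea feels abstract, here's a toy in-memory version. This is only a sketch to illustrate the behaviour, not how Pinecone works internally: the `store` dictionary, the `upsert()` helper, and the `(id, values)` tuple format are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical stand-in for an index: namespace -> {vector id: values}
store = defaultdict(dict)

def upsert(vectors, namespace=""):
    """Insert each (id, values) pair, overwriting any entry with the same id."""
    for vector_id, values in vectors:
        store[namespace][vector_id] = list(values)
    return len(vectors)

# First call inserts two new vectors
upsert([("red", [255, 0, 0]), ("navy", [0, 0, 128])], namespace="colour-namespace")
# "red" already exists in this namespace, so it is updated rather than duplicated
upsert([("red", [250, 5, 5])], namespace="colour-namespace")
# The same id in a different namespace is stored separately
upsert([("red", [1, 0, 0])], namespace="other-namespace")

print(dict(store))
```

The namespace behaves like a sub-folder here too: the two "red" entries never collide because they live under different keys.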

Now, we're defining a function called "find_similar_colours()". This function will take a colour and search for similar colours in our index. The "top_k" parameter determines how many similar colours we want to find. If "top_k=5", we'll look for the 5 most similar colours. The "query()" function does the heavy lifting here, performing the search against the Pinecone Index:

# Function to search for similar colours
def find_similar_colours(query_colour, top_k=5):
    print(f"searching for similar colours to {query_colour.name} ({query_colour.as_hex()})")
    print(f"Vector to search: {query_colour.vector}")
    query = pinecone_index.query(
        queries=[
            query_colour.vector
        ],
        top_k=top_k,
        namespace='colour-namespace',
        include_values=True
    )
    return query.results[0].matches
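To get a feel for what a "top_k" search returns, here's a brute-force sketch that scores every stored colour against the query and keeps the best few. It's a plain linear scan for illustration only, and it assumes cosine similarity as the score; `brute_force_query()` and the `stored` dictionary are made-up names, not part of the Pinecone library.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_query(vector, stored, top_k=5):
    """Return the top_k (id, score) pairs, highest score first."""
    scored = (
        (vector_id, cosine_similarity(vector, values))
        for vector_id, values in stored.items()
    )
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])

stored = {
    "red": [255, 0, 0],
    "salmon": [250, 128, 114],
    "navy": [0, 0, 128],
    "lime": [0, 255, 0],
}

# Query with Light Coral and keep the two closest matches
matches = brute_force_query([240, 128, 128], stored, top_k=2)
for vector_id, score in matches:
    print(vector_id, round(score, 3))
```

Scanning everything like this is fine for a handful of colours, but it gets slow as the collection grows, which is where a dedicated vector database earns its keep.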

Finally, let's test our function. We'll create a list of colours, each in a different format (RGB, normalised, and hexadecimal), and use our "find_similar_colours()" function to find and display similar colours to each one. "plot_colours()" is a function that plots the original colour and its similar colours, so we can visually check the results:

# Use the search function
colours_to_test = [
    Colour('Light Coral', [240, 128, 128]),
    Colour(
        'Bisque',
        [
            1.0,
            0.8941176470588236,
            0.7686274509803922
        ],
        format="normalised"
    ),
    Colour(
        'My new hex colour',
        "ae237f",
        format="hex"
    )
]

for colour_to_test in colours_to_test:
    similar_colours = find_similar_colours(colour_to_test)
    plot_colours(colour_to_test, similar_colours)
The Output!
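The three input formats in the test list (RGB, normalised, and hex) all describe the same kind of vector. Here's a rough sketch of how they relate. To be clear, this is not the "Colour" helper from the repo: `to_rgb255()`, `to_hex()`, and the `fmt` parameter are hypothetical names for illustration.

```python
def to_rgb255(value, fmt="rgb"):
    """Convert a colour in one of three formats to a list of 0-255 integers."""
    if fmt == "rgb":
        # Already 0-255 values, e.g. [240, 128, 128]
        return [int(channel) for channel in value]
    if fmt == "normalised":
        # Values between 0.0 and 1.0, scaled up to 0-255
        return [round(channel * 255) for channel in value]
    if fmt == "hex":
        # Six hex digits, two per channel, with an optional leading '#'
        digits = value.lstrip("#")
        return [int(digits[i:i + 2], 16) for i in (0, 2, 4)]
    raise ValueError(f"Unknown colour format: {fmt}")

def to_hex(rgb):
    """Write a list of 0-255 integers as a six-digit hex code."""
    return "".join(f"{channel:02x}" for channel in rgb)

print(to_rgb255([240, 128, 128]))
print(to_rgb255([1.0, 0.8941176470588236, 0.7686274509803922], fmt="normalised"))
print(to_rgb255("ae237f", fmt="hex"))
print(to_hex([174, 35, 127]))
```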

And that's it! You've just built a Python app that uses Pinecone to efficiently search for similar colours. Next time, I'll show you how to take this concept and apply it to search for similar images, text, or any other kind of data that can be represented as a vector. Happy coding!
