Vector columns
Supabase offers a number of different ways to store and query vectors within Postgres. If you prefer to use Python to store and query your vectors using collections, see Managing collections. If you want more control over vectors within your own Postgres tables or would like to interact with them using a different language like JavaScript, keep reading.
Vectors in Supabase are enabled via pgvector, a PostgreSQL extension for storing and querying vectors in Postgres. It can be used to store embeddings.
Usage#
Enable the extension#
- Go to the Database page in the Dashboard.
- Click on Extensions in the sidebar.
- Search for "vector" and enable the extension.
Create a table to store vectors#
After enabling the vector
extension, you will get access to a new data type called vector
. The size of the vector (indicated in parenthesis) represents the number of dimensions stored in that vector.
_10create table documents (_10 id serial primary key,_10 title text not null,_10 body text not null,_10 embedding vector(1536)_10);
In the above SQL snippet, we create a documents
table with a column called embedding
(note this is just a regular Postgres column - you can name it whatever you like). We give the embedding
column a vector
data type with 1536 dimensions. Change this to the number of dimensions used in your vector application. For example, if you are generating embeddings using OpenAI's text-embeddings-ada-002
model, you would set this number as 1536 since that model produces 1536 dimensions.
Storing a vector / embedding#
In this example we'll generate a vector using the OpenAI API client, then store it in the database using the Supabase JavaScript client.
_17const title = 'First post!'_17const body = 'Hello world!'_17_17// Generate a vector using OpenAI_17const embeddingResponse = await openai.createEmbedding({_17 model: 'text-embedding-ada-002',_17 input: body,_17})_17_17const [{ embedding }] = embeddingResponse.data.data_17_17// Store the vector in Postgres_17const { data, error } = await supabase.from('documents').insert({_17 title,_17 body,_17 embedding,_17})
This example uses the JavaScript Supabase client, but you can modify it to work with any supported language library.
Querying a vector / embedding#
Similarity search is the most common use case for vectors. pgvector
support 3 new operators for performing similarity search:
Operator | Description |
---|---|
<-> | Euclidean distance |
<#> | negative inner product |
<=> | cosine distance |
Choosing the right operator depends on your needs. If you are searching over OpenAI embeddings, OpenAI recommends using cosine similarity. For more information on how embeddings work and how they relate to each other, see What are Embeddings?.
Supabase client libraries like supabase-js
connect to your Postgres instance via PostgREST. PostgREST does not currently support pgvector
similarity operators, so we'll need to wrap our query in a Postgres function and call it via the rpc()
method:
_21create or replace function match_documents (_21 query_embedding vector(1536),_21 match_threshold float,_21 match_count int_21)_21returns table (_21 id bigint,_21 content text,_21 similarity float_21)_21language sql stable_21as $$_21 select_21 documents.id,_21 documents.content,_21 1 - (documents.embedding <=> query_embedding) as similarity_21 from documents_21 where 1 - (documents.embedding <=> query_embedding) > match_threshold_21 order by similarity desc_21 limit match_count;_21$$;
This function takes a query_embedding
argument and compares it to all other embeddings in the documents
table. Each comparison returns a similarity score. If the similarity is greater than the match_threshold
argument, it is returned. The number of rows returned is limited by the match_count
argument.
Feel free to modify this method to fit the needs of your application. The match_threshold
ensures that only documents that have a minimum similarity to the query_embedding
are returned. Without this, you may end up returning documents that subjectively don't match. This value will vary for each application - you will need to perform your own testing to determine the threshold that makes sense for your app.
To execute the function from your client library, call rpc()
with the name of your Postgres function:
_10const { data: documents } = await supabaseClient.rpc('match_documents', {_10 query_embedding: embedding, // Pass the embedding you want to compare_10 match_threshold: 0.78, // Choose an appropriate threshold for your data_10 match_count: 10, // Choose the number of matches_10})
In this example embedding
would be another embedding you wish to compare against your table of pre-generated embedding documents. For example if you were building a search engine, every time the user submits their query you would first generate an embedding on the search query itself (using openai.createEmbedding()
), then pass it into the above rpc()
function to match.
Vectors and embedding can be used for much more than search. Learn more about embeddings at What are Embeddings?.
Indexes#
Once your vector table starts to grow, you will likely want to add an index to speed up queries. See Managing indexes to learn how vector indexes work and how to create them.