API
`vecs` is a Python client for managing and querying vector stores in PostgreSQL with the pgvector extension. This guide will help you get started with using vecs.
If you don't have a Postgres database with the pgvector extension ready, see hosting for easy options.
Installation#
Requires:
- Python 3.7+
You can install vecs using pip:
```sh
pip install vecs
```
Usage#
Connecting#
Before you can interact with vecs, create the client to communicate with Postgres. If you haven't started a Postgres instance yet, see hosting.
```python
import vecs

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
vx = vecs.create_client(DB_CONNECTION)
```
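Passwords that contain characters such as `@` or `/` can break the connection URL unless they are percent-encoded. The sketch below shows one way to assemble the string with the standard library. `build_connection_string` is a hypothetical helper, not part of vecs, and the credentials are placeholders.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, port, db_name):
    """Hypothetical helper (not part of vecs): assemble a Postgres URL."""
    # quote_plus percent-encodes characters such as "@" and "/" that would
    # otherwise be misread as URL delimiters
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db_name}"
    )


# placeholder credentials, for illustration only
DB_CONNECTION = build_connection_string(
    "postgres", "p@ss/word", "localhost", 5432, "postgres"
)
```

The resulting `DB_CONNECTION` can then be passed to `vecs.create_client` as shown above.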
Create collection#
You can create a collection to store vectors by specifying the collection's name and the number of dimensions in the vectors you intend to store.
```python
docs = vx.create_collection(name="docs", dimension=3)
```
Get an existing collection#
To access a previously created collection, use `get_collection` to retrieve it by name.
```python
docs = vx.get_collection(name="docs")
```
Upserting vectors#
`vecs` combines the concepts of "insert" and "update" into "upsert". Upserting records adds them to the collection if the `id` is not present, or updates the existing record if the `id` does exist.
```python
# add records to the collection
docs.upsert(
    vectors=[
        (
            "vec0",           # the vector's identifier
            [0.1, 0.2, 0.3],  # the vector. list or np.array
            {"year": 1973}    # associated metadata
        ),
        (
            "vec1",
            [0.7, 0.8, 0.9],
            {"year": 2012}
        )
    ]
)
```
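To make the upsert semantics concrete, here is a minimal in-memory sketch that runs without a database. The dictionary-backed `store` and the `upsert` function are illustrative stand-ins, not how vecs stores records in Postgres.

```python
def upsert(store, records):
    """Insert each (id, vector, metadata) record, replacing any record with the same id."""
    for record_id, vector, metadata in records:
        # assigning to an existing key overwrites it; a new key is added
        store[record_id] = (list(vector), dict(metadata))


store = {}

# first call: "vec0" is not present, so it is inserted
upsert(store, [("vec0", [0.1, 0.2, 0.3], {"year": 1973})])

# second call: "vec0" already exists and is updated, "vec1" is inserted
upsert(
    store,
    [
        ("vec0", [0.4, 0.5, 0.6], {"year": 1974}),
        ("vec1", [0.7, 0.8, 0.9], {"year": 2012}),
    ],
)
```

After both calls the store holds two records, and `vec0` carries the values from the second call.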
Create an index#
Collections can be queried immediately after being created. However, for good performance, the collection should be indexed after records have been upserted.
Indexes should be created after the collection has been populated with records. Building an index on an empty collection will result in significantly reduced recall. Once the index has been created you can still upsert new documents into the collection but you should rebuild the index if the size of the collection more than doubles.
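The "more than doubles" rule of thumb can be written as a small check. This sketch assumes you track the record count at the time the index was built yourself; `should_rebuild_index` is a hypothetical helper, not part of vecs.

```python
def should_rebuild_index(size_at_index_time, current_size):
    """Return True once the collection has more than doubled since indexing."""
    return current_size > 2 * size_at_index_time


# indexed at 1,000 records: 2,000 is exactly double, 2,001 is more than double
rebuild_at_2000 = should_rebuild_index(1000, 2000)
rebuild_at_2001 = should_rebuild_index(1000, 2001)
```

When the check returns True, calling `docs.create_index()` again replaces the existing index, as described in this section.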
Only one index may exist per collection. By default, creating an index will replace any existing index.
To create an index:
```python
##
# INSERT RECORDS HERE
##

# index the collection to be queried by cosine distance
docs.create_index(measure=vecs.IndexMeasure.cosine_distance)
```
Available options for query `measure` are:
- vecs.IndexMeasure.cosine_distance
- vecs.IndexMeasure.l2_distance
- vecs.IndexMeasure.max_inner_product

which correspond to different methods for comparing query vectors to the vectors in the database.
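For intuition, the three measures can be sketched with their textbook definitions in plain Python. This is only an illustration: the function names are made up here, and the source does not describe how vecs or pgvector compute these values internally.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean (straight-line) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine of the angle between a and b; ignores vector length."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


# b points in the same direction as a but is twice as long
a = [1.0, 2.0]
b = [2.0, 4.0]
```

For these two vectors the cosine distance is approximately zero because they point the same way, while the L2 distance is not, because they differ in length.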
If you aren't sure which to use, stick with the default (cosine_distance) by omitting the parameter, i.e.
```python
docs.create_index()
```
Query#
Given a collection `docs` with several records:
Basic#
The simplest form of search is to provide a query vector.
note
If you do not create an index, every query will return a warning

```
query does not have a covering index for cosine_similarity. See Collection.create_index
```

that includes the `IndexMeasure` you should index.
```python
docs.query(
    query_vector=[0.4,0.5,0.6],  # required
    limit=5,                     # number of records to return
    filters={},                  # metadata filters
    measure="cosine_distance",   # distance measure to use
    include_value=False,         # should distance measure values be returned?
    include_metadata=False,      # should record metadata be returned?
)
```
This returns a list of vector record ids.
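Conceptually, a query ranks records by their distance to the query vector and returns the closest ids. The brute-force sketch below illustrates that idea in plain Python using the two records from the upsert example. `brute_force_query` is a hypothetical stand-in and says nothing about how vecs executes queries in Postgres.

```python
import math


def cosine_distance(a, b):
    """One minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_query(records, query_vector, limit=5):
    """Return up to `limit` record ids, closest to the query vector first."""
    ranked = sorted(records, key=lambda rec: cosine_distance(rec[1], query_vector))
    return [record_id for record_id, _vector, _metadata in ranked[:limit]]


records = [
    ("vec0", [0.1, 0.2, 0.3], {"year": 1973}),
    ("vec1", [0.7, 0.8, 0.9], {"year": 2012}),
]

result_ids = brute_force_query(records, [0.4, 0.5, 0.6], limit=5)
```

Scanning every record like this is what an index is meant to avoid on large collections.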
Metadata Filtering#
The metadata that is associated with each record can also be filtered during a query.
As an example, `{"year": {"$eq": 2005}}` filters a `year` metadata key to be equal to 2005.
In context:
```python
docs.query(
    query_vector=[0.4,0.5,0.6],
    filters={"year": {"$eq": 2012}},  # metadata filters
)
```
For a complete reference, see the metadata guide.
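To show what an `$eq` filter selects, here is a small sketch that evaluates the filter against metadata dictionaries in plain Python. The `matches` function is hypothetical and handles only `$eq`, the one operator shown in this guide; it is not the vecs implementation.

```python
def matches(metadata, filters):
    """Return True if the metadata satisfies every {key: {"$eq": value}} condition."""
    for key, condition in filters.items():
        for operator, expected in condition.items():
            if operator != "$eq":
                # other operators are out of scope for this sketch
                raise NotImplementedError(f"unsupported operator: {operator}")
            if key not in metadata or metadata[key] != expected:
                return False
    return True


records = [
    ("vec0", [0.1, 0.2, 0.3], {"year": 1973}),
    ("vec1", [0.7, 0.8, 0.9], {"year": 2012}),
]

# keep only the ids whose metadata has year == 2012
filtered_ids = [
    record_id
    for record_id, _vector, metadata in records
    if matches(metadata, {"year": {"$eq": 2012}})
]
```

An empty filter (`{}`) places no conditions on the metadata, so every record matches.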
Disconnect#
When you're done with a collection, be sure to disconnect the client from the database.
```python
vx.disconnect()
```
Alternatively, use the client as a context manager and it will automatically close the connection on exit.
```python
import vecs

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
with vecs.create_client(DB_CONNECTION) as vx:
    # do some work here
    pass

# connections are now closed
```
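The benefit of the context-manager form is that cleanup runs even when the body raises. The sketch below demonstrates the general Python pattern with a stand-in class. `DemoClient` is hypothetical and does not reflect how the vecs client is implemented.

```python
class DemoClient:
    """Stand-in for a database client; tracks only whether it is connected."""

    def __init__(self):
        self.connected = True

    def disconnect(self):
        self.connected = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # runs on normal exit and when the body raises
        self.disconnect()
        return False  # do not suppress exceptions


# normal exit: the client is connected inside the block, closed afterwards
with DemoClient() as client:
    was_connected_inside = client.connected

# exit via an exception: the client is still closed
failing_client = DemoClient()
try:
    with failing_client:
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

With an explicit `disconnect()` call, the same guarantee would need a `try`/`finally` block around the work.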