
vecs is a Python client for managing and querying vector stores in PostgreSQL with the pgvector extension. This guide will help you get started with vecs.

If you don't have a Postgres database with the pgvector extension ready, see hosting for easy options.

Installation

Requires:

  • Python 3.7+

You can install vecs using pip:


pip install vecs

Usage

Connecting

Before you can interact with vecs, create the client to communicate with Postgres. If you haven't started a Postgres instance yet, see hosting.


import vecs

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
vx = vecs.create_client(DB_CONNECTION)
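In a URL-style connection string, characters such as `@`, `/` or `:` inside the user name or password would be read as URL delimiters, so they need to be percent-encoded. A minimal sketch using only the standard library; `make_connection_string` is a hypothetical helper, not part of vecs.

```python
from urllib.parse import quote


def make_connection_string(user, password, host, port, db_name):
    """Hypothetical helper (not part of vecs): build a postgresql:// URL,
    percent-encoding the user and password so characters like '@' or '/'
    are not mistaken for URL delimiters."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        db_name,
    )


DB_CONNECTION = make_connection_string("postgres", "p@ss/word", "localhost", 5432, "postgres")
print(DB_CONNECTION)  # postgresql://postgres:p%40ss%2Fword@localhost:5432/postgres
```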

Create collection

You can create a collection to store vectors by specifying the collection's name and the number of dimensions in the vectors you intend to store.


docs = vx.create_collection(name="docs", dimension=3)
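Every vector stored in the collection must have exactly `dimension` elements; pgvector rejects vectors whose length does not match the column's declared dimension. If you want to catch mismatches before sending data to the database, a client-side pre-check could look like this sketch. `check_dimensions` is a hypothetical helper, not part of vecs.

```python
def check_dimensions(records, dimension):
    """Hypothetical helper (not part of vecs): raise ValueError for the first
    (id, vector, metadata) record whose vector length differs from `dimension`."""
    for record_id, vector, _metadata in records:
        if len(vector) != dimension:
            raise ValueError(
                f"record {record_id!r} has {len(vector)} dimensions, expected {dimension}"
            )


check_dimensions([("vec0", [0.1, 0.2, 0.3], {"year": 1973})], dimension=3)  # passes silently
```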

note

If another collection exists with the same name, use get_collection to retrieve it.

Get an existing collection

To access a previously created collection, use get_collection to retrieve it by name.


docs = vx.get_collection(name="docs")

Upserting vectors

vecs combines the concepts of "insert" and "update" into "upsert". Upserting records adds them to the collection if the id is not present, or updates the existing record if the id does exist.


# add records to the collection
docs.upsert(
    vectors=[
        (
            "vec0",           # the vector's identifier
            [0.1, 0.2, 0.3],  # the vector: a list or np.array
            {"year": 1973}    # associated metadata
        ),
        (
            "vec1",
            [0.7, 0.8, 0.9],
            {"year": 2012}
        )
    ]
)
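The insert-or-update behaviour described above can be modelled with a plain dictionary keyed by id. This is only an in-memory illustration of the semantics; vecs itself writes records to a Postgres table. In this sketch an existing record's vector and metadata are replaced wholesale.

```python
def upsert(store, records):
    """Illustrative in-memory upsert: each record is (id, vector, metadata).
    New ids are inserted; existing ids have their vector and metadata replaced."""
    for record_id, vector, metadata in records:
        store[record_id] = (list(vector), dict(metadata))


store = {}
upsert(store, [("vec0", [0.1, 0.2, 0.3], {"year": 1973})])
upsert(store, [
    ("vec0", [0.4, 0.5, 0.6], {"year": 1974}),  # id exists: updated
    ("vec1", [0.7, 0.8, 0.9], {"year": 2012}),  # id is new: inserted
])
print(len(store))        # 2
print(store["vec0"][1])  # {'year': 1974}
```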

Create an index

Collections can be queried immediately after being created. However, for good performance, the collection should be indexed after records have been upserted.

Build the index only after the collection has been populated with records: building an index on an empty collection will result in significantly reduced recall. Once the index has been created, you can still upsert new records into the collection, but you should rebuild the index if the collection more than doubles in size.
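The rebuild rule of thumb above can be written as a simple check. `should_rebuild_index` is a hypothetical helper, not part of vecs, and this sketch assumes you record the collection size yourself at the time you build the index.

```python
def should_rebuild_index(size_at_index_build, current_size):
    """Hypothetical helper: apply the rule of thumb of rebuilding the index
    once the collection has more than doubled since the index was built."""
    if size_at_index_build <= 0:
        # An index built on an empty collection has poor recall, so rebuild
        # as soon as there are records to index.
        return current_size > 0
    return current_size > 2 * size_at_index_build


print(should_rebuild_index(10_000, 15_000))  # False: grown by 50%
print(should_rebuild_index(10_000, 20_001))  # True: more than doubled
```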

Only one index may exist per collection. By default, creating an index will replace any existing index.

To create an index:


##
# INSERT RECORDS HERE
##

# index the collection to be queried by cosine distance
docs.create_index(measure=vecs.IndexMeasure.cosine_distance)

Available options for the index measure are:

  • vecs.IndexMeasure.cosine_distance
  • vecs.IndexMeasure.l2_distance
  • vecs.IndexMeasure.max_inner_product

which correspond to different methods for comparing query vectors to the vectors in the database.
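For intuition, the three measures can be written out in plain Python. This is a sketch of the underlying math, not vecs code. It follows pgvector's convention that smaller values mean "closer", which is why the inner-product variant is negated (pgvector's `<#>` operator returns the negative inner product).

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean (straight-line) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """Inner product negated so that a larger inner product ranks as closer."""
    return -sum(x * y for x, y in zip(a, b))


print(l2_distance([0, 0], [3, 4]))             # 5.0
print(negative_inner_product([1, 2], [3, 4]))  # -11
```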

If you aren't sure which to use, stick with the default (cosine_distance) by omitting the parameter, i.e.:


docs.create_index()

note

The time required to create an index grows with the number of records and the size of the vectors. For a few thousand records, expect the index to be built in under a minute. It may take a few minutes for larger collections.

Query

Given a collection docs with several records:

Basic

The simplest form of search is to provide a query vector.

note

Indexes are essential for good performance. See creating an index for more info.

If you do not create an index, every query will emit a warning

query does not have a covering index for cosine_similarity. See Collection.create_index

that names the IndexMeasure you should index.


docs.query(
    query_vector=[0.4, 0.5, 0.6],  # required
    limit=5,                       # number of records to return
    filters={},                    # metadata filters
    measure="cosine_distance",     # distance measure to use
    include_value=False,           # should distance measure values be returned?
    include_metadata=False,        # should record metadata be returned?
)

This returns a list of vector record ids.
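Conceptually, a query without an index compares the query vector against every record and keeps the `limit` closest. The sketch below does exactly that in memory with cosine distance. It illustrates exact (brute-force) search and is not vecs' implementation; an index replaces this full scan with an approximate search, which is why recall depends on building it over a populated collection.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def brute_force_query(records, query_vector, limit=5, include_value=False):
    """Exact nearest-neighbour search over (id, vector, metadata) records.
    Returns ids, or (id, distance) pairs when include_value is True."""
    scored = sorted(
        ((record_id, cosine_distance(query_vector, vector))
         for record_id, vector, _metadata in records),
        key=lambda pair: pair[1],
    )[:limit]
    if include_value:
        return scored
    return [record_id for record_id, _ in scored]


records = [
    ("vec0", [0.1, 0.2, 0.3], {"year": 1973}),
    ("vec1", [0.7, 0.8, 0.9], {"year": 2012}),
]
print(brute_force_query(records, [0.4, 0.5, 0.6]))  # ['vec1', 'vec0']
```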

Metadata Filtering

The metadata that is associated with each record can also be filtered during a query.

As an example, {"year": {"$eq": 2005}} filters the year metadata key to be equal to 2005.

In context:


docs.query(
    query_vector=[0.4, 0.5, 0.6],
    filters={"year": {"$eq": 2012}},  # metadata filters
)
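To make the filter shape concrete, the sketch below evaluates filters of the form `{"key": {"$eq": value}}` against a record's metadata in plain Python. It handles only `$eq`, the operator shown above, and mirrors the filter's meaning rather than how vecs translates it to SQL; the metadata guide lists the operators vecs actually supports.

```python
def matches(metadata, filters):
    """Return True if `metadata` satisfies every {"key": {"$eq": value}} clause.
    Only $eq is handled in this sketch; other operators raise ValueError."""
    for key, condition in filters.items():
        for operator, value in condition.items():
            if operator != "$eq":
                raise ValueError(f"operator {operator!r} not handled in this sketch")
            if key not in metadata or metadata[key] != value:
                return False
    return True


records = [
    ("vec0", [0.1, 0.2, 0.3], {"year": 1973}),
    ("vec1", [0.7, 0.8, 0.9], {"year": 2012}),
]
print([rid for rid, _, meta in records if matches(meta, {"year": {"$eq": 2012}})])  # ['vec1']
```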

For a complete reference, see the metadata guide.

Disconnect

When you're done with a collection, be sure to disconnect the client from the database.


vx.disconnect()

Alternatively, use the client as a context manager and it will automatically close the connection on exit.


import vecs

DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
with vecs.create_client(DB_CONNECTION) as vx:
    # do some work here
    pass

# connections are now closed
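The automatic cleanup relies on Python's context-manager protocol: `__exit__` runs when the `with` block ends, even if an exception was raised inside it. The toy class below is a hypothetical stand-in, not the vecs client, that shows the pattern.

```python
class ToyClient:
    """Hypothetical stand-in for a client that owns a connection."""

    def __init__(self):
        self.connected = True

    def disconnect(self):
        self.connected = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always release the connection; returning False lets any exception propagate.
        self.disconnect()
        return False


with ToyClient() as client:
    assert client.connected
print(client.connected)  # False
```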