Managing indexes

Once your vector table starts to grow, you will likely want to add an index to speed up queries. Without an index, every similarity query performs a sequential scan that compares the query against every row, which becomes resource-intensive as the number of records grows.

IVFFlat indexes

Today, pgvector indexes use an algorithm called IVFFlat. IVF stands for 'inverted file index'. IVFFlat clusters your vectors to narrow the scope of a similarity search. Instead of comparing the query vector with every stored vector, it compares it only with the vectors in the same cluster (cell), or in nearby clusters, depending on your configuration.

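Inverted lists (cell clusters)

When you create the index, you choose the number of inverted lists (cell clusters). More lists means fewer vectors per list, so each query scans less data and runs faster. The cost is recall: a close match is more likely to sit in a list that the query does not search.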
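The pgvector README suggests a starting point for the number of lists: rows / 1000 for up to 1M rows, and sqrt(rows) above that. Below is a minimal sketch that encodes this heuristic as a SQL function so you can reuse it when sizing an index. `suggested_ivfflat_lists` is a hypothetical helper, not part of pgvector.

```sql
-- Hypothetical helper, not part of pgvector: encodes the README's starting-point
-- heuristic for the IVFFlat "lists" parameter.
create or replace function suggested_ivfflat_lists(row_count bigint)
returns bigint
language sql
immutable
as $$
  select greatest(
    1,                                            -- never suggest zero lists
    case
      when row_count <= 1000000 then row_count / 1000   -- up to 1M rows: rows / 1000
      else ceil(sqrt(row_count::float8))::bigint         -- over 1M rows: sqrt(rows)
    end
  );
$$;
```

For example, `select suggested_ivfflat_lists(count(*)) from items;` gives a value you could pass as `lists`. Treat it as a starting point and tune against your own recall and latency measurements. The same README suggests sqrt(lists) as a starting point for probes.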

For example, to create an index with 100 lists on a column that uses the cosine operator:

create index on items using ivfflat (column_name vector_cosine_ops) with (lists = 100);

For more info on the different operators, see Distance operators.

At query time, you can also set the number of probes (1 by default). This is the number of lists searched for matches, chosen by how close their centers are to the query vector. Increase it for better recall at the expense of speed.
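A rough estimate shows why lists and probes trade speed against recall. Assume the rows are spread about evenly across the lists, which real clusters rarely are. With N rows, L lists, and P probes, the query is compared with all L list centers and then with the vectors in the P closest lists:

```latex
\begin{aligned}
\text{distance computations} &\approx L + P \cdot \frac{N}{L} \\
N = 10^6,\ L = 10^3,\ P = 10: \quad &\approx 10^3 + 10 \cdot \frac{10^6}{10^3} = 11{,}000
\end{aligned}
```

A sequential scan over the same table needs 1,000,000 comparisons. Raising P increases the cost roughly linearly. With P = L the estimate becomes L + N, which is more work than a sequential scan.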

To set the number of probes for the duration of the session, run:

set ivfflat.probes = 10;

To set the number of probes only for the current transaction, run:

begin;
set local ivfflat.probes = 10;
select ...
commit;
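The following minimal sketch shows how the two scopes interact. The session value of 1 is only an example. The `show` results in the comments follow from `set local` lasting only until the end of the transaction.

```sql
-- Session-level value (1 is just an example).
set ivfflat.probes = 1;

begin;
set local ivfflat.probes = 10;   -- lasts only until commit or rollback
show ivfflat.probes;             -- 10
-- ... run similarity queries here ...
commit;

show ivfflat.probes;             -- 1: the transaction-local value has ended
```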

If the number of probes is the same as the number of lists, exact nearest neighbor search will be performed and the planner won't use the index.

Approximate nearest neighbor

One important note about IVF indexes is that nearest neighbor search is approximate, because exact search on high-dimensional data can't be indexed efficiently. As a result, similarity results may change slightly after you add an index: you are trading recall for speed.

Distance operators

The operator class you use when creating an index depends on the distance operator in your queries. pgvector includes three distance operators:

Operator	Description	Operator class
`<->`	Euclidean distance	`vector_l2_ops`
`<#>`	negative inner product	`vector_ip_ops`
`<=>`	cosine distance	`vector_cosine_ops`
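The query below applies each operator to the same two example vectors. The values are arbitrary, and the sketch assumes the `vector` extension is available. For all three operators, a smaller result means more similar. This is why `<#>` returns the *negative* inner product: Postgres index scans on these operators return results in ascending order.

```sql
-- Assumes pgvector is installed; the two vectors are arbitrary examples.
create extension if not exists vector;

select
  '[1,2,3]'::vector <-> '[4,5,6]'::vector as l2_distance,        -- sqrt(27), about 5.196
  '[1,2,3]'::vector <#> '[4,5,6]'::vector as neg_inner_product,  -- -(4 + 10 + 18) = -32
  '[1,2,3]'::vector <=> '[4,5,6]'::vector as cosine_distance;    -- 1 - 32 / (sqrt(14) * sqrt(77)), about 0.0254
```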

Use the following SQL commands to create an index for the operator(s) used in your queries.

Euclidean L2 distance (`vector_l2_ops`)

create index on items using ivfflat (column_name vector_l2_ops) with (lists = 100);

Inner product (`vector_ip_ops`)

create index on items using ivfflat (column_name vector_ip_ops) with (lists = 100);

Cosine distance (`vector_cosine_ops`)

create index on items using ivfflat (column_name vector_cosine_ops) with (lists = 100);
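A query can only use one of these indexes if it has an `order by` on the distance operator that matches the index's operator class, in ascending order, combined with a `limit`. The following end-to-end sketch uses a hypothetical temporary table `items_demo` with a three-dimensional `embedding` column and a cosine-distance index.

```sql
-- Assumes the pgvector extension is available.
-- items_demo and embedding are hypothetical names used only for this example.
create extension if not exists vector;

create temporary table items_demo (
  id bigint primary key,
  embedding vector(3)
);

insert into items_demo (id, embedding) values
  (1, '[1,0,0]'),
  (2, '[0,1,0]'),
  (3, '[0.9,0.1,0]');

-- The index uses the cosine operator class...
create index on items_demo using ivfflat (embedding vector_cosine_ops) with (lists = 1);

-- ...so the query orders by the matching operator (<=>), ascending, with a limit.
select id, embedding <=> '[1,0,0]' as cosine_distance
from items_demo
order by embedding <=> '[1,0,0]'
limit 2;  -- returns id 1 (distance 0), then id 3
```

With only three rows, the planner may choose a sequential scan anyway, and pgvector may emit a notice about building the index with little data. The point here is the shape of the query.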

Currently, vectors with up to 2,000 dimensions can be indexed.

If you are using the vecs Python library, follow the instructions in Managing collections to create indexes.

When should you add indexes?

pgvector recommends adding indexes only after the table has sufficient data, so that the internal IVFFlat cell clusters are based on your data's distribution. Any time the distribution changes significantly, consider recreating indexes.
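The following minimal sketch shows that order of operations: load the data, build the index, and later rebuild it with `reindex` once the existing lists no longer reflect the data. It uses a hypothetical temporary table `reindex_demo` filled with synthetic two-dimensional points.

```sql
-- Assumes the pgvector extension is available.
-- reindex_demo is a hypothetical table filled with synthetic points for illustration.
create extension if not exists vector;

create temporary table reindex_demo (
  id bigint primary key,
  embedding vector(2)
);

-- 1. Load your data first (here: 1,000 synthetic points on the unit circle).
insert into reindex_demo (id, embedding)
select g, array[cos(g), sin(g)]::vector
from generate_series(1, 1000) as g;

-- 2. Build the index afterwards, so its list centers are computed from that data.
create index reindex_demo_embedding_idx
  on reindex_demo using ivfflat (embedding vector_l2_ops) with (lists = 10);

-- 3. Later, after the data distribution has shifted significantly,
--    rebuild the index so the lists are recomputed from the current rows.
reindex index reindex_demo_embedding_idx;
```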

Resources

Read more about indexing on pgvector's GitHub page.
