Choosing Instance Type

Choosing the right instance type for your workload.

This guide will help you choose the right instance type for your workload. We offer general guidance, since no single set of instructions fits every possible use case; the goal is to give you a starting point from which to run your own benchmarks and optimizations.

For more information about engineering at scale, see our Engineering for Scale guide.

Simple workloads

We've run a set of benchmarks using the gist-960-angular dataset, which contains 1,000,000 image embeddings of 960 dimensions each.

We used Vecs to create a collection, upload the embeddings to a single table, and build an inner-product index on the embedding column. We then ran a series of queries to measure the performance of each instance type.
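The workflow above can be sketched with the vecs client. This is a minimal illustration, not the actual benchmark code: the connection string, collection name, and helper function names are our own placeholders.

```python
def to_records(embeddings):
    """Convert a list of vectors into vecs upsert records: (id, vector, metadata)."""
    return [(str(i), vec, {}) for i, vec in enumerate(embeddings)]

def setup_collection(db_url, embeddings, name="gist_960"):
    """Create a collection, upload embeddings, and build an inner-product index."""
    import vecs  # local import so to_records() stays dependency-free

    vx = vecs.create_client(db_url)  # e.g. "postgresql://user:pass@host:5432/postgres"
    docs = vx.get_or_create_collection(name=name, dimension=960)
    docs.upsert(records=to_records(embeddings))
    docs.create_index(measure=vecs.IndexMeasure.max_inner_product)
    return docs
```

Once the index is built, queries can be issued against the collection with `docs.query(...)`.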

Results

For these runs, the number of vectors in gist-960-angular was reduced to fit each instance size.

| Plan | CPU | Memory | Vectors | RPS | Latency (mean) | Latency (p95) | CPU usage (max) | Memory usage (max) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Free | 2-core | 1 GB | 30,000 | 75 | 0.065 sec | 0.088 sec | 90% | 1 GB + 100 MB swap |
| Small | 2-core | 2 GB | 100,000 | 78 | 0.064 sec | 0.092 sec | 80% | 1.8 GB |
| Medium | 2-core | 4 GB | 250,000 | 58 | 0.085 sec | 0.129 sec | 90% | 3.2 GB |
| Large | 2-core | 8 GB | 500,000 | 55 | 0.088 sec | 0.140 sec | 90% | 5 GB |

The full gist-960-angular dataset: 1,000,000 vectors.

| Plan | CPU | Memory | Vectors | RPS | Latency (mean) | Latency (p95) | CPU usage (max) | Memory usage (max) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| XL | 4-core | 16 GB | 1,000,000 | 110 | 0.046 sec | 0.070 sec | 45% | 14 GB |
| 2XL | 8-core | 32 GB | 1,000,000 | 235 | 0.083 sec | 0.136 sec | 33% | 10 GB |
| 4XL | 16-core | 64 GB | 1,000,000 | 420 | 0.071 sec | 0.106 sec | 45% | 11 GB |
| 8XL | 32-core | 128 GB | 1,000,000 | 815 | 0.072 sec | 0.106 sec | 75% | 13 GB |
| 12XL | 48-core | 192 GB | 1,000,000 | 1150 | 0.052 sec | 0.078 sec | 70% | 15.5 GB |
| 16XL | 64-core | 256 GB | 1,000,000 | 1345 | 0.072 sec | 0.106 sec | 60% | 17.5 GB |
  • Lists set to number of vectors / 1000
  • Probes set to 10
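The two index parameters above amount to a simple rule of thumb. A small helper (the name `ivfflat_params` is ours) makes the arithmetic explicit:

```python
def ivfflat_params(num_vectors: int, probes: int = 10) -> dict:
    """IVFFlat parameters used in the benchmarks above:
    lists = number of vectors / 1000, probes fixed at 10."""
    return {"lists": max(num_vectors // 1000, 1), "probes": probes}

# For the full gist-960-angular dataset:
# ivfflat_params(1_000_000) -> {"lists": 1000, "probes": 10}
```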

Note

It is possible to upload more than 1,000,000 vectors to a single table if memory allows it (for example, on a 2XL instance or larger), but this will affect query performance: RPS will be lower and latency will be higher. Scaling should be almost linear, but we recommend benchmarking your workload to find the optimal number of vectors per table and per instance.

Methodology

We follow techniques outlined in the ANN Benchmarks methodology. A Python test runner is responsible for uploading the data, creating the index, and running the queries. The pgvector engine is implemented using vecs, a Python client for pgvector.

[Figure: multi-database test setup]

Each test runs for at least 30-40 minutes and includes a series of experiments executed at different concurrency levels, measuring the engine's performance under different load types. The results are then averaged.

As a general recommendation, we suggest using a concurrency level of 5 or more for most workloads and 30 or more for high-load workloads.
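A concurrency sweep like the one described above can be sketched as follows. This is a simplified illustration, not the actual test runner: `run_load` and `query_fn` are our own names, and `query_fn` stands in for any callable that issues one query.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_load(query_fn, num_queries, concurrency):
    """Issue num_queries calls to query_fn at the given concurrency level.
    Returns (requests_per_second, mean_latency_seconds)."""
    latencies = []

    def timed():
        t0 = time.perf_counter()
        query_fn()
        latencies.append(time.perf_counter() - t0)  # list.append is thread-safe

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(num_queries):
            pool.submit(timed)
    # the with-block waits for all submitted queries to finish
    elapsed = time.perf_counter() - start
    return num_queries / elapsed, sum(latencies) / len(latencies)
```

Running the sweep at several concurrency levels (e.g. 5, 10, 30) and comparing RPS and mean latency against the tables above gives a rough picture of how your own workload behaves on a given instance.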

Future benchmarks

We'll continue to add benchmarks for datasets with different vector dimensions, numbers of lists in the index, and numbers of probes per query. Stay tuned for more information about how these parameters affect the performance and precision of your queries.