
Commit be507c2

[DOC] Update single node performance benchmarks
1 parent d641cde commit be507c2


7 files changed: +17 −33 lines changed


docs/docs.trychroma.com/markdoc/content/production/administration/performance.md

Lines changed: 17 additions & 33 deletions
@@ -15,14 +15,14 @@ Roughly speaking, here is the sort of performance you can expect from Chroma on
 - Small documents (100-200 words)
 - Three metadata fields per record.

-| Instance Type | System RAM | Approx. Max Collection Size | Mean Latency (insert) | 99.9% Latency (insert) | Mean Latency (query) | 99.9% Latency (query) | Monthly Cost |
+| Instance Type | System RAM | Approx. Max Collection Size | Mean Latency (insert, batch size=32) | 99.9% Latency (insert, batch size=32) | Mean Latency (query) | 99.9% Latency (query) | Monthly Cost |
 |-----------------|------------|-----------------------------|-----------------------|------------------------|----------------------|-----------------------|--------------|
-| **t3.small** | 2 | 250,000 | 55ms | 250ms | 22ms | 72ms | $15.936 |
-| **t3.medium** | 4 | 700,000 | 37ms | 120ms | 14ms | 41ms | $31.072 |
-| **t3.large** | 8 | 1,700,000 | 30ms | 100ms | 13ms | 35ms | $61.344 |
-| **t3.xlarge** | 16 | 3,600,000 | 30ms | 100ms | 13ms | 30ms | $121.888 |
-| **t3.2xlarge** | 32 | 7,500,000 | 30ms | 100ms | 13ms | 30ms | $242.976 |
-| **r7i.2xlarge** | 64 | 15,000,000 | 13ms | 50ms | 7ms | 13ms | $386.944 |
+| **t3.small** | 2 | 250,000 | 231ms | 1280ms | 8ms | 29ms | $15.936 |
+| **t3.medium** | 4 | 700,000 | 191ms | 722ms | 5ms | 18ms | $31.072 |
+| **t3.large** | 8 | 1,700,000 | 199ms | 633ms | 4ms | 10ms | $61.344 |
+| **t3.xlarge** | 16 | 3,600,000 | 159ms | 530ms | 4ms | 7ms | $121.888 |
+| **t3.2xlarge** | 32 | 7,500,000 | 149ms | 520ms | 5ms | 33ms | $242.976 |
+| **r7i.2xlarge** | 64 | 15,000,000 | 112ms | 405ms | 5ms | 7ms | $386.944 |

 {% br %}{% /br %}
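Editor's aside: for anyone reproducing the benchmark table above, the mean and 99.9th-percentile columns can be derived from raw per-request timings. A minimal Python sketch (the sample data here is invented purely for illustration; the percentile uses the nearest-rank method):

```python
import statistics

def latency_stats(samples_ms):
    """Return (mean, p99.9) from a list of per-request latencies in ms."""
    ordered = sorted(samples_ms)
    # 0-based index of the 99.9th percentile, nearest-rank method.
    rank = max(0, int(len(ordered) * 0.999) - 1)
    return statistics.mean(ordered), ordered[rank]

# Illustrative only: 1,000 fake insert timings, mostly near 150 ms
# with a small slow tail, roughly mimicking the t3.2xlarge row.
samples = [150.0] * 990 + [500.0] * 10
mean, p999 = latency_stats(samples)
print(f"mean={mean:.1f}ms p99.9={p999:.1f}ms")  # → mean=153.5ms p99.9=500.0ms
```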

@@ -60,63 +60,47 @@ In most realistic use cases, it’s likely that the size and performance of the

 ## Latency and collection size

-As collections get larger and the size of the index grows, inserts and queries both take longer to complete. The rate of increase starts out fairly flat then grow roughly linearly, with the inflection point and slope depending on the quantity and speed of CPUs available.
+As collections get larger and the size of the index grows, inserts and queries both take longer to complete. The rate of increase starts out fairly flat, then grows roughly linearly, with the inflection point and slope depending on the quantity and speed of CPUs available. The extreme spikes at the end of the charts for certain instances, such as `t3.2xlarge`, occur when the instance hits its memory limit and stops functioning properly.
 ### Query Latency

-![query-latency](/query-latency.png)
+![query-latency](/query_latency_1_0_10.png)

 ### Insert Latency

-![insert-latency](/insert-latency.png)
+![insert-latency](/insert_latency_1_0_10.png)

 {% note type="tip" title="" %}
 If you’re using multiple collections, performance looks quite similar, based on the total number of embeddings across collections. Splitting collections into multiple smaller collections doesn’t help, but it doesn’t hurt, either, as long as they all fit in memory at once.
 {% /note %}

 ## Concurrency

-Although aspects of HNSW’s algorithm are multithreaded internally, only one thread can read or write to a given index at a time. For the most part, single-node Chroma is fundamentally single threaded. If an operation is executed while another is still in progress, it blocks until the first one is complete.
+The system can handle concurrent operations in parallel, so latency remains consistently low and flat across all batch sizes for writes, and scales linearly for queries.

-This means that under concurrent load, the average latency of each request will increase.
+![concurrent-writes](/concurrent_writes_1_0_10.png)

-When writing, the increased latency is more pronounced with larger batch sizes, as the system is more completely saturated. We have experimentally verified this: as the number of concurrent writers is increased, average latency increases linearly.
-
-![concurrent-writes](/concurrent-writes.png)
-
-![concurrent-queries](/concurrent-queries.png)
-
-Despite the effect on latency, Chroma does remain stable with high concurrent load. Too many concurrent users can eventually increase latency to the point where the system does not perform acceptably, but this typically only happens with larger batch sizes. As the above graphs shows, the system remains usable with dozens to hundreds of operations per second.
+![concurrent-queries](/concurrent_queries_1_0_10.png)

 See the [Insert Throughput](./performance#insert-throughput) section below for a discussion of optimizing user count for maximum throughput when the concurrency is under your control, such as when inserting bulk data.
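Editor's aside: the concurrent-write behavior measured in these charts can be exercised with a simple load generator. This sketch uses Python's `ThreadPoolExecutor`; `insert_batch` is a stand-in for a real client call (e.g. an `add` against a collection) and just simulates a fixed-cost request so the script is self-contained:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def insert_batch(batch_id, batch_size=32):
    # Stand-in for a real insert request; sleeps to simulate round-trip cost.
    time.sleep(0.01)
    return batch_size

def run_concurrent_inserts(num_batches=20, workers=4):
    """Fire num_batches inserts from `workers` concurrent threads and
    report total embeddings inserted plus overall throughput per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        inserted = sum(pool.map(insert_batch, range(num_batches)))
    elapsed = time.perf_counter() - start
    return inserted, inserted / elapsed

inserted, throughput = run_concurrent_inserts()
print(f"inserted {inserted} embeddings at {throughput:.0f}/s")
```

Varying `workers` in a harness like this (against a real server rather than the simulated call) is how charts such as the ones above are typically produced.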

 # CPU speed, core count & type

-As a CPU bound application, it’s not surprising that CPU speed and type makes a difference for average latency.
+Due to Chroma's parallelization, latencies remain fairly constant regardless of CPU core count.

-As the data demonstrates, although it is not fully parallelized, Chroma can still take some advantage of multiple CPU cores for better throughput.
-
-![cpu-mean-query-latency](/cpu-mean-query-latency.png)
-
-{% note type="tip" title="" %}
-Note the slightly increased latency for the t3.2xlarge instance. Logically, it should be faster than the other t3 series instances, since it has the same class of CPU, and more of them.
-
-This data point is left in as an important reminder that the performance of EC2 instances is slightly variable, and it’s entirely possible to end up with an instance that has performance differences for no discernible reason.
-{% /note %}
+![cpu-mean-query-latency](/cpu_mean_query_latency_1_0_10.png)

10793
# Insert Throughput
10894

10995
A question that is often relevant is: given bulk data to insert, how fast is it possible to do so, and what’s the best way to insert a lot of data quickly?
11096

11197
The first important factor to consider is the number of concurrent insert requests.
11298

113-
As mentioned in the [Concurrency](./performance#concurrency) section above, actual insertion throughput does not benefit from concurrency. However, there is some amount of network and HTTP overhead which can be parallelized. Therefore, to saturate Chroma while keeping latencies as low as possible, we recommend 2 concurrent client processes or threads inserting as fast as possible.
114-
115-
The second factor to consider is the batch size of each request. Performance is mostly linear with respect to batch size, with a constant overhead to process the HTTP request itself.
99+
As mentioned in the [Concurrency](./performance#concurrency) section above, insert throughput does benefit from increased concurrency. A second factor to consider is the batch size of each request. Performance scales with batch size up to CPU saturation, due to high overhead cost for smaller batch sizes. After reaching CPU saturation, around a batch size of 150 the throughput plateaus.
116100

117101
Experimentation confirms this: overall throughput (total number of embeddings inserted, across batch size and request count) remains fairly flat between batch sizes of 100-500:
118102

119-
![concurrent-inserts](/concurrent-inserts.png)
103+
![concurrent-inserts](/concurrent_inserts_1_0_10.png)
120104

121105
Given that smaller batches have lower, more consistent latency and are less likely to lead to timeout errors, we recommend batches on the smaller side of this curve: anything between 50 and 250 is a reasonable choice.
122106
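Editor's aside: the batching recommendation above reduces to splitting bulk data into fixed-size chunks before sending. A minimal helper (the `records` structure is illustrative, not a prescribed schema):

```python
def batched(items, batch_size=250):
    """Yield successive batches of at most batch_size records."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Illustrative: split 1,050 records into batches within the 50-250 range.
records = [{"id": str(i), "document": f"doc {i}"} for i in range(1050)]
batches = list(batched(records))
print([len(b) for b in batches])  # → [250, 250, 250, 250, 50]
```

Each yielded batch would then be the payload of one insert request, issued from however many concurrent clients the workload calls for.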
