Description
I would like to be able to perform similarity search over vector embeddings generated by language models to find records in a table.
Describe the solution you'd like
PostgreSQL has an extension, pgvector, that makes it easy to store and query embedding vectors.
It supports:
- exact and approximate nearest neighbor search
- single-precision, half-precision, binary, and sparse vectors
- L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance
- any language with a Postgres client
Sample usage in SQL:
CREATE EXTENSION vector;
-- Create a vector column with 3 dimensions
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
-- Insert vectors
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
-- Get the nearest neighbors by L2 distance
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
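pgvector also supports inner product (<#>), cosine distance (<=>), and L1 distance (<+>, added in 0.7.0). Note that <#> returns the negative inner product, so ascending order still puts the most similar vectors first.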
These operations are independent of the model used to generate the embedding vectors.
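To make the operator semantics concrete, here is a minimal pure-Python sketch of what each distance computes and how `ORDER BY ... LIMIT` ranks rows. It is not pgvector's implementation and does not touch a database. The function names, the `nearest` helper, and the ids 1 and 2 (what `bigserial` would assign in a fresh table) are assumptions made for illustration.

```python
import math

def l2_distance(a, b):
    # What <-> computes: Euclidean distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    # What <#> computes: the inner product, negated so that
    # smaller values mean "more similar".
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # What <=> computes: 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def l1_distance(a, b):
    # What <+> computes: sum of absolute differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest(items, query, distance, limit=5):
    # Exact (brute-force) search: score every row, sort ascending,
    # keep the first `limit` ids. Mirrors ORDER BY ... LIMIT.
    ranked = sorted(items.items(), key=lambda kv: distance(kv[1], query))
    return [item_id for item_id, _ in ranked[:limit]]

# Same rows and query vector as the SQL sample above (ids assumed).
items = {1: [1.0, 2.0, 3.0], 2: [4.0, 5.0, 6.0]}
query = [3.0, 1.0, 2.0]

for name, fn in [
    ("L2", l2_distance),
    ("inner product", negative_inner_product),
    ("cosine", cosine_distance),
    ("L1", l1_distance),
]:
    print(name, nearest(items, query, fn))
```

With the sample data, L2 and L1 rank row 1 ahead of row 2, while inner product and cosine distance rank row 2 first, so the choice of operator can change the result order even on a tiny table.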