This project is managed by devenv. 1 It consists of a Next.js 2 application and a Postgres database with the pgvector 3 extension installed. The search query uses pgvector's L2 distance operator (<->) to calculate similarity; a smaller distance means a closer match.
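To make the role of the <-> operator concrete, here is a minimal sketch of the L2 (Euclidean) distance it computes, together with what a nearest-neighbour query could look like. The table name `movies` and the columns `id`, `title` and `embedding` are assumptions made for illustration, not names taken from this project.

```typescript
// Euclidean (L2) distance between two vectors of equal length.
// This is the quantity pgvector's <-> operator returns.
function l2Distance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// pgvector accepts vectors as text literals such as '[1,2,3]'.
function toVectorLiteral(v: number[]): string {
  return `[${v.join(",")}]`;
}

// Hypothetical schema: movies(id, title, embedding vector).
// $1 is the query embedding as a vector literal, $2 the number of results.
const nearestMoviesSql = `
  SELECT id, title, embedding <-> $1::vector AS distance
  FROM movies
  ORDER BY embedding <-> $1::vector
  LIMIT $2
`;
```

Ordering by the distance in ascending order returns the most similar rows first.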
The movie dataset 4 is fetched from Kaggle by the devenv download script. To download the dataset, set the KAGGLE_USERNAME and KAGGLE_API_KEY environment variables first.
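A missing credential is easy to overlook, so a small pre-flight check can help. The sketch below is not part of the project's download script; `missingEnvVars` is a hypothetical helper that only reports which of the two variables named above are unset or empty.

```typescript
// A map of environment variables, shaped like Node's process.env.
type Env = Record<string, string | undefined>;

// The two variables the download step expects, as named in this document.
const REQUIRED_KAGGLE_VARS = ["KAGGLE_USERNAME", "KAGGLE_API_KEY"];

// Hypothetical helper: returns the names of required variables
// that are missing or contain only whitespace.
function missingEnvVars(
  env: Env,
  required: string[] = REQUIRED_KAGGLE_VARS
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

In a Node script this could be called as `missingEnvVars(process.env)`, aborting with a clear message when the returned list is not empty.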
The seed script generates gte-small 5 embeddings for the data and inserts both into the database. Seeding takes about 17 minutes on my M2 MacBook Air, and the resulting database is around 200 MB.
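For a rough sense of where that size comes from: gte-small produces 384-dimensional embeddings, and pgvector documents the storage of a vector as 4 bytes per dimension plus 8 bytes of overhead. The sketch below only does that arithmetic; the row count passed in is a placeholder, since the dataset size is not stated here.

```typescript
// Dimension of gte-small embeddings.
const GTE_SMALL_DIMS = 384;

// Storage of one pgvector value: 4 bytes per dimension plus 8 bytes overhead.
function vectorStorageBytes(dims: number): number {
  return 4 * dims + 8;
}

// Approximate space taken by the embedding column alone, in megabytes.
// rowCount is a hypothetical input; row text and indexes are not included.
function embeddingColumnMB(rowCount: number, dims: number = GTE_SMALL_DIMS): number {
  return (rowCount * vectorStorageBytes(dims)) / (1024 * 1024);
}
```

The rest of the database size comes from the movie metadata itself and from any indexes.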