
Commit f4a7c76

Authored by techiejd, dejan-velimirovic-calendly, and claude
feat: DbAdapter API + pg adapter (without CF) (#44)
Squashed commits:

* WIP: uses mock adapter
* fix: ignore node_modules everywhere
* Adds split_db_adapter to CI run
* feat(cf-adapter): add Cloudflare Vectorize adapter (#28)
  * feat(cf-adapter): enhance Cloudflare Vectorize integration with config-based bindings and add tests
  * feat(cf-adapter): refactor Cloudflare Vectorize integration to use config-based bindings and update tests
* chore: update pnpm-lock.yaml
* Prepares for automated publishes. This one beta will be done by hand, but hopefully the rest will be done automatically.
* Bumps version since we added deleteEmbeddings. Also runs tsc so that we can be sure the whole project compiles.
* Adds the type check to CI
* Fixes type check
* Removes silly double checking on split_db_adapter for push
* Adds root pnpm workspace
* feat(cf-adapter): update query parameters and method for deleting embeddings in Cloudflare Vectorize integration (#31)
* Bumps version to release
* Better typings (#34)
* Adds better id tracking for deletion and does only one search instead of many when querying (#35)
* Deduplicate shared logic across plugin and adapter packages (#36)
  * Extract repeated production patterns (chunk validation, delete embeddings, task slug constants) into shared utilities exported from the root plugin.
  * Consolidate test helpers via vitest path aliases so adapter tests import from the canonical root dev/ copies.
  * Remove CF adapter dead test code (unused utils, constants, helpers).
  * Fix chunkRichText join bug in CF adapter tests (child nodes were joined without spaces).
  * CF adapter limitation acknowledgement and more DRY
* Removes dead code (#37)
* Bumps version for rollout
* Merge main (#40): adds should embed (#38)
* Merge main into split_db_adapter (beta.5) (#42)
  * adds should embed (#38); ups version to get ready for release
  * splits the job into one per batch (#41)
  * fix: remove waitUntil delay and persist failedChunkData on batch records
    * Remove the 30s waitUntil delay from the per-batch task re-queue (it was causing test timeouts; the original code had no such delay).
    * Add a failedChunkData JSON field to the batch collection so per-batch tasks can store chunk-level failure data independently.
    * Aggregate failedChunkData from batch records in finalizeRunIfComplete() instead of relying on in-memory accumulation from the old single-task flow.
  * feat: add batchLimit to CollectionVectorizeOption with a coordinator/worker architecture. Splits prepare-bulk-embedding into a coordinator plus per-collection workers. Each worker processes one page of one collection, queuing a continuation job before processing to ensure crash safety. Default batchLimit is 1000 when not explicitly set.
  * fix: rewrite batchLimit test 2 to reuse the same Payload instance. The second test created a separate Payload instance sharing the same DB and job queues, so two crons competed for jobs, causing double execution and mock-state inconsistency (expected 4 to be 2). Both tests now use the single beforeAll instance with cleanup between.
  * fix: add payload.destroy() in afterAll to prevent OOM from leaked crons. Every test file that creates a Payload instance now calls payload.destroy() in afterAll (or try/finally for in-test instances). This stops background cron jobs from accumulating across tests, which was causing heap exhaustion in CI.
  * Trying to not destroy our heap
  * Runs tests in parallel now that each test gets its own db
  * fix: fix OOM, polling test assertions, and add diagnostic logging
    * Add --max-old-space-size=8192 to test:int NODE_OPTIONS (cross-env was overriding the CI env var, so the heap limit never took effect).
    * Fix polling.spec.ts queueSpy assertions: coordinator/worker adds an extra queue call, so poll-or-complete-single-batch is now calls 3 and 4 instead of 2 and 3.
    * Add extensive [vectorize-debug] console.log statements throughout task handlers (coordinator, worker, poll-single, finalize, streamAndBatchDocs) to diagnose any remaining CI hangs.
    * Remove redundant NODE_OPTIONS from the CI workflow (now in the script).
  * refactor: remove the poll-or-complete-bulk-embedding task and aggregate incrementally. Removes the backward-compatible fan-out task since the per-batch architecture hasn't been released yet. finalizeRunIfComplete now aggregates batch counts incrementally during pagination instead of collecting all batch objects into memory.
  * chore: bump to 0.5.5, update changelog, remove debug logging (bump 0.5.4 → 0.5.5; add a 0.5.5 entry to CHANGELOG.md covering coordinator/worker, batchLimit, and per-batch polling; document batchLimit in the README CollectionVectorizeOption section; remove all diagnostic console.log statements from bulkEmbedAll.ts)
  * Adds upgrade note
  * chore: bump version to 0.6.0-beta.5
  * fix: resolve 4 CI test failures from the merge
    * chunkers.spec.ts: remove the getPayload() call that crashes on the dummy db; pass SanitizedConfig directly to chunkRichText.
    * batchLimit.spec.ts: add the missing dbAdapter (createMockAdapter) required by the split_db_adapter architecture.
    * extensionFieldsVectorSearch.spec.ts: pass the adapter as the second arg to createVectorSearchHandlers (new signature from split_db_adapter).
    * versionBump.spec.ts: destroy payload0 before creating payload1 to prevent a cron worker race condition between the two instances.
  * Cleans up a doubled blank line
  * Undoes a weird test fix done by the bot
  * fix: harden the versionBump test with sequential steps and queue isolation (use test.step() to enforce sequential execution of each phase; add a separate realtimeQueueName per Payload instance to prevent cron worker cross-talk on the default queue; use dynamic Date.now() keys to avoid cached-state interference; increase the waitForBulkJobs timeout to 30s for CI)
  * fix: prevent waitForBulkJobs from returning prematurely. It could return early in the coordinator/worker fan-out pattern during a brief window with 0 pending jobs between job transitions. It now also checks the bulk embeddings run status and only returns when no pending jobs exist AND no runs are in the queued/running state.
  * fix: remove test.step(), which is a Playwright API, not Vitest. Reverted to flat sequential code with phase comments for readability.
  * fix: rewrite the versionBump test with a single Payload instance. Creating two Payload instances caused cron cross-talk, timeouts, and queue-isolation issues on CI; one instance now mutates the knowledgePools config version between bulk embed runs. This tests the same code path (versionMismatch in streamAndBatchDocs) without the multi-instance fragility.
* chore: update beta version
* Remove the Cloudflare adapter to unblock the main-branch merge. Splits the CF adapter work out so the core DbAdapter API and pg adapter can be merged to main independently. The CF adapter will continue in a separate branch.
* docs: call for help adding more database adapters
* chore: bump version to 0.7.0

Co-authored-by: dejan-velimirovic-calendly <dejan.velimirovic@calendly.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 82ac6a1 commit f4a7c76


75 files changed (+3767 −1704 lines)

.changeset/README.md

Lines changed: 22 additions & 0 deletions

```md
# Changesets

Hello and welcome! This folder has been automatically generated by `@changesets/cli`, a build tool that works
with multi-package repos, or single-package repos to help you version and publish your code. You can
find the full documentation for it [in the readme](https://github.com/changesets/changesets/blob/main/README.md).

## Usage

### Adding a changeset

Run `pnpm changeset` to create a new changeset. You'll be prompted to:

1. Select which packages have changed
2. Choose a bump type (major/minor/patch)
3. Write a summary of the changes

### Versioning

Run `pnpm changeset:version` to consume all pending changesets, bump versions, and update changelogs.

### Publishing

Run `pnpm release` to build and publish all packages to npm.
```
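For reference, a changeset produced by `pnpm changeset` is a small markdown file dropped into `.changeset/`. A hypothetical example (filename and contents are illustrative, not from this commit):

```md
---
'payloadcms-vectorize': minor
'@payloadcms-vectorize/pg': minor
---

Add deleteEmbeddings to the DbAdapter interface
```

The YAML frontmatter names the affected packages and bump types; the body becomes the changelog entry when `pnpm changeset:version` consumes it.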

.changeset/config.json

Lines changed: 15 additions & 0 deletions

```json
{
  "$schema": "https://unpkg.com/@changesets/config@3.1.1/schema.json",
  "changelog": [
    "@changesets/changelog-github",
    { "repo": "techiejd/payloadcms-vectorize" }
  ],
  "commit": false,
  "fixed": [
    ["payloadcms-vectorize", "@payloadcms-vectorize/pg"]
  ],
  "access": "public",
  "baseBranch": "main",
  "updateInternalDependencies": "patch",
  "ignore": []
}
```

.github/workflows/ci.yml

Lines changed: 85 additions & 4 deletions

```diff
@@ -7,6 +7,26 @@ on:
     branches: [main]
 
 jobs:
+  typecheck:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install pnpm
+        uses: pnpm/action-setup@v4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: '20'
+          cache: 'pnpm'
+
+      - name: Install dependencies
+        run: pnpm install
+
+      - name: Type check all packages
+        run: pnpm build:types:all
+
   test_int:
     runs-on: ubuntu-latest
 
@@ -33,8 +53,8 @@ jobs:
       - name: Setup Node.js
         uses: actions/setup-node@v4
         with:
-          node-version: "20"
-          cache: "pnpm"
+          node-version: '20'
+          cache: 'pnpm'
 
       - name: Install dependencies
         run: pnpm install
@@ -53,6 +73,52 @@ jobs:
           IVFFLATLISTS: 1
           TEST_ENV: 1
 
+  test_adapters_pg:
+    runs-on: ubuntu-latest
+
+    services:
+      postgres:
+        image: pgvector/pgvector:pg15
+        env:
+          POSTGRES_PASSWORD: password
+          POSTGRES_DB: payload_test
+        options: >-
+          --health-cmd pg_isready
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
+        ports:
+          - 5433:5432
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install pnpm
+        uses: pnpm/action-setup@v4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: '20'
+          cache: 'pnpm'
+
+      - name: Install dependencies
+        run: pnpm install
+
+      - name: Install pgvector extension
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y postgresql-client
+          PGPASSWORD=password psql -h localhost -p 5433 -U postgres -d payload_test -c "CREATE EXTENSION IF NOT EXISTS vector;"
+
+      - name: Run pg adapter tests
+        run: pnpm test:adapters:pg
+        env:
+          PAYLOAD_SECRET: test-secret-key
+          DIMS: 8
+          IVFFLATLISTS: 1
+          TEST_ENV: 1
+
   test_e2e:
     runs-on: ubuntu-latest
 
@@ -79,8 +145,8 @@ jobs:
       - name: Setup Node.js
         uses: actions/setup-node@v4
         with:
-          node-version: "20"
-          cache: "pnpm"
+          node-version: '20'
+          cache: 'pnpm'
 
       - name: Install dependencies
         run: pnpm install
@@ -101,3 +167,18 @@ jobs:
           DIMS: 8
           IVFFLATLISTS: 1
           TEST_ENV: 1
+
+  test:
+    runs-on: ubuntu-latest
+    needs: [typecheck, test_int, test_adapters_pg, test_e2e]
+    if: always()
+    steps:
+      - name: Check required jobs
+        run: |
+          if [ "${{ needs.typecheck.result }}" != "success" ] || \
+             [ "${{ needs.test_int.result }}" != "success" ] || \
+             [ "${{ needs.test_adapters_pg.result }}" != "success" ] || \
+             [ "${{ needs.test_e2e.result }}" != "success" ]; then
+            echo "One or more required jobs failed"
+            exit 1
+          fi
```

.github/workflows/release.yml

Lines changed: 44 additions & 0 deletions

```yaml
name: Release

on:
  push:
    branches:
      - main

concurrency: ${{ github.workflow }}-${{ github.ref }}

jobs:
  release:
    name: Release
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
      id-token: write
    steps:
      - uses: actions/checkout@v4

      - name: Install pnpm
        uses: pnpm/action-setup@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'pnpm'
          registry-url: 'https://registry.npmjs.org'

      - name: Install dependencies
        run: pnpm install

      - name: Create Release Pull Request or Publish
        id: changesets
        uses: changesets/action@v1
        with:
          publish: pnpm release
          version: pnpm changeset:version
          title: 'chore: version packages'
          commit: 'chore: version packages'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          NPM_CONFIG_PROVENANCE: true
```

.gitignore

Lines changed: 2 additions & 1 deletion

```diff
@@ -1,7 +1,7 @@
 # See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
 
 # dependencies
-/node_modules
+node_modules/
 /.pnp
 .pnp.js
 .yarn/install-state.gz
@@ -20,6 +20,7 @@
 # production
 /build
 /dist
+/adapters/pg/dist
 
 # misc
 .DS_Store
```

CHANGELOG.md

Lines changed: 102 additions & 0 deletions

All notable changes to this project will be documented in this file.

## 0.7.0

### Breaking Changes

- **Database Adapter Architecture**: The plugin now uses a pluggable database adapter system. You must install a database adapter package (e.g., `@payloadcms-vectorize/pg`) separately from the core plugin.
- **`dbAdapter` option required**: The `payloadcmsVectorize()` plugin now requires a `dbAdapter` option pointing to your adapter's implementation.
- **`similarity` renamed to `score`**: The `VectorSearchResult.similarity` field has been renamed to `score` to be more generic across different distance metrics.

### Added

- **`@payloadcms-vectorize/pg` package**: PostgreSQL adapter for pgvector, extracted from the core plugin.
- **`DbAdapter` interface**: New interface for implementing custom database adapters. See `adapters/README.md`.
- **`deleteEmbeddings` on `DbAdapter`**: Adapters can now delete vectors when a document is deleted or re-indexed.
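The `DbAdapter` interface itself is not shown in this commit view. As a rough sketch of what an adapter contract along these lines might look like (the interface shape, method names, and the in-memory adapter are illustrative assumptions, not the plugin's actual API):

```typescript
// Hypothetical shapes, for illustration only.
interface Chunk {
  id: string
  vector: number[]
  text: string
}

interface SearchResult {
  id: string
  text: string
  score: number // note: the changelog says `similarity` was renamed to `score`
}

// A DbAdapter-style contract: upsert, delete, and query vectors per knowledge pool.
interface DbAdapter {
  upsertEmbeddings(pool: string, chunks: Chunk[]): Promise<void>
  deleteEmbeddings(pool: string, ids: string[]): Promise<void>
  query(pool: string, vector: number[], limit: number): Promise<SearchResult[]>
}

// Cosine similarity, used here as the score.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] ** 2
    nb += b[i] ** 2
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

// Minimal in-memory adapter, useful only as a test double.
function createMemoryAdapter(): DbAdapter {
  const pools = new Map<string, Map<string, Chunk>>()
  const poolOf = (p: string) => pools.get(p) ?? pools.set(p, new Map()).get(p)!
  return {
    async upsertEmbeddings(pool, chunks) {
      for (const c of chunks) poolOf(pool).set(c.id, c)
    },
    async deleteEmbeddings(pool, ids) {
      for (const id of ids) poolOf(pool).delete(id)
    },
    async query(pool, vector, limit) {
      return [...poolOf(pool).values()]
        .map((c) => ({ id: c.id, text: c.text, score: cosine(vector, c.vector) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, limit)
    },
  }
}
```

A real adapter such as the pg/pgvector one would back these methods with database queries; the in-memory version is similar in spirit to the `createMockAdapter` mentioned in the commit history.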
## 0.6.0-beta.5 - 2026-02-25

- Merges main into split_db_adapter (per-batch polling, coordinator/worker architecture, destroyPayload cleanup).

## 0.6.0-beta.4 - 2026-02-20

- Merges main with should-embed changes.

## 0.6.0-beta - 2026-02-01

### Breaking Changes

- **Database Adapter Architecture**: The plugin now uses a pluggable database adapter system. You must install a database adapter package (e.g., `@payloadcms-vectorize/pg`) separately from the core plugin.
- **`createVectorizeIntegration` removed from core**: Use the adapter-specific integration factory instead (e.g., `createPostgresVectorIntegration` from `@payloadcms-vectorize/pg`).
- **`dbAdapter` option required**: The `payloadcmsVectorize()` plugin now requires a `dbAdapter` option pointing to your adapter's implementation.
- **`similarity` renamed to `score`**: The `VectorSearchResult.similarity` field has been renamed to `score` to be more generic across different distance metrics.

### Added

- **`@payloadcms-vectorize/pg` package**: PostgreSQL adapter for pgvector, extracted from the core plugin.
- **`DbAdapter` interface**: New interface for implementing custom database adapters. See `adapters/README.md`.
- **`deleteEmbeddings` on `DbAdapter`**: Adapters can now delete vectors when a document is deleted or re-indexed.
- **Adapter documentation**: Added `adapters/README.md` explaining how to create custom adapters.

### Migration

**Before (0.5.x)**

```typescript
import { createVectorizeIntegration } from 'payloadcms-vectorize'

const { afterSchemaInitHook, payloadcmsVectorize } = createVectorizeIntegration({
  main: { dims: 1536, ivfflatLists: 100 },
})

export default buildConfig({
  db: postgresAdapter({
    afterSchemaInit: [afterSchemaInitHook],
  }),
  plugins: [
    payloadcmsVectorize({
      knowledgePools: {
        main: {
          /* ... */
        },
      },
    }),
  ],
})
```

**After (0.6.0+)**

```typescript
import { createPostgresVectorIntegration } from '@payloadcms-vectorize/pg'
import payloadcmsVectorize from 'payloadcms-vectorize'

const integration = createPostgresVectorIntegration({
  main: { dims: 1536, ivfflatLists: 100 },
})

export default buildConfig({
  db: postgresAdapter({
    afterSchemaInit: [integration.afterSchemaInitHook],
  }),
  plugins: [
    payloadcmsVectorize({
      dbAdapter: integration.adapter,
      knowledgePools: {
        main: {
          /* ... */
        },
      },
    }),
  ],
})
```

**Updating search result handling:**

```typescript
// Before
const score = result.similarity

// After
const score = result.score
```

## 0.5.5 - 2026-02-24

### Added