Use database cache for mg-context by PaulAdamDavis · Pull Request #1611 · TryGhost/migrate

PaulAdamDavis · 2026-03-08T22:16:54Z

WIP

coderabbitai · 2026-03-08T22:17:17Z

Walkthrough

This pull request introduces persistent storage to the mg-context package via SQLite (in-memory or file-backed), replacing in-memory post arrays with database-backed operations. All major context operations (init, addPost, findPosts, save) become asynchronous. New public methods are added to PostContext, TagContext, and AuthorContext for tag/author management and database persistence. A new database module exports Sequelize-based models for Post, Tag, Author, and their relationships. The writeGhostJson method gains batching support to handle large post collections. Additionally, the mg-json package adds a resetSlugs function to reset slug deduplication state between independent JSON export calls.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The description 'WIP' is vague and does not convey meaningful information about the changeset's purpose, scope, or implementation details.	Replace 'WIP' with a brief description of the database integration, initialization steps, and key API changes being introduced.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Use database cache for mg-context' directly summarizes the main objective of this changeset, which introduces persistent SQLite database storage to replace in-memory post/tag/author storage in the mg-context package.


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (3)

packages/mg-context/src/lib/MigrateBase.ts (1)

8-8: Keep dbId internal.

Line 127 deleting dbId again is a strong sign this should not be part of the public instance shape. Please hide it behind private/protected state instead of exposing it on every context object.

As per coding guidelines, **/*.ts: TypeScript classes should use private fields (#fieldName) for encapsulation.

Also applies to: 123-128
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/mg-context/src/lib/MigrateBase.ts` at line 8, The public property
dbId on MigrateBase should be made a private field per guidelines: replace the
public declaration with a private class field (use the ECMAScript private field
syntax `#dbId`: number | null = null) and remove any external/public exposure;
update all internal references in this class from this.dbId to this.#dbId and,
if external access is required, add an explicit protected getter/setter or a
named method (e.g., getDbId()/setDbId()) on MigrateBase so callers or subclasses
use that API instead of the public property; ensure the code that currently
deletes dbId (the spot referenced in the review) is updated to use the new
accessor or internal mutation.
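As a minimal sketch of what this nitpick asks for, assuming subclasses still need to read and set the id: the class names `MigrateBaseSketch` and `PostContextSketch`, and the `markSaved`/`isPersisted` methods, are hypothetical and not taken from the PR. Only the `dbId` name comes from the review.

```typescript
// Sketch only: stand-in for MigrateBase, not the PR's actual class.
class MigrateBaseSketch {
    // ECMAScript private field: it never appears as an own property of the instance
    #dbId: number | null = null;

    // Subclasses go through a protected accessor instead of a public property
    protected get dbId(): number | null {
        return this.#dbId;
    }

    protected set dbId(value: number | null) {
        this.#dbId = value;
    }
}

// Hypothetical subclass standing in for PostContext
class PostContextSketch extends MigrateBaseSketch {
    data: {title: string};

    constructor(title: string) {
        super();
        this.data = {title};
    }

    // Simulates recording the row id once the post has been written
    markSaved(id: number): void {
        this.dbId = id;
    }

    isPersisted(): boolean {
        return this.dbId !== null;
    }
}

const post = new PostContextSketch('Hello');
post.markSaved(42);

// The id is kept out of the enumerable shape, so there is nothing to delete before serializing
const serialized = JSON.stringify(post);
```

With this shape, the cleanup step the review points at (deleting `dbId` before output) should no longer be necessary, because the field is never part of the serialized object in the first place.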

packages/mg-context/src/lib/MigrateContext.ts (1)

118-148: Consider extracting post data filtering into a helper to reduce code duplication.

The slug, title, and sourceAttr filter branches follow nearly identical patterns: fetch all posts, parse JSON, filter by a condition. This could be refactored into a shared helper, though it's functional as-is.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/mg-context/src/lib/MigrateContext.ts` around lines 118 - 148, The
findPosts method repeats the same pattern for slug, title and sourceAttr (fetch
all rows via this.db.Post.findAll, JSON.parse row.get(...), filter, and await
PostContext.fromRow) — extract that logic into a helper (e.g., a private method
on MigrateContext like filterPostsByPredicate(predicate: (data, row) =>
boolean): Promise<PostContext[]>) that accepts a predicate to test parsed post
data (and parsed source when needed), then replace the slug/title/sourceAttr
branches to call this helper with simple predicates referencing
data.slug/data.title or source[sourceAttr.key] respectively; keep using
PostContext.fromRow inside the helper to build results.
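The helper described above could look roughly like the following sketch. It runs against in-memory rows rather than Sequelize, so `RowSketch`, `sampleRows`, `findAll`, `findBySlug`, and `findBySourceAttr` are all hypothetical stand-ins; only the name `filterPostsByPredicate` comes from the review prompt. The real helper would presumably call `this.db.Post.findAll` and `PostContext.fromRow` where the comments indicate.

```typescript
// Hypothetical shapes for the parsed post payload and its source metadata
type PostData = {slug?: string; title?: string};
type SourceData = Record<string, unknown>;

// Stand-in for a database row holding JSON strings
interface RowSketch {
    data: string;
    source: string;
}

const sampleRows: RowSketch[] = [
    {data: JSON.stringify({slug: 'hello', title: 'Hello'}), source: JSON.stringify({url: '/hello'})},
    {data: JSON.stringify({slug: 'world', title: 'World'}), source: JSON.stringify({url: '/world'})}
];

// Stand-in for this.db.Post.findAll()
const findAll = async (): Promise<RowSketch[]> => sampleRows;

// Shared fetch/parse/filter loop used by every branch
async function filterPostsByPredicate(
    predicate: (data: PostData, source: SourceData) => boolean
): Promise<PostData[]> {
    const rows = await findAll();
    const matches: PostData[] = [];
    for (const row of rows) {
        const data = JSON.parse(row.data) as PostData;
        const source = JSON.parse(row.source) as SourceData;
        if (predicate(data, source)) {
            // The real helper would push `await PostContext.fromRow(row)` here
            matches.push(data);
        }
    }
    return matches;
}

// Each former branch shrinks to a one-line predicate
const findBySlug = (slug: string) => filterPostsByPredicate(data => data.slug === slug);
const findBySourceAttr = (key: string, value: unknown) =>
    filterPostsByPredicate((_data, source) => source[key] === value);
```

Note that this keeps the behavior the review describes (load every row, filter in memory); it only removes the duplication.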

packages/mg-context/src/lib/PostContext.ts (1)

324-348: Consider batching tag/author fetches to reduce N+1 queries.

The current implementation fetches each tag and author individually in a loop (one query per relation). For posts with many tags/authors, this could be slow.

An optimization would be to collect all tag/author IDs first, then fetch them in a single query using findAll({ where: { id: tagIds } }).

♻️ Optional: Batch fetch tags and authors

         // Load tags via join table
         const postTags = await db.PostTag.findAll({
             where: {post_id: post.dbId},
             order: [['sort_order', 'ASC']]
         });
 
-        for (const pt of postTags) {
-            const tagRow = await db.Tag.findByPk(pt.get('tag_id') as number);
-            if (tagRow) {
-                post.data.tags.push(TagContext.fromRow(tagRow));
-            }
-        }
+        const tagIds = postTags.map(pt => pt.get('tag_id') as number);
+        if (tagIds.length > 0) {
+            const tagRows = await db.Tag.findAll({where: {id: tagIds}});
+            const tagMap = new Map(tagRows.map(r => [r.get('id'), r]));
+            for (const pt of postTags) {
+                const tagRow = tagMap.get(pt.get('tag_id') as number);
+                if (tagRow) {
+                    post.data.tags.push(TagContext.fromRow(tagRow));
+                }
+            }
+        }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/mg-context/src/lib/PostContext.ts` around lines 324 - 348, The
PostContext loading code causes N+1 queries by calling db.Tag.findByPk and
db.Author.findByPk inside loops; instead, collect all tag IDs from postTags and
all author IDs from postAuthors, call db.Tag.findAll and db.Author.findAll once
each to fetch rows in bulk, build maps of id->row, and then iterate the original
postTags/postAuthors (which are already ordered by sort_order) to push
TagContext.fromRow(tagMap[id]) into post.data.tags and
AuthorContext.fromRow(authorMap[id]) into post.data.authors, skipping missing
rows to preserve current behavior.
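The diff above covers tags only. The same reorder step applies to authors, and it can be factored into one small function, sketched here without Sequelize. `orderByJoinRows` and the sample data are hypothetical; the sketch assumes the related rows have already been fetched in one bulk query and that the join-table ids are already sorted by `sort_order`.

```typescript
// Re-attach bulk-fetched rows in join-table order, skipping ids with no matching row
function orderByJoinRows<T extends {id: number}>(joinIds: number[], rows: T[]): T[] {
    const byId = new Map<number, T>();
    for (const row of rows) {
        byId.set(row.id, row);
    }

    const ordered: T[] = [];
    for (const id of joinIds) {
        const row = byId.get(id);
        if (row) {
            ordered.push(row);
        }
    }
    return ordered;
}

// Join rows say: author 3 first, then 1, then 9 (9 has no row and is skipped)
const authorIds = [3, 1, 9];
// A bulk query may return rows in any order
const authorRows = [{id: 1, name: 'Ada'}, {id: 3, name: 'Grace'}];

const orderedAuthors = orderByJoinRows(authorIds, authorRows);
```

The lookup map matters because a bulk `findAll` gives no ordering guarantee relative to the join table, so the original `sort_order` has to be reapplied after the fetch.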


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 339aeadd-acb4-4c3d-9e98-9a2ee8934cca

📥 Commits

Reviewing files that changed from the base of the PR and between 1a85004 and 9852740.

📒 Files selected for processing (13)

packages/mg-context/README.md
packages/mg-context/package.json
packages/mg-context/src/lib/AuthorContext.ts
packages/mg-context/src/lib/MigrateBase.ts
packages/mg-context/src/lib/MigrateContext.ts
packages/mg-context/src/lib/PostContext.ts
packages/mg-context/src/lib/TagContext.ts
packages/mg-context/src/lib/database.ts
packages/mg-context/src/test/migrate-context.test.ts
packages/mg-context/src/test/tasks.test.ts
packages/mg-json/lib/to-ghost-json/index.js
packages/mg-json/lib/to-ghost-json/meta-to-ghost.js
packages/mg-json/test/to-ghost-json.test.js

coderabbitai · 2026-03-08T22:25:48Z

packages/mg-context/src/lib/database.ts

+    const Tag = sequelize.define('Tag', {
+        data: {type: DataTypes.TEXT, allowNull: false},
+        slug: {type: DataTypes.STRING, allowNull: true},
+        name: {type: DataTypes.STRING, allowNull: true}
+    }, {
+        timestamps: false,
+        indexes: [{fields: ['slug']}, {fields: ['name']}]
+    });
+
+    const Author = sequelize.define('Author', {
+        data: {type: DataTypes.TEXT, allowNull: false},
+        slug: {type: DataTypes.STRING, allowNull: true},
+        name: {type: DataTypes.STRING, allowNull: true},
+        email: {type: DataTypes.STRING, allowNull: true}
+    }, {
+        timestamps: false,
+        indexes: [{fields: ['slug']}, {fields: ['name']}, {fields: ['email']}]
+    });


⚠️ Potential issue | 🟠 Major

Enforce slug uniqueness in the cache tables.

TagContext.save() and AuthorContext.save() both treat slug as the lookup key, but these tables only add non-unique indexes. Once duplicate slugs exist, future findOne({where: {slug}}) calls become nondeterministic and can update the wrong row. Please make slug unique here and switch the save path to an atomic upsert/retry-on-unique-violation flow.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/mg-context/src/lib/database.ts` around lines 27 - 44, Update the Tag and Author models to enforce slug uniqueness (make the slug column or its index unique in the sequelize.define for Tag and Author) and change TagContext.save and AuthorContext.save to perform an atomic upsert instead of separate findOne+update: use Model.upsert(...) (or a transactioned findOrCreate + update) and implement retry-on-unique-violation logic that catches Sequelize.UniqueConstraintError and retries the upsert once to handle races; ensure references to the unique constraint match the model (Tag.slug, Author.slug) so the retry detection is correct.
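The retry half of this suggestion can be sketched independently of the database. The code below is an illustration, not the PR's save path: `saveWithRetry`, `isUniqueViolation`, and `flakyUpsert` are hypothetical names, and detecting the failure by the error name `SequelizeUniqueConstraintError` is an assumption that should be checked against the Sequelize version in use.

```typescript
// Assumption: a unique-index failure can be recognised by its error name
const isUniqueViolation = (err: unknown): boolean =>
    err instanceof Error && err.name === 'SequelizeUniqueConstraintError';

// Run `attempt`, retrying only when it fails on the unique constraint
async function saveWithRetry<T>(attempt: () => Promise<T>, retries = 1): Promise<T> {
    let lastError: unknown;
    for (let i = 0; i <= retries; i++) {
        try {
            return await attempt();
        } catch (err) {
            if (!isUniqueViolation(err)) {
                throw err; // unrelated failures are not retried
            }
            lastError = err;
        }
    }
    throw lastError;
}

// Simulated save: the first call loses a race and hits the unique index,
// the second call finds the existing row and updates it
let calls = 0;
async function flakyUpsert(): Promise<string> {
    calls += 1;
    if (calls === 1) {
        const err = new Error('slug must be unique');
        err.name = 'SequelizeUniqueConstraintError';
        throw err;
    }
    return 'updated';
}
```

In the real code, `attempt` would wrap the upsert against `Tag` or `Author`. One caveat: both `slug` columns are declared with `allowNull: true`, and SQLite treats NULL values as distinct under a unique index, so rows without a slug would not be deduplicated by the constraint.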

coderabbitai · 2026-03-08T22:25:48Z

packages/mg-context/src/lib/database.ts

+    const PostTag = sequelize.define('PostTag', {
+        post_id: {type: DataTypes.INTEGER, allowNull: false},
+        tag_id: {type: DataTypes.INTEGER, allowNull: false},
+        sort_order: {type: DataTypes.INTEGER, allowNull: false, defaultValue: 0}
+    }, {timestamps: false});
+
+    const PostAuthor = sequelize.define('PostAuthor', {
+        post_id: {type: DataTypes.INTEGER, allowNull: false},
+        author_id: {type: DataTypes.INTEGER, allowNull: false},
+        sort_order: {type: DataTypes.INTEGER, allowNull: false, defaultValue: 0}
+    }, {timestamps: false});


⚠️ Potential issue | 🟠 Major

Index the join-table foreign keys.

The batched readback path will repeatedly resolve relations by post_id / tag_id / author_id. Without indexes on these columns, PostTag and PostAuthor degrade to full-table scans as the cache grows.

Suggested fix

     const PostTag = sequelize.define('PostTag', {
         post_id: {type: DataTypes.INTEGER, allowNull: false},
         tag_id: {type: DataTypes.INTEGER, allowNull: false},
         sort_order: {type: DataTypes.INTEGER, allowNull: false, defaultValue: 0}
-    }, {timestamps: false});
+    }, {
+        timestamps: false,
+        indexes: [{fields: ['post_id']}, {fields: ['tag_id']}]
+    });

     const PostAuthor = sequelize.define('PostAuthor', {
         post_id: {type: DataTypes.INTEGER, allowNull: false},
         author_id: {type: DataTypes.INTEGER, allowNull: false},
         sort_order: {type: DataTypes.INTEGER, allowNull: false, defaultValue: 0}
-    }, {timestamps: false});
+    }, {
+        timestamps: false,
+        indexes: [{fields: ['post_id']}, {fields: ['author_id']}]
+    });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/mg-context/src/lib/database.ts` around lines 46 - 56, PostTag and PostAuthor join tables lack indexes on their foreign-key columns causing full-table scans; update the model definitions for PostTag and PostAuthor (the sequelize.define calls) to add proper indexes on post_id and tag_id for PostTag and on post_id and author_id for PostAuthor (either via the define "indexes" option or by creating individual indexes after model definition) so queries resolving relations by post_id/tag_id/author_id use indexed lookups.

coderabbitai · 2026-03-08T22:25:48Z

packages/mg-json/lib/to-ghost-json/index.js

 export default async (input, options = {}, ctx) => {
+    // Reset module-level slug deduplication state so each call is independent
+    resetSlugs();


⚠️ Potential issue | 🟠 Major

Move resetSlugs() below the awaited validation.

resetSlugs() runs before the await validate(...) on Line 16, so two concurrent toGhostJSON() calls can still bleed into the same module-level slug cache and cross-deduplicate each other’s output. Reset after the final await in this function, or make the slug store invocation-scoped.

Suggested fix

 export default async (input, options = {}, ctx) => {
-    // Reset module-level slug deduplication state so each call is independent
-    resetSlugs();
-
     // Construct a basic Ghost JSON template
     let output = baseTemplate();

     // Validate data and potentially alter if needed
     input = await validate(input, ctx);
+
+    // Reset after the final await so concurrent calls do not share state
+    resetSlugs();

     // Reorder tags so `#internal-tags` appear last
     input = reorderTags(input);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/mg-json/lib/to-ghost-json/index.js` around lines 8 - 10, The module-level slug reset (resetSlugs()) is called before the awaited validate(...) inside the exported async function (the toGhostJSON entry), allowing concurrent calls to share the slug cache; move the resetSlugs() invocation so it runs after the final await validate(...) completes (or instead refactor the slug store to be invocation-scoped and used by the function), ensuring resetSlugs() is executed post-validation within the exported async function to prevent cross-call deduplication.
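The alternative mentioned above, an invocation-scoped slug store, could be sketched as follows. This is not the mg-json implementation: `createSlugDeduper` and `exportSlugs` are hypothetical names, and the `-2`, `-3` suffix scheme is assumed for illustration, since the package's actual deduplication rule is not shown in this review.

```typescript
// Hypothetical factory: every call returns a deduper with its own private state
function createSlugDeduper(): (slug: string) => string {
    const seen = new Map<string, number>();

    return (slug: string): string => {
        const count = seen.get(slug) ?? 0;
        seen.set(slug, count + 1);
        // First use keeps the slug; later uses get a numeric suffix (assumed scheme)
        return count === 0 ? slug : `${slug}-${count + 1}`;
    };
}

// Stand-in for one export call: the deduper lives and dies with this invocation
function exportSlugs(slugs: string[]): string[] {
    const dedupe = createSlugDeduper();
    return slugs.map(slug => dedupe(slug));
}

// Two independent exports do not see each other's slugs
const firstExport = exportSlugs(['news', 'news', 'about']);
const secondExport = exportSlugs(['news']);
```

Because no state lives at module level, there is nothing to reset and the position of any `await` no longer matters, which is the property the review is after.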

PaulAdamDavis added 2 commits March 8, 2026 21:27

First pass

bf10a51

Batch save JSON

9852740

PaulAdamDavis wants to merge 2 commits into main from dev-mg-context-db
