Conversation
Walkthrough
This pull request introduces persistent storage to the mg-context package via SQLite (in-memory or file-backed), replacing in-memory post arrays with database-backed operations. All major context operations (init, addPost, findPosts, save) become asynchronous. New public methods are added to PostContext, TagContext, and AuthorContext for tag/author management and database persistence. A new database module exports Sequelize-based models for Post, Tag, Author, and their relationships. The writeGhostJson method gains batching support to handle large post collections. Additionally, the mg-json package adds a resetSlugs function to reset slug deduplication state between independent JSON export calls.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 3
🧹 Nitpick comments (3)
packages/mg-context/src/lib/MigrateBase.ts (1)
8-8: Keep `dbId` internal.
Line 127 deleting `dbId` again is a strong sign this should not be part of the public instance shape. Please hide it behind private/protected state instead of exposing it on every context object.
As per coding guidelines, `**/*.ts`: TypeScript classes should use private fields (`#fieldName`) for encapsulation.
Also applies to: 123-128
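One possible shape for this, as a minimal sketch rather than the package's actual code: `MigrateBaseSketch` and `PostSketch` are hypothetical stand-ins, and the `rowId` accessor, `markSaved`, `isSaved`, and `toJSON` names are assumptions used only to show the private field staying off the public instance shape.

```typescript
// Hypothetical stand-in for MigrateBase; only the dbId handling is shown.
class MigrateBaseSketch {
    // ECMAScript private field: not an own enumerable property of the instance.
    #dbId: number | null = null;

    // Public payload that is meant to be serialized.
    data: Record<string, unknown> = {};

    // Assumed accessor name; subclasses use this instead of a public dbId property.
    protected get rowId(): number | null {
        return this.#dbId;
    }

    protected set rowId(id: number | null) {
        this.#dbId = id;
    }

    // Nothing has to delete dbId before serializing, because it was never exposed.
    toJSON(): Record<string, unknown> {
        return {...this.data};
    }
}

// Hypothetical subclass showing access through the protected accessor.
class PostSketch extends MigrateBaseSketch {
    markSaved(id: number): void {
        this.rowId = id;
    }

    isSaved(): boolean {
        return this.rowId !== null;
    }
}
```

With this layout the cleanup referenced at line 127 would have nothing left to remove, since the identifier never appears among the object's own keys.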
packages/mg-context/src/lib/MigrateContext.ts (1)
118-148: Consider extracting post data filtering into a helper to reduce code duplication.
The slug, title, and sourceAttr filter branches follow nearly identical patterns: fetch all posts, parse JSON, filter by a condition. This could be refactored into a shared helper, though it's functional as-is.
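A rough sketch of what such a helper could look like, under stated assumptions: the row shape (`PostRowSketch` with `data` and `source` JSON text columns) and every function name here are hypothetical, and the database fetch and the `PostContext.fromRow` mapping are left out so the example stays self-contained.

```typescript
// Hypothetical row shape: post data and source are assumed to be stored as JSON text.
interface PostRowSketch {
    id: number;
    data: string;
    source: string;
}

type ParsedPost = {slug?: string; title?: string; [key: string]: unknown};
type ParsedSource = Record<string, unknown>;

// Shared helper: parse each row once and keep the rows the predicate accepts.
function filterRowsByPredicate(
    rows: PostRowSketch[],
    predicate: (data: ParsedPost, source: ParsedSource) => boolean
): PostRowSketch[] {
    return rows.filter((row) => {
        const data = JSON.parse(row.data) as ParsedPost;
        const source = JSON.parse(row.source) as ParsedSource;
        return predicate(data, source);
    });
}

// The three branches then reduce to one-line predicates.
const bySlug = (rows: PostRowSketch[], slug: string): PostRowSketch[] =>
    filterRowsByPredicate(rows, data => data.slug === slug);

const byTitle = (rows: PostRowSketch[], title: string): PostRowSketch[] =>
    filterRowsByPredicate(rows, data => data.title === title);

const bySourceAttr = (rows: PostRowSketch[], key: string, value: unknown): PostRowSketch[] =>
    filterRowsByPredicate(rows, (_data, source) => source[key] === value);
```

In the real method the helper would presumably be async and wrap the fetch and the conversion to context objects; only the filtering step is illustrated here.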
packages/mg-context/src/lib/PostContext.ts (1)
324-348: Consider batching tag/author fetches to reduce N+1 queries.
The current implementation fetches each tag and author individually in a loop (one query per relation). For posts with many tags/authors, this could be slow.
An optimization would be to collect all tag/author IDs first, then fetch them in a single query using `findAll({ where: { id: tagIds } })`.
♻️ Optional: Batch fetch tags and authors
  // Load tags via join table
  const postTags = await db.PostTag.findAll({
      where: {post_id: post.dbId},
      order: [['sort_order', 'ASC']]
  });
- for (const pt of postTags) {
-     const tagRow = await db.Tag.findByPk(pt.get('tag_id') as number);
-     if (tagRow) {
-         post.data.tags.push(TagContext.fromRow(tagRow));
-     }
- }
+ const tagIds = postTags.map(pt => pt.get('tag_id') as number);
+ if (tagIds.length > 0) {
+     const tagRows = await db.Tag.findAll({where: {id: tagIds}});
+     const tagMap = new Map(tagRows.map(r => [r.get('id'), r]));
+     for (const pt of postTags) {
+         const tagRow = tagMap.get(pt.get('tag_id') as number);
+         if (tagRow) {
+             post.data.tags.push(TagContext.fromRow(tagRow));
+         }
+     }
+ }
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/mg-context/src/lib/database.ts`:
- Around line 27-44: Update the Tag and Author models to enforce slug uniqueness
(make the slug column or its index unique in the sequelize.define for Tag and
Author) and change TagContext.save and AuthorContext.save to perform an atomic
upsert instead of separate findOne+update: use Model.upsert(...) (or a
transactioned findOrCreate + update) and implement retry-on-unique-violation
logic that catches Sequelize.UniqueConstraintError and retries the upsert once
to handle races; ensure references to the unique constraint match the model
(Tag.slug, Author.slug) so the retry detection is correct.
- Around line 46-56: PostTag and PostAuthor join tables lack indexes on their
foreign-key columns causing full-table scans; update the model definitions for
PostTag and PostAuthor (the sequelize.define calls) to add proper indexes on
post_id and tag_id for PostTag and on post_id and author_id for PostAuthor
(either via the define "indexes" option or by creating individual indexes after
model definition) so queries resolving relations by post_id/tag_id/author_id use
indexed lookups.
In `@packages/mg-json/lib/to-ghost-json/index.js`:
- Around line 8-10: The module-level slug reset (resetSlugs()) is called before
the awaited validate(...) inside the exported async function (the toGhostJSON
entry), allowing concurrent calls to share the slug cache; move the resetSlugs()
invocation so it runs after the final await validate(...) completes (or instead
refactor the slug store to be invocation-scoped and used by the function),
ensuring resetSlugs() is executed post-validation within the exported async
function to prevent cross-call deduplication.
---
Nitpick comments:
In `@packages/mg-context/src/lib/MigrateBase.ts`:
- Line 8: The public property dbId on MigrateBase should be made a private field
per guidelines: replace the public declaration with a private class field (use
the ECMAScript private field syntax `#dbId`: number | null = null) and remove any
external/public exposure; update all internal references in this class from
this.dbId to this.#dbId and, if external access is required, add an explicit
protected getter/setter or a named method (e.g., getDbId()/setDbId()) on
MigrateBase so callers or subclasses use that API instead of the public
property; ensure the code that currently deletes dbId (the spot referenced in
the review) is updated to use the new accessor or internal mutation.
In `@packages/mg-context/src/lib/MigrateContext.ts`:
- Around line 118-148: The findPosts method repeats the same pattern for slug,
title and sourceAttr (fetch all rows via this.db.Post.findAll, JSON.parse
row.get(...), filter, and await PostContext.fromRow) — extract that logic into a
helper (e.g., a private method on MigrateContext like
filterPostsByPredicate(predicate: (data, row) => boolean):
Promise<PostContext[]>) that accepts a predicate to test parsed post data (and
parsed source when needed), then replace the slug/title/sourceAttr branches to
call this helper with simple predicates referencing data.slug/data.title or
source[sourceAttr.key] respectively; keep using PostContext.fromRow inside the
helper to build results.
In `@packages/mg-context/src/lib/PostContext.ts`:
- Around line 324-348: The PostContext loading code causes N+1 queries by
calling db.Tag.findByPk and db.Author.findByPk inside loops; instead, collect
all tag IDs from postTags and all author IDs from postAuthors, call
db.Tag.findAll and db.Author.findAll once each to fetch rows in bulk, build maps
of id->row, and then iterate the original postTags/postAuthors (which are
already ordered by sort_order) to push TagContext.fromRow(tagMap[id]) into
post.data.tags and AuthorContext.fromRow(authorMap[id]) into post.data.authors,
skipping missing rows to preserve current behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 339aeadd-acb4-4c3d-9e98-9a2ee8934cca
📒 Files selected for processing (13)
packages/mg-context/README.md
packages/mg-context/package.json
packages/mg-context/src/lib/AuthorContext.ts
packages/mg-context/src/lib/MigrateBase.ts
packages/mg-context/src/lib/MigrateContext.ts
packages/mg-context/src/lib/PostContext.ts
packages/mg-context/src/lib/TagContext.ts
packages/mg-context/src/lib/database.ts
packages/mg-context/src/test/migrate-context.test.ts
packages/mg-context/src/test/tasks.test.ts
packages/mg-json/lib/to-ghost-json/index.js
packages/mg-json/lib/to-ghost-json/meta-to-ghost.js
packages/mg-json/test/to-ghost-json.test.js
const Tag = sequelize.define('Tag', {
    data: {type: DataTypes.TEXT, allowNull: false},
    slug: {type: DataTypes.STRING, allowNull: true},
    name: {type: DataTypes.STRING, allowNull: true}
}, {
    timestamps: false,
    indexes: [{fields: ['slug']}, {fields: ['name']}]
});

const Author = sequelize.define('Author', {
    data: {type: DataTypes.TEXT, allowNull: false},
    slug: {type: DataTypes.STRING, allowNull: true},
    name: {type: DataTypes.STRING, allowNull: true},
    email: {type: DataTypes.STRING, allowNull: true}
}, {
    timestamps: false,
    indexes: [{fields: ['slug']}, {fields: ['name']}, {fields: ['email']}]
});
Enforce slug uniqueness in the cache tables.
TagContext.save() and AuthorContext.save() both treat slug as the lookup key, but these tables only add non-unique indexes. Once duplicate slugs exist, future findOne({where: {slug}}) calls become nondeterministic and can update the wrong row. Please make slug unique here and switch the save path to an atomic upsert/retry-on-unique-violation flow.
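To make the intended flow concrete, here is a minimal in-memory sketch, not Sequelize code: `SlugTableSketch`, `UniqueViolation`, and `saveBySlug` are hypothetical names standing in for a table with a unique slug index, the database's unique-constraint error, and the save path. It shows update-then-insert with a single retry when the insert loses to another writer.

```typescript
// Stand-in for the error a database raises when a unique index is violated.
class UniqueViolation extends Error {}

interface CachedRow {
    id: number;
    slug: string;
    data: string;
}

// In-memory table keyed by slug, so at most one row per slug can exist.
class SlugTableSketch {
    private rows = new Map<string, CachedRow>();
    private nextId = 1;

    insert(slug: string, data: string): CachedRow {
        if (this.rows.has(slug)) {
            throw new UniqueViolation('duplicate slug: ' + slug);
        }
        const row: CachedRow = {id: this.nextId, slug, data};
        this.nextId += 1;
        this.rows.set(slug, row);
        return row;
    }

    // Returns the updated row, or undefined when no row has this slug.
    update(slug: string, data: string): CachedRow | undefined {
        const row = this.rows.get(slug);
        if (row) {
            row.data = data;
        }
        return row;
    }

    get size(): number {
        return this.rows.size;
    }
}

// Save path: update if present, otherwise insert; on a unique violation retry once as an update.
function saveBySlug(table: SlugTableSketch, slug: string, data: string): CachedRow {
    const existing = table.update(slug, data);
    if (existing) {
        return existing;
    }
    try {
        return table.insert(slug, data);
    } catch (err) {
        if (!(err instanceof UniqueViolation)) {
            throw err;
        }
        const retried = table.update(slug, data);
        if (!retried) {
            throw err;
        }
        return retried;
    }
}
```

Because the table itself rejects a second row for the same slug, a lookup by slug can only ever resolve to one row, which is the property the non-unique indexes above do not provide.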
const PostTag = sequelize.define('PostTag', {
    post_id: {type: DataTypes.INTEGER, allowNull: false},
    tag_id: {type: DataTypes.INTEGER, allowNull: false},
    sort_order: {type: DataTypes.INTEGER, allowNull: false, defaultValue: 0}
}, {timestamps: false});

const PostAuthor = sequelize.define('PostAuthor', {
    post_id: {type: DataTypes.INTEGER, allowNull: false},
    author_id: {type: DataTypes.INTEGER, allowNull: false},
    sort_order: {type: DataTypes.INTEGER, allowNull: false, defaultValue: 0}
}, {timestamps: false});
Index the join-table foreign keys.
The batched readback path will repeatedly resolve relations by post_id / tag_id / author_id. Without indexes on these columns, PostTag and PostAuthor degrade to full-table scans as the cache grows.
Suggested fix
  const PostTag = sequelize.define('PostTag', {
      post_id: {type: DataTypes.INTEGER, allowNull: false},
      tag_id: {type: DataTypes.INTEGER, allowNull: false},
      sort_order: {type: DataTypes.INTEGER, allowNull: false, defaultValue: 0}
- }, {timestamps: false});
+ }, {
+     timestamps: false,
+     indexes: [{fields: ['post_id']}, {fields: ['tag_id']}]
+ });
  const PostAuthor = sequelize.define('PostAuthor', {
      post_id: {type: DataTypes.INTEGER, allowNull: false},
      author_id: {type: DataTypes.INTEGER, allowNull: false},
      sort_order: {type: DataTypes.INTEGER, allowNull: false, defaultValue: 0}
- }, {timestamps: false});
+ }, {
+     timestamps: false,
+     indexes: [{fields: ['post_id']}, {fields: ['author_id']}]
+ });
export default async (input, options = {}, ctx) => {
    // Reset module-level slug deduplication state so each call is independent
    resetSlugs();
Move resetSlugs() below the awaited validation.
resetSlugs() runs before the await validate(...) on Line 16, so two concurrent toGhostJSON() calls can still bleed into the same module-level slug cache and cross-deduplicate each other’s output. Reset after the final await in this function, or make the slug store invocation-scoped.
Suggested fix
  export default async (input, options = {}, ctx) => {
-     // Reset module-level slug deduplication state so each call is independent
-     resetSlugs();
-
      // Construct a basic Ghost JSON template
      let output = baseTemplate();
      // Validate data and potentially alter if needed
      input = await validate(input, ctx);
+
+     // Reset after the final await so concurrent calls do not share state
+     resetSlugs();
      // Reorder tags so `#internal-tags` appear last
      input = reorderTags(input);
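For the alternative mentioned above (an invocation-scoped slug store), a minimal sketch could look like the following. Everything here is hypothetical: `createSlugStore`, `toGhostJsonSketch`, and the `-2`, `-3` suffix scheme are assumptions for illustration and are not taken from mg-json.

```typescript
// Hypothetical replacement for a module-level slug cache: each call owns its own store.
function createSlugStore(): (slug: string) => string {
    const used = new Set<string>();

    // Returns the slug unchanged on first use, then tries slug-2, slug-3, ...
    // until it finds a value this store has not handed out yet.
    return (slug: string): string => {
        let candidate = slug;
        let suffix = 1;
        while (used.has(candidate)) {
            suffix += 1;
            candidate = slug + '-' + suffix;
        }
        used.add(candidate);
        return candidate;
    };
}

// Stand-in for the exported async entry point; validation is simulated by an awaited promise.
async function toGhostJsonSketch(slugs: string[]): Promise<string[]> {
    // Scoped to this invocation, so there is no shared state to reset.
    const dedupe = createSlugStore();

    // Other calls may interleave at this await without seeing this call's slugs.
    await Promise.resolve();

    return slugs.map(slug => dedupe(slug));
}
```

With the store created inside the function, the ordering of the reset relative to `await` stops mattering, at the cost of threading the store through whatever helpers currently read the module-level cache.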
WIP