Add e2e test that covers the whole migration lifecycle#1604
Add e2e test that covers the whole migration lifecycle#1604PaulAdamDavis wants to merge 1 commit intomainfrom
Conversation
WalkthroughThis pull request adds test fixtures and a comprehensive end-to-end test suite for Substack migration functionality. A static HTML fixture file simulates a Substack article page with metadata tags, JSON-LD structured data, and preloaded post information. A corresponding test file validates the complete migration workflow by creating temporary test data, mocking network requests, executing the migration task, and asserting the output Ghost import structure contains correctly transformed posts, metadata, authors, tags, lexical content, and downloaded assets. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Free Tier Details
Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Unquoted shell path in
execSynczip command- Wrapped
zipPathin double quotes in theexecSynczip command so paths containing spaces are passed correctly to the shell.
- Wrapped
Or push these changes by commenting:
@cursor push ddb8e8829a
Preview (ddb8e8829a)
diff --git a/packages/migrate/test/substack-e2e.test.js b/packages/migrate/test/substack-e2e.test.js
--- a/packages/migrate/test/substack-e2e.test.js
+++ b/packages/migrate/test/substack-e2e.test.js
@@ -63,7 +63,7 @@
// Create the input ZIP
zipPath = join(fixtureDir, '..', 'substack-e2e.zip');
- execSync(`zip -r ${zipPath} posts.csv posts/`, {
+ execSync(`zip -r "${zipPath}" posts.csv posts/`, {
cwd: fixtureDir
});| zipPath = join(fixtureDir, '..', 'substack-e2e.zip'); | ||
| execSync(`zip -r ${zipPath} posts.csv posts/`, { | ||
| cwd: fixtureDir | ||
| }); |
There was a problem hiding this comment.
Unquoted shell path in execSync zip command
Low Severity
The zipPath variable is interpolated into the execSync shell command without quotes, while the unzip command on line 144 properly quotes its path arguments. If the resolved test path contains spaces, the zip command will break because the shell will split it into multiple arguments.
Additional Locations (1)
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/migrate/test/substack-e2e.test.js`:
- Around line 152-159: The teardown in the after hook calls rmSync on
fixtureDir, zipPath, and outputDir without guards, which can throw if setup
failed and those variables are undefined; update the after() cleanup to check
each variable exists (and optionally fs.existsSync(path) is true) before calling
rmSync for fixtureDir, zipPath, and outputDir (or wrap each rmSync in a safe
check/try-catch) so teardown doesn't mask setup failures.
- Line 5: The tests import and use execSync with shell-interpolated zip/unzip
commands (via the execSync symbol) which is unsafe; replace those usages with
execFileSync from 'node:child_process' and call execFileSync with the executable
name ('zip' or 'unzip') and an array of arguments (e.g., ['-r', dest, source] or
['-o', archive, file]) to avoid shell interpolation and path-escaping issues;
update the import to include execFileSync, find all occurrences of execSync that
run zip/unzip in this test file (the zip/unzip command invocations), and convert
them to execFileSync(executable, args, {stdio: 'inherit'}) or appropriate
options while preserving exit behavior and captured output.
- Line 4: Replace any use of a file URL's .pathname when converting to
filesystem paths with node's fileURLToPath to ensure Windows and percent-encoded
characters are handled correctly: import {fileURLToPath} from 'node:url'
alongside the existing join import, then change occurrences where you currently
read fileUrl.pathname (or similar variables like fileUrl) and pass that into
join to instead call fileURLToPath(fileUrl) and use the result; update all
instances in this file (including the uses around the join calls and the block
covering the other occurrences) so path construction uses fileURLToPath(fileUrl)
+ join as appropriate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d0975565-d8fc-40b6-97bd-36c00e84c6f7
📒 Files selected for processing (2)
packages/migrate/test/fixtures/substack-e2e-scrape.htmlpackages/migrate/test/substack-e2e.test.js
| import assert from 'node:assert/strict'; | ||
| import {readFile} from 'node:fs/promises'; | ||
| import {existsSync, mkdirSync, writeFileSync, readFileSync, rmSync} from 'node:fs'; | ||
| import {join} from 'node:path'; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for the pattern: new URL(...).pathname with import.meta.url
rg -nP "new URL\([^)]*import\.meta\.url\)\.pathname" --type=js -C 2Repository: TryGhost/migrate
Length of output: 8601
🏁 Script executed:
#!/bin/bash
# Also search for variations without escaping to catch the pattern
rg "new URL.*import.meta.url.*pathname" --type=js -C 2Repository: TryGhost/migrate
Length of output: 8361
🏁 Script executed:
#!/bin/bash
# Check the specific file mentioned in the review
head -60 packages/migrate/test/substack-e2e.test.js | cat -nRepository: TryGhost/migrate
Length of output: 2981
🌐 Web query:
fileURLToPath Node.js official documentation filesystem path
💡 Result:
fileURLToPath is a Node.js API in the node:url module that converts a file: URL (as a URL object or string) into a fully-resolved, platform-specific absolute filesystem path, correctly decoding percent-encoded characters. [1]
import { fileURLToPath } from 'node:url';
const __filename = fileURLToPath(import.meta.url);Notes:
- Signature:
url.fileURLToPath(url[, options])(withoptions.windowsto force Windows vs POSIX parsing). [1] - If you’re ultimately calling
fsAPIs, mostfsoperations accept paths as a string, Buffer, or afile:URL (URLobject). [2]
Sources: Node.js URL docs (url.fileURLToPath) [1], Node.js File system docs (file paths / File URL paths) [2].
Use fileURLToPath() for filesystem-safe file URL conversion across platforms.
The .pathname property of file URLs can produce incorrect paths on Windows (forward slashes, missing drive letter handling) and with percent-encoded characters. Use fileURLToPath() from node:url instead.
🔧 Proposed fix
import {join} from 'node:path';
+import {fileURLToPath} from 'node:url';
@@
- fixtureDir = join(
- new URL('.', import.meta.url).pathname,
- 'fixtures',
- 'substack-e2e-tmp'
- );
+ fixtureDir = fileURLToPath(new URL('./fixtures/substack-e2e-tmp', import.meta.url));Also applies to: 52-56
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/migrate/test/substack-e2e.test.js` at line 4, Replace any use of a
file URL's .pathname when converting to filesystem paths with node's
fileURLToPath to ensure Windows and percent-encoded characters are handled
correctly: import {fileURLToPath} from 'node:url' alongside the existing join
import, then change occurrences where you currently read fileUrl.pathname (or
similar variables like fileUrl) and pass that into join to instead call
fileURLToPath(fileUrl) and use the result; update all instances in this file
(including the uses around the join calls and the block covering the other
occurrences) so path construction uses fileURLToPath(fileUrl) + join as
appropriate.
| import {readFile} from 'node:fs/promises'; | ||
| import {existsSync, mkdirSync, writeFileSync, readFileSync, rmSync} from 'node:fs'; | ||
| import {join} from 'node:path'; | ||
| import {execSync} from 'node:child_process'; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and its line count
wc -l packages/migrate/test/substack-e2e.test.jsRepository: TryGhost/migrate
Length of output: 106
🏁 Script executed:
# Read the imports at the top of the file, focusing on line 5
head -20 packages/migrate/test/substack-e2e.test.js | cat -nRepository: TryGhost/migrate
Length of output: 1307
🏁 Script executed:
# Check the specific lines mentioned: 5, 66-67, 144-145
sed -n '1,10p; 60,75p; 140,150p' packages/migrate/test/substack-e2e.test.js | cat -nRepository: TryGhost/migrate
Length of output: 1973
🏁 Script executed:
# Get lines 60-75 to find the zip command
sed -n '60,75p' packages/migrate/test/substack-e2e.test.js | cat -nRepository: TryGhost/migrate
Length of output: 837
🏁 Script executed:
# Get lines 135-150 to find the unzip command
sed -n '135,150p' packages/migrate/test/substack-e2e.test.js | cat -nRepository: TryGhost/migrate
Length of output: 939
Replace shell-interpolated archive commands with argument-based execFileSync to prevent path escaping and injection vulnerabilities.
Lines 5, 66–67, and 144–145 use execSync with shell-interpolated zip/unzip commands, which are vulnerable to path escaping issues and command-string parsing fragility. Use execFileSync with argument arrays instead.
🔧 Proposed fix
-import {execSync} from 'node:child_process';
+import {execFileSync} from 'node:child_process';
@@
- execSync(`zip -r ${zipPath} posts.csv posts/`, {
+ execFileSync('zip', ['-r', zipPath, 'posts.csv', 'posts/'], {
cwd: fixtureDir
});
@@
- execSync(`unzip -o "${ctx.outputFile.path}" -d "${unzipDir}"`);
+ execFileSync('unzip', ['-o', ctx.outputFile.path, '-d', unzipDir]);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/migrate/test/substack-e2e.test.js` at line 5, The tests import and
use execSync with shell-interpolated zip/unzip commands (via the execSync
symbol) which is unsafe; replace those usages with execFileSync from
'node:child_process' and call execFileSync with the executable name ('zip' or
'unzip') and an array of arguments (e.g., ['-r', dest, source] or ['-o',
archive, file]) to avoid shell interpolation and path-escaping issues; update
the import to include execFileSync, find all occurrences of execSync that run
zip/unzip in this test file (the zip/unzip command invocations), and convert
them to execFileSync(executable, args, {stdio: 'inherit'}) or appropriate
options while preserving exit behavior and captured output.
| after(function () { | ||
| nock.cleanAll(); | ||
| nock.enableNetConnect(); | ||
|
|
||
| // Clean up all temp files | ||
| rmSync(fixtureDir, {recursive: true, force: true}); | ||
| rmSync(zipPath, {force: true}); | ||
| rmSync(outputDir, {recursive: true, force: true}); |
There was a problem hiding this comment.
Guard cleanup paths so teardown does not mask setup failures.
If setup exits early, rmSync can receive undefined paths and throw, hiding the original failure. Add existence checks before cleanup calls.
🔧 Proposed fix
after(function () {
nock.cleanAll();
nock.enableNetConnect();
// Clean up all temp files
- rmSync(fixtureDir, {recursive: true, force: true});
- rmSync(zipPath, {force: true});
- rmSync(outputDir, {recursive: true, force: true});
+ if (fixtureDir) {
+ rmSync(fixtureDir, {recursive: true, force: true});
+ }
+ if (zipPath) {
+ rmSync(zipPath, {force: true});
+ }
+ if (outputDir) {
+ rmSync(outputDir, {recursive: true, force: true});
+ }
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| after(function () { | |
| nock.cleanAll(); | |
| nock.enableNetConnect(); | |
| // Clean up all temp files | |
| rmSync(fixtureDir, {recursive: true, force: true}); | |
| rmSync(zipPath, {force: true}); | |
| rmSync(outputDir, {recursive: true, force: true}); | |
| after(function () { | |
| nock.cleanAll(); | |
| nock.enableNetConnect(); | |
| // Clean up all temp files | |
| if (fixtureDir) { | |
| rmSync(fixtureDir, {recursive: true, force: true}); | |
| } | |
| if (zipPath) { | |
| rmSync(zipPath, {force: true}); | |
| } | |
| if (outputDir) { | |
| rmSync(outputDir, {recursive: true, force: true}); | |
| } | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/migrate/test/substack-e2e.test.js` around lines 152 - 159, The
teardown in the after hook calls rmSync on fixtureDir, zipPath, and outputDir
without guards, which can throw if setup failed and those variables are
undefined; update the after() cleanup to check each variable exists (and
optionally fs.existsSync(path) is true) before calling rmSync for fixtureDir,
zipPath, and outputDir (or wrap each rmSync in a safe check/try-catch) so
teardown doesn't mask setup failures.



Note
Low Risk
Adds a new end-to-end test and fixtures only; production migration logic is unchanged. Main risk is CI flakiness due to reliance on
zip/unzip, filesystem temp dirs, and network mocking.Overview
Adds a new Substack E2E test (
substack-e2e.test.js) that builds a minimal Substack export ZIP, runs the fullsubstacktask runner with scraping and zipping enabled, unzips the output, and asserts the generatedghost-import.jsonand bundled assets are correct.The test covers web-scraped SEO/social metadata, author extraction from
ld+json, tag extraction from_analyticsConfig/_preloadsplus platform/type/access tags, HTML-to-Lexical conversion (including subscribe button URL rewriting), and verifies image downloads are included in the output zip vianock-mocked requests and a new HTML scrape fixture (substack-e2e-scrape.html).Written by Cursor Bugbot for commit 1e2c67c. This will update automatically on new commits. Configure here.