Skip to content

Add e2e test that covers the whole migration lifecycle#1604

Open
PaulAdamDavis wants to merge 1 commit intomainfrom
dev-e2e
Open

Add e2e test that covers the whole migration lifecycle#1604
PaulAdamDavis wants to merge 1 commit intomainfrom
dev-e2e

Conversation

@PaulAdamDavis
Copy link
Member

@PaulAdamDavis PaulAdamDavis commented Mar 4, 2026

Note

Low Risk
Adds a new end-to-end test and fixtures only; production migration logic is unchanged. Main risk is CI flakiness due to reliance on zip/unzip, filesystem temp dirs, and network mocking.

Overview
Adds a new Substack E2E test (substack-e2e.test.js) that builds a minimal Substack export ZIP, runs the full substack task runner with scraping and zipping enabled, unzips the output, and asserts the generated ghost-import.json and bundled assets are correct.

The test covers web-scraped SEO/social metadata, author extraction from ld+json, tag extraction from _analyticsConfig/_preloads plus platform/type/access tags, HTML-to-Lexical conversion (including subscribe button URL rewriting), and verifies image downloads are included in the output zip via nock-mocked requests and a new HTML scrape fixture (substack-e2e-scrape.html).

Written by Cursor Bugbot for commit 1e2c67c. This will update automatically on new commits. Configure here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 4, 2026

Walkthrough

This pull request adds test fixtures and a comprehensive end-to-end test suite for Substack migration functionality. A static HTML fixture file simulates a Substack article page with metadata tags, JSON-LD structured data, and preloaded post information. A corresponding test file validates the complete migration workflow by creating temporary test data, mocking network requests, executing the migration task, and asserting the output Ghost import structure contains correctly transformed posts, metadata, authors, tags, lexical content, and downloaded assets.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add e2e test that covers the whole migration lifecycle' directly and clearly describes the main change: a comprehensive end-to-end test for the complete Substack migration process.
Description check ✅ Passed The description is directly related to the changeset, providing detailed context about the new E2E test, its fixtures, scope (scraping, authors, tags, assets), and explicitly noting that production migration logic is unchanged.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dev-e2e

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Free Tier Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Unquoted shell path in execSync zip command
    • Wrapped zipPath in double quotes in the execSync zip command so paths containing spaces are passed correctly to the shell.

Create PR

Or push these changes by commenting:

@cursor push ddb8e8829a
Preview (ddb8e8829a)
diff --git a/packages/migrate/test/substack-e2e.test.js b/packages/migrate/test/substack-e2e.test.js
--- a/packages/migrate/test/substack-e2e.test.js
+++ b/packages/migrate/test/substack-e2e.test.js
@@ -63,7 +63,7 @@
 
         // Create the input ZIP
         zipPath = join(fixtureDir, '..', 'substack-e2e.zip');
-        execSync(`zip -r ${zipPath} posts.csv posts/`, {
+        execSync(`zip -r "${zipPath}" posts.csv posts/`, {
             cwd: fixtureDir
         });
This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

zipPath = join(fixtureDir, '..', 'substack-e2e.zip');
execSync(`zip -r ${zipPath} posts.csv posts/`, {
cwd: fixtureDir
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unquoted shell path in execSync zip command

Low Severity

The zipPath variable is interpolated into the execSync shell command without quotes, while the unzip command on line 144 properly quotes its path arguments. If the resolved test path contains spaces, the zip command will break because the shell will split it into multiple arguments.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/migrate/test/substack-e2e.test.js`:
- Around line 152-159: The teardown in the after hook calls rmSync on
fixtureDir, zipPath, and outputDir without guards, which can throw if setup
failed and those variables are undefined; update the after() cleanup to check
each variable exists (and optionally fs.existsSync(path) is true) before calling
rmSync for fixtureDir, zipPath, and outputDir (or wrap each rmSync in a safe
check/try-catch) so teardown doesn't mask setup failures.
- Line 5: The tests import and use execSync with shell-interpolated zip/unzip
commands (via the execSync symbol) which is unsafe; replace those usages with
execFileSync from 'node:child_process' and call execFileSync with the executable
name ('zip' or 'unzip') and an array of arguments (e.g., ['-r', dest, source] or
['-o', archive, file]) to avoid shell interpolation and path-escaping issues;
update the import to include execFileSync, find all occurrences of execSync that
run zip/unzip in this test file (the zip/unzip command invocations), and convert
them to execFileSync(executable, args, {stdio: 'inherit'}) or appropriate
options while preserving exit behavior and captured output.
- Line 4: Replace any use of a file URL's .pathname when converting to
filesystem paths with node's fileURLToPath to ensure Windows and percent-encoded
characters are handled correctly: import {fileURLToPath} from 'node:url'
alongside the existing join import, then change occurrences where you currently
read fileUrl.pathname (or similar variables like fileUrl) and pass that into
join to instead call fileURLToPath(fileUrl) and use the result; update all
instances in this file (including the uses around the join calls and the block
covering the other occurrences) so path construction uses fileURLToPath(fileUrl)
+ join as appropriate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d0975565-d8fc-40b6-97bd-36c00e84c6f7

📥 Commits

Reviewing files that changed from the base of the PR and between 2503fae and 1e2c67c.

📒 Files selected for processing (2)
  • packages/migrate/test/fixtures/substack-e2e-scrape.html
  • packages/migrate/test/substack-e2e.test.js

import assert from 'node:assert/strict';
import {readFile} from 'node:fs/promises';
import {existsSync, mkdirSync, writeFileSync, readFileSync, rmSync} from 'node:fs';
import {join} from 'node:path';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for the pattern: new URL(...).pathname with import.meta.url
rg -nP "new URL\([^)]*import\.meta\.url\)\.pathname" --type=js -C 2

Repository: TryGhost/migrate

Length of output: 8601


🏁 Script executed:

#!/bin/bash
# Also search for variations without escaping to catch the pattern
rg "new URL.*import.meta.url.*pathname" --type=js -C 2

Repository: TryGhost/migrate

Length of output: 8361


🏁 Script executed:

#!/bin/bash
# Check the specific file mentioned in the review
head -60 packages/migrate/test/substack-e2e.test.js | cat -n

Repository: TryGhost/migrate

Length of output: 2981


🌐 Web query:

fileURLToPath Node.js official documentation filesystem path

💡 Result:

fileURLToPath is a Node.js API in the node:url module that converts a file: URL (as a URL object or string) into a fully-resolved, platform-specific absolute filesystem path, correctly decoding percent-encoded characters. [1]

import { fileURLToPath } from 'node:url';

const __filename = fileURLToPath(import.meta.url);

Notes:

  • Signature: url.fileURLToPath(url[, options]) (with options.windows to force Windows vs POSIX parsing). [1]
  • If you’re ultimately calling fs APIs, most fs operations accept paths as a string, Buffer, or a file: URL (URL object). [2]

Sources: Node.js URL docs (url.fileURLToPath) [1], Node.js File system docs (file paths / File URL paths) [2].


Use fileURLToPath() for filesystem-safe file URL conversion across platforms.

The .pathname property of file URLs can produce incorrect paths on Windows (forward slashes, missing drive letter handling) and with percent-encoded characters. Use fileURLToPath() from node:url instead.

🔧 Proposed fix
import {join} from 'node:path';
+import {fileURLToPath} from 'node:url';
@@
-        fixtureDir = join(
-            new URL('.', import.meta.url).pathname,
-            'fixtures',
-            'substack-e2e-tmp'
-        );
+        fixtureDir = fileURLToPath(new URL('./fixtures/substack-e2e-tmp', import.meta.url));

Also applies to: 52-56

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/migrate/test/substack-e2e.test.js` at line 4, Replace any use of a
file URL's .pathname when converting to filesystem paths with node's
fileURLToPath to ensure Windows and percent-encoded characters are handled
correctly: import {fileURLToPath} from 'node:url' alongside the existing join
import, then change occurrences where you currently read fileUrl.pathname (or
similar variables like fileUrl) and pass that into join to instead call
fileURLToPath(fileUrl) and use the result; update all instances in this file
(including the uses around the join calls and the block covering the other
occurrences) so path construction uses fileURLToPath(fileUrl) + join as
appropriate.

import {readFile} from 'node:fs/promises';
import {existsSync, mkdirSync, writeFileSync, readFileSync, rmSync} from 'node:fs';
import {join} from 'node:path';
import {execSync} from 'node:child_process';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and its line count
wc -l packages/migrate/test/substack-e2e.test.js

Repository: TryGhost/migrate

Length of output: 106


🏁 Script executed:

# Read the imports at the top of the file, focusing on line 5
head -20 packages/migrate/test/substack-e2e.test.js | cat -n

Repository: TryGhost/migrate

Length of output: 1307


🏁 Script executed:

# Check the specific lines mentioned: 5, 66-67, 144-145
sed -n '1,10p; 60,75p; 140,150p' packages/migrate/test/substack-e2e.test.js | cat -n

Repository: TryGhost/migrate

Length of output: 1973


🏁 Script executed:

# Get lines 60-75 to find the zip command
sed -n '60,75p' packages/migrate/test/substack-e2e.test.js | cat -n

Repository: TryGhost/migrate

Length of output: 837


🏁 Script executed:

# Get lines 135-150 to find the unzip command
sed -n '135,150p' packages/migrate/test/substack-e2e.test.js | cat -n

Repository: TryGhost/migrate

Length of output: 939


Replace shell-interpolated archive commands with argument-based execFileSync to prevent path escaping and injection vulnerabilities.

Lines 5, 66–67, and 144–145 use execSync with shell-interpolated zip/unzip commands, which are vulnerable to path escaping issues and command-string parsing fragility. Use execFileSync with argument arrays instead.

🔧 Proposed fix
-import {execSync} from 'node:child_process';
+import {execFileSync} from 'node:child_process';
@@
-        execSync(`zip -r ${zipPath} posts.csv posts/`, {
+        execFileSync('zip', ['-r', zipPath, 'posts.csv', 'posts/'], {
             cwd: fixtureDir
         });
@@
-        execSync(`unzip -o "${ctx.outputFile.path}" -d "${unzipDir}"`);
+        execFileSync('unzip', ['-o', ctx.outputFile.path, '-d', unzipDir]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/migrate/test/substack-e2e.test.js` at line 5, The tests import and
use execSync with shell-interpolated zip/unzip commands (via the execSync
symbol) which is unsafe; replace those usages with execFileSync from
'node:child_process' and call execFileSync with the executable name ('zip' or
'unzip') and an array of arguments (e.g., ['-r', dest, source] or ['-o',
archive, file]) to avoid shell interpolation and path-escaping issues; update
the import to include execFileSync, find all occurrences of execSync that run
zip/unzip in this test file (the zip/unzip command invocations), and convert
them to execFileSync(executable, args, {stdio: 'inherit'}) or appropriate
options while preserving exit behavior and captured output.

Comment on lines +152 to +159
after(function () {
nock.cleanAll();
nock.enableNetConnect();

// Clean up all temp files
rmSync(fixtureDir, {recursive: true, force: true});
rmSync(zipPath, {force: true});
rmSync(outputDir, {recursive: true, force: true});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Guard cleanup paths so teardown does not mask setup failures.

If setup exits early, rmSync can receive undefined paths and throw, hiding the original failure. Add existence checks before cleanup calls.

🔧 Proposed fix
     after(function () {
         nock.cleanAll();
         nock.enableNetConnect();

         // Clean up all temp files
-        rmSync(fixtureDir, {recursive: true, force: true});
-        rmSync(zipPath, {force: true});
-        rmSync(outputDir, {recursive: true, force: true});
+        if (fixtureDir) {
+            rmSync(fixtureDir, {recursive: true, force: true});
+        }
+        if (zipPath) {
+            rmSync(zipPath, {force: true});
+        }
+        if (outputDir) {
+            rmSync(outputDir, {recursive: true, force: true});
+        }
     });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
after(function () {
nock.cleanAll();
nock.enableNetConnect();
// Clean up all temp files
rmSync(fixtureDir, {recursive: true, force: true});
rmSync(zipPath, {force: true});
rmSync(outputDir, {recursive: true, force: true});
after(function () {
nock.cleanAll();
nock.enableNetConnect();
// Clean up all temp files
if (fixtureDir) {
rmSync(fixtureDir, {recursive: true, force: true});
}
if (zipPath) {
rmSync(zipPath, {force: true});
}
if (outputDir) {
rmSync(outputDir, {recursive: true, force: true});
}
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/migrate/test/substack-e2e.test.js` around lines 152 - 159, The
teardown in the after hook calls rmSync on fixtureDir, zipPath, and outputDir
without guards, which can throw if setup failed and those variables are
undefined; update the after() cleanup to check each variable exists (and
optionally fs.existsSync(path) is true) before calling rmSync for fixtureDir,
zipPath, and outputDir (or wrap each rmSync in a safe check/try-catch) so
teardown doesn't mask setup failures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant