Skip to content

Add installation smoke tests for Windows and macOS#206

Merged
rtibbles merged 1 commit intomainfrom
claude/brainstorm-plan-approach-011CUspMnoyn48zg9EANcDFj
Nov 25, 2025
Merged

Add installation smoke tests for Windows and macOS#206
rtibbles merged 1 commit intomainfrom
claude/brainstorm-plan-approach-011CUspMnoyn48zg9EANcDFj

Conversation

@rtibbles
Copy link
Member

@rtibbles rtibbles commented Nov 7, 2025

Summary

After a lot of back and forth, I narrowed this down to just smoke tests:
Installation Smoke Tests (Shell Scripts) - Validate built installers work end-to-end
Validate built installers work end-to-end
Test actual user installation experience
Verify app launches and serves HTTP

References

There is no open issue for this - but it seemed like a useful additional check over and above the existing PR builds - with the main idea to have something fail early before we wait for manual QA!

Reviewer guidance

🤖 This was created by Claude Code. @rtibbles then reviewed the generated output, and did iterative rounds of updates before making it ready for review 🤖

@rtibbles rtibbles force-pushed the claude/brainstorm-plan-approach-011CUspMnoyn48zg9EANcDFj branch from 040ceeb to d54f4b7 Compare November 7, 2025 19:23
@rtibbles
Copy link
Member Author

@coderabbitai full review

@coderabbitai
Copy link

coderabbitai bot commented Nov 12, 2025

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link

coderabbitai bot commented Nov 12, 2025

Walkthrough

The PR introduces automated smoke test workflows and corresponding shell scripts for Windows and macOS platforms. These additions enable end-to-end installation verification, service management, log parsing, and API availability checks as part of the CI/CD pipeline.

Changes

Cohort / File(s) Summary
PR Build Workflow
.github/workflows/pr_build.yml
Adds two new smoke test jobs: Smoke Test Windows (uses Windows runner, references EXE artifact) and Smoke Test macOS (uses macOS runner, references DMG artifact). Both jobs include installation execution, log collection on failure, and artifact upload.
Smoke Test Scripts
scripts/smoke_test_macos.sh, scripts/smoke_test_windows.sh
Adds platform-specific smoke test scripts. macOS script: mounts DMG, launches app, verifies process, parses logs for listening port, validates API endpoint. Windows script: runs silent installer, waits for uninstaller, verifies installation structure, starts service, locates service log, extracts port, validates API endpoint, checks logs for errors. Both include timeout-based error handling and detailed failure reporting.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Shell script robustness: Verify timeout logic, error handling paths, and process polling mechanisms across platform-specific commands
  • Log parsing reliability: Review port extraction regex patterns and fallback logic in both scripts
  • Path handling: Confirm cross-platform path references are correct (Windows UNC paths in scripts/smoke_test_windows.sh, macOS app bundle paths in scripts/smoke_test_macos.sh)
  • API validation: Ensure HTTP response parsing and JSON field checks (kolibri_version presence) are reliable

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add installation smoke tests for Windows and macOS' clearly and specifically describes the main change in the pull request.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description check ✅ Passed The pull request description clearly describes the changeset: installation smoke tests for Windows and macOS that validate built installers work end-to-end, test user installation experience, and verify app launches and serves HTTP.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch claude/brainstorm-plan-approach-011CUspMnoyn48zg9EANcDFj

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
scripts/smoke_test_macos.sh (1)

28-28: Avoid ls | grep pattern; use a glob or loop instead.

Line 28 uses ls | grep which is fragile with special filenames. Consider using a glob or a for-loop with a condition to make the script more robust.

-EXECUTABLE=$(ls "$MOUNT_POINT/Kolibri.app/Contents/MacOS/" | grep -E "^Kolibri-" | head -1)
+EXECUTABLE=$(shopt -s nullglob; for f in "$MOUNT_POINT/Kolibri.app/Contents/MacOS"/Kolibri-*; do basename "$f"; break; done)

Alternatively, use find:

-EXECUTABLE=$(ls "$MOUNT_POINT/Kolibri.app/Contents/MacOS/" | grep -E "^Kolibri-" | head -1)
+EXECUTABLE=$(find "$MOUNT_POINT/Kolibri.app/Contents/MacOS/" -maxdepth 1 -name "Kolibri-*" | head -1 | xargs basename)
scripts/smoke_test_windows.sh (2)

16-16: Verify Inno Setup silent install flags and timeout handling.

Line 16 uses Inno Setup silent install flags //VERYSILENT //SUPPRESSMSGBOXES //LOG. While these appear correct, the script does not capture the installer's exit code. If the installer silently fails, the script will not immediately detect it and will proceed to wait for unins000.exe (line 22), potentially causing a false timeout.

Consider capturing the installer's exit code to fail fast on install errors:

-"$INSTALLER_PATH" //VERYSILENT //SUPPRESSMSGBOXES //LOG="$LOG_FILE"
+"$INSTALLER_PATH" //VERYSILENT //SUPPRESSMSGBOXES //LOG="$LOG_FILE" || {
+    echo "ERROR: Installer exited with code $?"
+    head -100 "$LOG_FILE" 2>/dev/null || echo "Log file not found"
+    exit 1
+}

52-60: Improve Windows service status verification.

Line 52 checks service status with sc query Kolibri | grep -q "RUNNING", but this approach lacks specificity. If the service doesn't exist, sc query will fail and the loop will retry, delaying failure detection.

Add an explicit check to verify the service exists before polling its status:

-while ! sc query Kolibri | grep -q "RUNNING"; do
+WAIT_INITIAL=1
+while true; do
+    # Check if service exists
+    if ! sc query Kolibri > /dev/null 2>&1; then
+        if [ "$WAIT_INITIAL" -eq 1 ]; then
+            WAIT_INITIAL=0
+        else
+            echo "ERROR: Kolibri service does not exist"
+            exit 1
+        fi
+    elif sc query Kolibri | grep -q "STATE.*RUNNING"; then
+        break
+    fi

Alternatively, use a more explicit state check:

-while ! sc query Kolibri | grep -q "RUNNING"; do
+while [ "$(sc query Kolibri 2>/dev/null | grep -oP 'STATE\s+:\s+\K\S+')" != "RUNNING" ]; do
Makefile (1)

195-215: Define INSTALLER_PATH and DMG_PATH variables in Makefile or document their intention.

Lines 199 and 204 reference $(INSTALLER_PATH) and $(DMG_PATH) variables that are not defined in the Makefile. Currently, these will be empty unless passed as environment variables or make arguments (e.g., make smoke-test-windows INSTALLER_PATH="dist-installer/kolibri-setup-*.exe").

This means running make smoke-test-windows without arguments will pass an empty path to the script, and the script will fall back to its hardcoded defaults. While this works, it's implicit and could be confusing.

Either define these variables with defaults at the top of the Makefile:

+INSTALLER_PATH ?= dist-installer/kolibri-setup-*-unsigned.exe
+DMG_PATH ?= dist/kolibri-*.dmg
+
 .PHONY: clean get-whl install-whl clean-whl build-mac-app pyinstaller build-dmg compile-mo needs-version

Or document the intended usage in the targets:

 .PHONY: smoke-test-windows
 smoke-test-windows:
+	# Usage: make smoke-test-windows [INSTALLER_PATH="path/to/installer.exe"]
 	@echo "Running Windows installation smoke test..."

The first approach (defining defaults in Makefile) is preferred for consistency with other build tools.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f4b64b and 6a8fec5.

📒 Files selected for processing (4)
  • .github/workflows/pr_build.yml (1 hunks)
  • Makefile (1 hunks)
  • scripts/smoke_test_macos.sh (1 hunks)
  • scripts/smoke_test_windows.sh (1 hunks)
🧰 Additional context used
🪛 checkmake (0.2.2)
Makefile

[warning] 206-206: Missing required phony target "all"

(minphony)


[warning] 206-206: Missing required phony target "test"

(minphony)

🪛 Shellcheck (0.11.0)
scripts/smoke_test_macos.sh

[warning] 28-28: Don't use ls | grep. Use a glob or a for loop with a condition to allow non-alphanumeric filenames.

(SC2010)

🔇 Additional comments (2)
.github/workflows/pr_build.yml (2)

49-82: Verify that build_exe job outputs exe-file-name.

Line 60 references needs.build_exe.outputs.exe-file-name, but this output must be defined in the build_exe job (or the referenced reusable workflow ./.github/workflows/build_windows.yml). If this output is not defined, the workflow will fail at runtime.

Please verify that the build_exe job (or its reusable workflow build_windows.yml) outputs a variable named exe-file-name. You can check this by:

  1. Examining .github/workflows/build_windows.yml to confirm it contains an outputs section with exe-file-name
  2. Or running the workflow in a PR to confirm the output is available

If the output name differs, update line 60 and line 67 accordingly.


85-114: Verify that build_dmg job outputs dmg-file-name.

Line 95 references needs.build_dmg.outputs.dmg-file-name, but this output must be defined in the build_dmg job (or the referenced reusable workflow ./.github/workflows/build_mac.yml). If this output is not defined, the workflow will fail at runtime.

Please verify that the build_dmg job (or its reusable workflow build_mac.yml) outputs a variable named dmg-file-name. You can check this by:

  1. Examining .github/workflows/build_mac.yml to confirm it contains an outputs section with dmg-file-name
  2. Or running the workflow in a PR to confirm the output is available

If the output name differs, update line 95 and line 101 accordingly.

@rtibbles rtibbles changed the title Claude Code experiment: Generate comprehensive test suite for kolibri-app Add installation smoke tests for Windows and macOS Nov 12, 2025
@rtibbles rtibbles force-pushed the claude/brainstorm-plan-approach-011CUspMnoyn48zg9EANcDFj branch from 6a8fec5 to 809bd0a Compare November 12, 2025 19:01
@rtibbles rtibbles force-pushed the claude/brainstorm-plan-approach-011CUspMnoyn48zg9EANcDFj branch from 809bd0a to 572dd9b Compare November 12, 2025 19:10
@rtibbles
Copy link
Member Author

@coderabbitai full review

@coderabbitai
Copy link

coderabbitai bot commented Nov 12, 2025

✅ Actions performed

Full review triggered.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
scripts/smoke_test_windows.sh (3)

4-4: Glob pattern in default value may not expand as intended.

When INSTALLER_PATH is used without a passed argument, the glob pattern kolibri-setup-*-unsigned.exe remains unexpanded in the quoted variable, causing the installer command to fail if the wildcard literal doesn't match an actual file. The workflow provides explicit filenames via $1, so this works in CI/CD, but local testing without arguments will break.

Consider resolving the glob explicitly:

# Option 1: Use ls to expand glob
if [ -z "$1" ]; then
  INSTALLER_PATH=$(ls dist-installer/kolibri-setup-*-unsigned.exe | head -1)
  if [ -z "$INSTALLER_PATH" ]; then
    echo "ERROR: No installer found matching pattern"
    exit 1
  fi
else
  INSTALLER_PATH="$1"
fi

68-91: Hardcoded sleep and fragile log format assumptions.

Line 70's sleep 10 is a fixed wait for logs to be created. On resource-constrained or loaded CI runners, 10 seconds may be insufficient, causing flaky failures. Additionally, port detection (lines 80–84) depends on log format stability; if Kolibri's logging changes, both fallback patterns could fail silently.

Consider implementing adaptive log-polling (wait until log exists before parsing) and document the expected log format patterns. Alternatively, add a configurable timeout environment variable:

LOG_TIMEOUT=${LOG_TIMEOUT:-30}
ELAPSED=0
while [ ! -f "$KOLIBRI_LOG" ] && [ $ELAPSED -lt $LOG_TIMEOUT ]; do
  sleep 1
  ELAPSED=$((ELAPSED + 1))
done

114-122: Inconsistent error count handling vs. macOS script.

Line 115 doesn't strip whitespace from wc -l output, unlike the macOS script (line 114: tr -d ' '). While bash's -gt operator may tolerate leading whitespace in arithmetic context, this inconsistency introduces portability risk and makes cross-platform maintenance harder.

Apply this diff for consistency and robustness:

-ERROR_COUNT=$(grep -i "error" "$KOLIBRI_LOG" | grep -v "DEBUG" | wc -l || echo "0")
+ERROR_COUNT=$(grep -i "error" "$KOLIBRI_LOG" | grep -v "DEBUG" | wc -l | tr -d ' ')
scripts/smoke_test_macos.sh (2)

5-42: Mount point collision and executable naming assumptions.

Line 16's hdiutil attach assumes /Volumes/Kolibri is available. If this mount point already exists (e.g., from a failed previous run), the command will fail without cleanup. Additionally, line 28 assumes the executable follows the pattern Kolibri-*; if the naming scheme changes, detection fails.

Improve robustness by checking for pre-existing mounts and handling versioning more flexibly:

# Unmount if already mounted
hdiutil detach "$MOUNT_POINT" 2>/dev/null || true

# Then attach
hdiutil attach "$DMG_PATH" -mountpoint "$MOUNT_POINT"

# Executable discovery: find any executable in MacOS folder, not just "Kolibri-*"
EXECUTABLE=$(find "$MOUNT_POINT/Kolibri.app/Contents/MacOS/" -maxdepth 1 -type f -perm +111 | head -1)
if [ -z "$EXECUTABLE" ]; then
  EXECUTABLE=$(find "$MOUNT_POINT/Kolibri.app/Contents/MacOS/" -maxdepth 1 -name "Kolibri*" | head -1)
fi

44-88: Hardcoded app launch delay and non-specific process matching.

Line 47's sleep 30 is a fixed wait for app initialization. On fast or resource-constrained runners, this could be unnecessarily long or too short depending on system load. Additionally, line 49's pgrep -f "Kolibri.app" uses a broad pattern that could match unrelated processes if their command line contains the string.

Consider adaptive waits and more specific process matching:

# Adaptive launch wait with timeout
LAUNCH_TIMEOUT=60
ELAPSED=0
while ! pgrep -f "Kolibri.app/Contents/MacOS/Kolibri" > /dev/null && [ $ELAPSED -lt $LAUNCH_TIMEOUT ]; do
  sleep 2
  ELAPSED=$((ELAPSED + 2))
done

if [ $ELAPSED -ge $LAUNCH_TIMEOUT ]; then
  echo "ERROR: Kolibri did not launch within ${LAUNCH_TIMEOUT}s"
  exit 1
fi
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0cf985d and 572dd9b.

📒 Files selected for processing (3)
  • .github/workflows/pr_build.yml (1 hunks)
  • scripts/smoke_test_macos.sh (1 hunks)
  • scripts/smoke_test_windows.sh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build Unsigned DMG / build_dmg
  • GitHub Check: Build Unsigned EXE / Build EXE file
🔇 Additional comments (5)
scripts/smoke_test_windows.sh (1)

52-66: Service startup verification assumes hardcoded service name.

Line 56 uses sc query Kolibri | grep -q "RUNNING" which assumes the Windows service is named exactly "Kolibri". If the service name differs (e.g., "Kolibri App" or "KolibriService"), the test will fail to detect the running service and timeout.

Verify that the Windows installer consistently registers the service with the name "Kolibri", or make this configurable.

scripts/smoke_test_macos.sh (1)

1-126: Overall script quality: Well-structured with good error handling.

The macOS script demonstrates solid defensive coding practices: process cleanup with || true (lines 85, 97, 105), graceful degradation with error messages, and correct whitespace handling. The main opportunities for improvement are replacing hardcoded sleeps with adaptive polling and improving process matching specificity.

.github/workflows/pr_build.yml (3)

49-66: Verify that build_exe job exports exe-file-name output.

The workflow references needs.build_exe.outputs.exe-file-name (line 60), but the build_exe job definition is not shown in the provided context. The smoke test will fail if this output is not defined in the build job.

Verify that the build_exe job (referenced in ./.github/workflows/build_windows.yml) includes this output definition:

jobs:
  build_exe:
    outputs:
      exe-file-name: ${{ ... }}  # Should be defined

69-84: Verify that build_dmg job exports dmg-file-name output.

The workflow references needs.build_dmg.outputs.dmg-file-name (line 79), but the build_dmg job definition is not shown in the provided context. The smoke test will fail if this output is not defined in the build job.

Verify that the build_dmg job (referenced in ./.github/workflows/build_mac.yml) includes this output definition:

jobs:
  build_dmg:
    outputs:
      dmg-file-name: ${{ ... }}  # Should be defined

48-84: Workflow job structure is sound with proper dependencies.

The smoke test jobs follow GitHub Actions best practices: explicit dependencies (needs: build_exe/build_dmg), appropriate runner selection, and bash shell on Windows for cross-platform compatibility. The dependency chain ensures build artifacts are available before tests run.

However, verify the required output variables are defined in build jobs (see comments above). If error capture/artifact upload is needed for debugging failed tests, you may also want to add a step like:

- name: Upload logs on failure
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: smoke-test-logs-windows
    path: smoke_test_install.log

@rtibbles rtibbles marked this pull request as ready for review November 12, 2025 19:37
Copy link
Member

@marcellamaki marcellamaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like LLM-generated github actions! seems aligned with what we want to accomplish on code read-through, although i was not able to tell just by reading if it will actually work the way it seems like it should. seems low-risk to merge and then confirm whether or not the actions work ETA, the actions work and have run in the course of the PR

@rtibbles rtibbles merged commit 70cc89d into main Nov 25, 2025
7 checks passed
@rtibbles rtibbles deleted the claude/brainstorm-plan-approach-011CUspMnoyn48zg9EANcDFj branch November 25, 2025 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants