Skip to content

Conversation

@gavin-dunlap-luminar
Copy link
Contributor

@gavin-dunlap-luminar gavin-dunlap-luminar commented Dec 15, 2025

This PR addresses two race conditions that caused intermittent build failures when using Ceedling's parallel test builds.

Bug 1: Preprocessor Cache Race Condition

Problem: When multiple threads simultaneously processed the same source file for the first time, they would race to read/write the YAML cache file containing #include statement listings. This caused cache corruption and intermittent build failures.

Root Cause:

The preprocess_includes() method in preprocessinator.rb performed check-then-act operations (check if cache exists -> read OR extract -> write) without synchronization. Multiple threads could interleave these operations:

Thread A checks cache doesn't exist
Thread B checks cache doesn't exist
Thread A extracts includes and writes cache
Thread B extracts includes and writes cache (corrupting A's write)

Fix:

Added per-file mutex locking around the entire cache operation sequence. Each cache file gets its own mutex (stored in a thread-safe hash with mutex protection), ensuring only one thread can check/read/write a specific cache file at a time while still allowing parallel processing of different files.

Bug 2: FileList Lazy Evaluation Race Condition

Problem: Intermittent linker errors with "undefined reference" messages occurred when source files were missing from the build. Tests would fail sporadically (roughly 1 per 1000 runs for my workload.)

Root Cause:

Ruby's Rake::FileList uses lazy evaluation - glob patterns aren't expanded until the list is accessed. In configurator_builder.rb and file_path_collection_utils.rb, FileLists were created from glob patterns but not materialized before being shared across threads. When multiple threads accessed the same lazy FileList simultaneously during the BUILD phase, they would trigger concurrent pattern expansion, causing list corruption and resulting in missing source files during linking.

Fix:

Added .resolve() call in file_path_collection_utils.rb within the revise_filelist() method to force FileList expansion before returning the result
Added .resolve() calls in configurator_builder.rb to materialize FileLists immediately after creation, before any threading occurs
This ensures all file collections are fully materialized during the configuration phase before being used in the threaded build phases.

:project_use_test_preprocessor_tests => false,
:project_use_test_preprocessor_mocks => false
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remark: I ran a simple formatter to clean whitespaces since I had some stray whitespace in my original work, and it deleted some pre-existing whitespace.

@mkarlesky
Copy link
Member

mkarlesky commented Dec 15, 2025

@gavin-dunlap-luminar Thank you for this.

You're absolutely correct about the preprocessor cacheing problem. This is a temporary addition of some simple cacheing while delta builds have been removed (a necessary consequence of slowly removing a reliance on the problematic Rake library — Rake is Ruby make). Delta builds only rebuilt those things that changed. We need to recreate this in 1.0.0+. That small bit of YAML cacheing was indeed incomplete given the new reality of parallel builds. Totally missed that.

You have independently discovered one of the reasons we are removing Rake as a core library from Ceedling. It provided a great deal of infrastructure and features. But, as time has worn on it's become an anchor around the project's neck. The handy-dandy FileList that Rake provides is flawed. One of its problems is unreliable resolution of file globs. We had already resorted to processing globs and file paths ourselves to circumvent its incomplete handling of paths and also forcing it to resolve paths before using a given FileList. It would be best to entirely replace it, but we've not been able to make that sizable change yet. Thank your for finding the places where the direct invocation of .resolve() were yet needed.

@mkarlesky mkarlesky added the bug label Dec 20, 2025
@mkarlesky mkarlesky merged commit 8464143 into ThrowTheSwitch:master Dec 20, 2025
8 of 11 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in 1.0.n BugFix Releases Dec 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants