test: use process groups for robust child cleanup #21261

loplex · 2025-12-16T03:39:51Z

This PR implements the architectural improvement suggested in the previous bug report regarding test cleanup stability.

Currently, the test cleanup logic only targets direct child processes (ppid). This is fragile as it fails to terminate grandchildren or detached processes, potentially leaving zombie processes in the CI environment.

As suggested earlier:

"Architectural suggestion: Killing by ppid is fragile as it does not cover grandchildren processes. It would be more robust to kill the entire process group (pgid) to ensure a clean test environment shutdown."

Implementation Details

This change switches the cleanup strategy to target the entire Process Group.

To ensure safety and prevent the test runner from killing itself:

SIGTERM: The runner temporarily traps/ignores SIGTERM before sending it to the group (Process.kill('TERM', -pgid)).
SIGKILL: When identifying remaining processes via pgrep for a force-kill, the runner explicitly filters out its own PID from the list.

This ensures a completely clean state even if tests spawn complex process trees.

Have you followed the guidelines in our Contributing document?
Have you checked to ensure there aren't other open Pull Requests for the same change?
Have you added an explanation of what your changes do and why you'd like us to include them?
Have you written new tests for your changes? Here's an example.
Have you successfully run brew lgtm (style, typechecking and tests) with your changes locally?

Currently, the test cleanup logic only targets direct child processes (`ppid`). This is fragile as it fails to terminate grandchildren or detached processes, potentially leaving zombie processes in the CI environment. This change switches the strategy to target the entire process group (`pgid`). Safety measures implemented: - The runner temporarily ignores `SIGTERM` before sending it to the group to avoid terminating itself. - When force-killing remaining processes (`SIGKILL`), the runner explicitly filters out its own PID from the list returned by `pgrep`. Signed-off-by: Martin Lopatář <[email protected]>

Copilot

Pull request overview

This PR replaces the ppid-based child process cleanup logic with a pgid-based (process group) approach to ensure more comprehensive test cleanup, including grandchildren and detached processes.

Key Changes:

Switches from pkill -P pid to Process.kill('TERM', -pgid) for terminating child processes
Adds signal trapping to protect the test runner from self-termination during SIGTERM
Implements a two-phase cleanup: SIGTERM followed by SIGKILL after a 1-second grace period
Replaces external pkill calls with direct Process.kill for the final SIGKILL phase

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-16T03:44:04Z

Library/Homebrew/test.rb

+    pgid = Process.getpgrp
+
+    $stderr.puts 'Terminating child processes...'
+    trap_saved = Signal.trap('TERM', 'IGNORE')
+    Process.kill('TERM', -pgid) rescue nil
+    Signal.trap('TERM', trap_saved)


Critical issue with process group termination logic. When calling Process.kill('TERM', -pgid), this will send SIGTERM to ALL processes in the process group, including the test runner itself. The SIGTERM trap on line 72 will not fully protect the runner because the signal is sent to the entire process group simultaneously. This could cause the test runner to terminate unexpectedly.

Additionally, Process.getpgrp returns the process group ID of the calling process, which means if this script was launched as part of an existing process group, it will attempt to kill all processes in that group, not just child processes spawned by the test.

Consider using Process.setsid at the start of the script to create a new process session/group, ensuring child processes are in their own group that can be safely terminated without affecting the parent test runner.

Copilot · 2025-12-16T03:44:04Z

Library/Homebrew/test.rb

+    pgrep = '/usr/bin/pgrep'
+    if File.executable?(pgrep)


The path is hardcoded to /usr/bin/pgrep, but the startup code at line 36 uses the which function to check for pgrep availability. This creates an inconsistency. Consider using which('pgrep') to find the actual location of pgrep or using the same path resolution strategy used elsewhere in the codebase.

Suggested change

pgrep = '/usr/bin/pgrep'

if File.executable?(pgrep)

pgrep = which("pgrep")

if pgrep && File.executable?(pgrep)

Copilot · 2025-12-16T03:44:05Z

Library/Homebrew/test.rb

+  begin
+    pgid = Process.getpgrp
+
+    $stderr.puts 'Terminating child processes...'
+    trap_saved = Signal.trap('TERM', 'IGNORE')
+    Process.kill('TERM', -pgid) rescue nil
+    Signal.trap('TERM', trap_saved)
+
    sleep 1
-    system pgrep, "-9", "-P", pid
+
+    pgrep = '/usr/bin/pgrep'
+    if File.executable?(pgrep)
+      $stderr.puts 'Killing child processes...'
+      pids = IO.popen([pgrep, '-g', pgid.to_s]) { |io| io.read.split.map(&:to_i) } rescue []
+      pids.each do |pid|
+        next if pid == Process.pid
+        Process.kill('KILL', pid) rescue nil
+      end
+    end


The cleanup code runs unconditionally in the ensure block even when tests succeed without spawning child processes. This adds unnecessary overhead (signal trapping, process group query, 1 second sleep) to all test runs. Consider checking if any child processes actually exist before running the cleanup logic, or at minimum, make the sleep conditional on whether processes were found.

Copilot · 2025-12-16T03:44:05Z

Library/Homebrew/test.rb

+  begin
+    pgid = Process.getpgrp
+
+    $stderr.puts 'Terminating child processes...'
+    trap_saved = Signal.trap('TERM', 'IGNORE')
+    Process.kill('TERM', -pgid) rescue nil
+    Signal.trap('TERM', trap_saved)
+
    sleep 1
-    system pgrep, "-9", "-P", pid
+
+    pgrep = '/usr/bin/pgrep'
+    if File.executable?(pgrep)
+      $stderr.puts 'Killing child processes...'
+      pids = IO.popen([pgrep, '-g', pgid.to_s]) { |io| io.read.split.map(&:to_i) } rescue []
+      pids.each do |pid|
+        next if pid == Process.pid
+        Process.kill('KILL', pid) rescue nil
+      end
+    end
+  rescue Exception => cleanup_ex
+    $stderr.puts "Cleanup failed: #{cleanup_ex.message}"
  end


The new process group cleanup logic lacks test coverage. Given the critical nature of this change (switching from ppid-based to pgid-based process termination) and the complexity involved (signal handling, process group management), tests should be added to verify:

Cleanup properly terminates child processes

The test runner itself is not terminated

Cleanup handles cases where no child processes exist

Error handling in the cleanup rescue block works correctly

The repository has comprehensive test coverage for other files (e.g., test_runner_formula_spec.rb), suggesting tests should be added for this functionality as well.

MikeMcQuaid

Can you provide some reproducible test cases for the case this fixes? I'm not sure it's worth all the extra code here as-is.

loplex · 2025-12-16T17:58:06Z

Can you provide some reproducible test cases for the case this fixes? I'm not sure it's worth all the extra code here as-is.

Ok, I will try. I understand, it's really a lot of extra code. Is there a test for the the current (by parent pid) cleanup logic? I didn't investigating it much, but I think it can currently catch only rear cases. Because if descendant process is completely detached, it's not a child and if it is not completely detached, it would be killed as soon as test.rb ends anyway. The only exception is if process is not executed using shell, running in background and have no connection to parent process, ...
e.g. spawn("nohup", "sleep", "42", :in => File::NULL, :out => File::NULL, :err => File::NULL)

MikeMcQuaid · 2025-12-16T19:22:48Z

Ok, I will try.

Sorry, to clarify, on this I more mean "formulae/tests/commands I can run locally to verify" rather than "please write a test for this in Ruby".

Copilot AI review requested due to automatic review settings December 16, 2025 03:39

Copilot started reviewing on behalf of loplex December 16, 2025 03:40 View session

Copilot AI reviewed Dec 16, 2025

View reviewed changes

MikeMcQuaid reviewed Dec 16, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

test: use process groups for robust child cleanup #21261

test: use process groups for robust child cleanup #21261

loplex commented Dec 16, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 16, 2025

Uh oh!

Copilot AI Dec 16, 2025

Uh oh!

Copilot AI Dec 16, 2025

Uh oh!

Copilot AI Dec 16, 2025

Uh oh!

MikeMcQuaid left a comment

Uh oh!

loplex commented Dec 16, 2025

Uh oh!

MikeMcQuaid commented Dec 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

test: use process groups for robust child cleanup #21261

Are you sure you want to change the base?

test: use process groups for robust child cleanup #21261

Conversation

loplex commented Dec 16, 2025

Implementation Details

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

MikeMcQuaid left a comment

Choose a reason for hiding this comment

Uh oh!

loplex commented Dec 16, 2025

Uh oh!

MikeMcQuaid commented Dec 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants