Skip to content

Conversation

@ArneTR
Copy link
Member

@ArneTR ArneTR commented Jan 9, 2026

Screenshot 2026-01-09 at 11 58 56 AM

Greptile Overview

Greptile Overview

Greptile Summary

This PR improves error handling for Out-of-Memory (OOM) errors by detecting exit code 137 and providing helpful error messages that reference the GMT_CONTAINER_MEMORY_LIMIT environment variable.

Changes:

  • Added specific OOM error detection in three locations: container launch, setup-commands, and flow execution
  • Exposed container memory limit via GMT_CONTAINER_MEMORY_LIMIT environment variable
  • Added comprehensive test coverage for all three OOM scenarios
  • Modified test_resource_limits_good() to run full flow and verify the environment variable is accessible

Issues found:

  • Critical: Potential UnboundLocalError in _check_process_returncodes() when timeout occurs (pre-existing bug, but perpetuated by new code)
  • Style: Inconsistent formatting in test file names, comment spacing, memory limit format, and environment variable handling

@ArneTR
Copy link
Member Author

ArneTR commented Jan 9, 2026

@greptileai

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 9, 2026

Additional Comments (1)

lib/scenario_runner.py
Potential UnboundLocalError if TimeoutExpired exception occurs. When the exception is caught and pass executes, stderr is never assigned. Then lines 2215 and 2217 will fail with NameError when trying to reference stderr.

This is a pre-existing bug, but the new OOM error handling at line 2215 perpetuates it. Consider initializing stderr before the try block:

                stderr = None
                try:
                    if ps['detach']:
                        _, stderr = ps['ps'].communicate(timeout=1)
                    else:
                        stderr = ps['ps'].stderr
                except subprocess.TimeoutExpired:
                    stderr = "(timeout reading stderr)"

                if process_helpers.check_process_failed(ps['ps'], ps['detach']):
                    if ps['ps'].returncode == 137:
                        raise MemoryError(f"Your process {ps['cmd']} failed due to an Out-of-Memory error (Code: 137). Please check if you can instruct the process to use less memory or higher resource limits on the container. The set memory for the container is exposed in the ENV var: GMT_CONTAINER_MEMORY_LIMIT.\n\nDetached process: {ps['detach']}\n\n========== Stderr ==========\n{stderr}")
                    else:
                        raise RuntimeError(f"Process '{ps['cmd']}' had bad returncode: {ps['ps'].returncode}. Stderr: {stderr}; Detached process: {ps['detach']}. Please also check the stdout in the logs and / or enable stdout logging to debug further.")

@ArneTR ArneTR merged commit 18f9b41 into main Jan 11, 2026
1 of 2 checks passed
@ArneTR ArneTR deleted the oom-better-error branch January 11, 2026 03:35
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 11, 2026

Required keyword not found in PR title or description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants