Add basic orchestrator healthcheck #415

Open
wants to merge 7 commits into
base: main

Conversation

dobrac
Contributor

@dobrac dobrac commented Mar 11, 2025

Adds:

  • HTTP Endpoint "/health" reporting
{
    "status": "healthy",
    "version": "6fe34854"
}
  • Orchestrator count metric (async updown counter) "orchestrator.env.sandbox.running"
  • OTEL InstanceID
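For readers unfamiliar with this kind of endpoint, here is a minimal sketch of a handler that returns the JSON body shown above. It uses only the Go standard library. The names `healthResponse`, `encodeHealth`, and `healthHandler` are hypothetical and are not taken from this PR's diff, and the status is hard-coded to "healthy" for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthResponse mirrors the JSON body in the PR description.
// The type and its field names are assumptions made for this sketch.
type healthResponse struct {
	Status  string `json:"status"`
	Version string `json:"version"`
}

// encodeHealth renders the health body as a JSON string.
func encodeHealth(status, version string) (string, error) {
	b, err := json.Marshal(healthResponse{Status: status, Version: version})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// healthHandler returns a handler that always reports "healthy".
// A real service would read the status from its own health state instead.
func healthHandler(version string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := encodeHealth("healthy", version)
		if err != nil {
			http.Error(w, "failed to encode health", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte(body))
	}
}

func main() {
	// Exercise the handler in-process instead of binding a real port.
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	healthHandler("6fe34854").ServeHTTP(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```

In a running service the handler would be registered on a mux, for example with `mux.Handle("/health", healthHandler(version))`.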


linear bot commented Mar 11, 2025

@dobrac dobrac self-assigned this Mar 11, 2025
@dobrac dobrac added the improvement and feature labels and removed the improvement label Mar 11, 2025
@dobrac dobrac force-pushed the add-component-healthchecks-e2b-1615 branch 2 times, most recently from ea61b96 to 471928c Compare March 12, 2025 09:14
@dobrac dobrac marked this pull request as ready for review March 12, 2025 09:31
Member

@jakubno jakubno left a comment

Do I understand correctly that the orchestrator calls its own health endpoint to determine its health?

@dobrac dobrac force-pushed the add-component-healthchecks-e2b-1615 branch from 428f099 to 0455264 Compare March 18, 2025 09:24
@dobrac dobrac requested a review from jakubno March 18, 2025 13:13
Comment on lines +18 to +19
const healthcheckFrequency = 5 * time.Second
const healthcheckTimeout = 30 * time.Second
Member

does this make sense (timeout > frequency)?
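To make the question concrete, here is a minimal sketch of a ticker-driven loop in which each check gets its own timeout. It assumes the checks run sequentially inside the loop, which may not match this PR's implementation. The names `runChecks`, `sleepCheck`, and `demo` are hypothetical, and the durations are scaled-down stand-ins for the two constants above. Under that assumption, a timeout longer than the frequency means one slow check can span several intervals. Go's `time.Ticker` drops ticks for slow receivers, so the missed intervals are skipped rather than queued.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runChecks invokes check on every tick until ctx is done, bounding each
// invocation by timeout. It returns how many times check was invoked.
func runChecks(ctx context.Context, frequency, timeout time.Duration, check func(context.Context) error) int {
	ticker := time.NewTicker(frequency)
	defer ticker.Stop()

	runs := 0
	for {
		select {
		case <-ctx.Done():
			return runs
		case <-ticker.C:
			// The check blocks this loop; ticks that fire meanwhile are dropped.
			checkCtx, cancel := context.WithTimeout(ctx, timeout)
			_ = check(checkCtx)
			cancel()
			runs++
		}
	}
}

// sleepCheck returns a hypothetical check that takes d to finish,
// or stops early if its context ends first.
func sleepCheck(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demo runs a 50ms check at a 10ms frequency with a 100ms timeout for 300ms.
func demo() int {
	ctx, cancel := context.WithTimeout(context.Background(), 300*time.Millisecond)
	defer cancel()
	return runChecks(ctx, 10*time.Millisecond, 100*time.Millisecond, sleepCheck(50*time.Millisecond))
}

func main() {
	// Far fewer checks run than the 30 ticks that would fit into 300ms.
	fmt.Println("checks invoked:", demo())
}
```

If checks were instead started in a new goroutine on every tick, a timeout longer than the frequency would allow several checks to overlap, which is a different trade-off.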


// report updates the health status.
// This function is run in a goroutine every healthcheckFrequency for the reason of having
// longer running tasks that might me too slow or resource intensive to be run
Member

Suggested change
// longer running tasks that might me too slow or resource intensive to be run
// longer running tasks that might be too slow or resource intensive to be run

Labels
feature New feature
2 participants