@plural-copilot

Summary

This PR fixes the high rate of HTTP 500 (Internal Server Error) responses on the /ping endpoint of the flow-test workload in the demo-prod EKS cluster. The root cause was a deliberate exception-raising condition in app/main.py that threw an error every 3 seconds, triggering alerts and degrading service reliability.

Changes Made

  • Commented out the error-raising lines in the /ping route handler in app/main.py to prevent the deliberate exceptions.
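
For reviewers without the diff open, the change can be pictured roughly as follows. This is a framework-agnostic sketch, not the actual contents of app/main.py: the names `FAILURE_PERIOD_S`, `in_failure_window`, and `ping`, the exception type, and the response body are all assumptions made for illustration. The only detail taken from this PR is that the handler deliberately raised about every 3 seconds.

```python
import time
from typing import Optional

# Hypothetical constant; the PR only states "every 3 seconds".
FAILURE_PERIOD_S = 3


def in_failure_window(now: float, period_s: int = FAILURE_PERIOD_S) -> bool:
    """Return True during whole seconds that are a multiple of period_s."""
    return int(now) % period_s == 0


def ping(now: Optional[float] = None) -> dict:
    """Stand-in for the /ping route handler (assumed shape, not the real code)."""
    now = time.time() if now is None else now
    # Before the fix, lines along these lines raised deliberately,
    # which a web framework would surface as an HTTP 500:
    # if in_failure_window(now):
    #     raise RuntimeError("deliberate failure")
    return {"status": "ok"}
```

With the raising lines commented out, `ping()` returns normally at every timestamp, including those that previously fell inside the failure window.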

Rationale

Removing the deliberate exception eliminates the internal server errors, stops the Prometheus/Grafana alerts from firing continuously, and lets the /ping endpoint serve as a healthy liveness probe that returns a 200 status. The fix is minimal, fits the GitOps deployment model, and preserves service stability and the observability provided by the Prometheus instrumentation.

Please review and merge to restore stable, error-free operation on the /ping endpoint.