feat: add readiness probe flapping fault #401

preespp · 2025-11-13T08:54:42Z

Added a new fault for readiness probe flapping (incident 200), targeting the frontend Deployment in the default namespace. The mechanism modifies the Deployment’s readiness probe to alternate between healthy and unhealthy, simulating intermittent readiness failures that cause pods to repeatedly transition between Ready and NotReady.

Injection:

1. Retrieve the existing Deployment and back up its current manifest.
2. Patch the container’s readinessProbe with aggressive parameters:
   - periodSeconds set low to trigger frequent checks.
   - failureThreshold and successThreshold set to 1.
3. With both thresholds at 1, a single failed or successful check flips readiness, so pods may alternate between Ready and NotReady.
4. If restart_policy is set to force, pods are restarted to apply the patch immediately.
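
The injection steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `backup_manifest` and `build_flapping_probe_patch`, the container name `frontend`, and the default `period_seconds=1` are all assumptions made for the example. It only builds the patch body as a plain dict; applying it to the cluster is left out.

```python
import copy
import json


def backup_manifest(deployment: dict) -> dict:
    """Return an independent copy of the manifest so it can be restored later."""
    return copy.deepcopy(deployment)


def build_flapping_probe_patch(container_name: str, period_seconds: int = 1) -> dict:
    """Build a strategic-merge patch body that tightens the readinessProbe.

    Hypothetical helper: containers are matched by name, and only the three
    timing fields named in the description are overridden.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container_name,
                            "readinessProbe": {
                                "periodSeconds": period_seconds,
                                "failureThreshold": 1,
                                "successThreshold": 1,
                            },
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # "frontend" as the container name is an assumption for this sketch.
    print(json.dumps(build_flapping_probe_patch("frontend"), indent=2))
```

Keeping the backup as a deep copy means later edits to the live manifest object cannot alter the saved version used for recovery.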

Alerts:

RequestErrorRate (frontend-service-1): Triggered due to increased 5xx errors from flapping pods.
RequestLatency (frontend-service-1): Frontend latency spikes due to pod churn.
FailedPodsDetected (frontend-pod-1): Expected alert did not fire; the pod never fully failed despite readiness probe flapping.
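
One possible reason for the missing alert, assuming FailedPodsDetected keys off the pod phase (the rule definition is not shown in this PR): a failing readiness probe only flips the pod's Ready condition and removes it from Service endpoints. It does not restart the container or move the pod to the Failed phase. The sketch below illustrates the distinction on a hand-written status dict; the helper names and the sample data are hypothetical.

```python
def is_ready(pod_status: dict) -> bool:
    """True if the pod reports a Ready condition with status "True"."""
    return any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in pod_status.get("conditions", [])
    )


def is_failed(pod_status: dict) -> bool:
    """True only if the pod phase is Failed."""
    return pod_status.get("phase") == "Failed"


# Hand-written example of a pod whose readiness probe is currently failing.
flapping_pod = {
    "phase": "Running",
    "conditions": [{"type": "Ready", "status": "False"}],
}

if __name__ == "__main__":
    print("ready:", is_ready(flapping_pod))
    print("failed:", is_failed(flapping_pod))
```

If the alert is meant to cover this fault, a condition based on readiness rather than phase may be a better fit.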

preespp added 2 commits November 10, 2025 23:34

Untest, will test and modify soon

0092927

Update values to fire alert (still missing FailedPodDetected)

777a9ef

preespp requested review from Red-GV and rohanarora as code owners November 13, 2025 08:54

Fix Register Name

5cef6b2
