Poisoned pod, extremely slow, weird thread counts #4448

Our pods (hosted in Kubernetes) sometimes get poisoned: 100% of their requests become extremely slow (5-25 s instead of <1 s).
Responses are slow even with zero traffic (essentially only a test curl).
A non-poisoned pod from the same deployment works fine (and shows normal JMX values).

Interesting JMX values from the poisoned pod:
```
ComputePoolSampler.ActiveThreadCount is 65534
ComputePoolSampler.WorkerThreadCount is 2
ComputePoolSampler.SearchingThreadCount is 65535
ComputePoolSampler.BlockedWorkerThreadCount is 0
```
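The two large values sit just below 2^16 (65536), which looks more like a 16-bit counter that wrapped below zero than a real thread count. A minimal sketch of that arithmetic, assuming (hypothetically; not confirmed against the cats-effect source) that the pool packs the active and searching counts into the upper and lower 16 bits of a single `Int`. `CounterWrapSketch`, `DeltaBoth` and the layout itself are illustrative, not library API:

```scala
object CounterWrapSketch {
  // HYPOTHETICAL layout (assumption): one 32-bit Int holding two 16-bit counters.
  //   bits 16..31 -> "active" workers
  //   bits  0..15 -> "searching" workers
  val Shift: Int = 16
  val SearchMask: Int = (1 << Shift) - 1 // 0x0000FFFF
  val ActiveMask: Int = ~SearchMask      // 0xFFFF0000
  // In this model, one transition adjusts both counters at once.
  val DeltaBoth: Int = (1 << Shift) | 1  // 0x00010001

  def active(state: Int): Int = (state & ActiveMask) >>> Shift
  def searching(state: Int): Int = state & SearchMask

  def main(args: Array[String]): Unit = {
    var state = 0      // quiescent pool: 0 active, 0 searching
    state -= DeltaBoth // one unmatched decrement: 0 - 0x00010001 == 0xFFFEFFFF
    println(s"active=${active(state)} searching=${searching(state)}")
    // prints "active=65534 searching=65535"
  }
}
```

In this model a single unmatched decrement from zero yields exactly the 65534 / 65535 pair seen above, so an accounting bug in the pool's state transitions would be one explanation worth ruling out.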
Weirdly, a thread dump shows fewer than 100 threads: [threads_dump.txt](https://github.com/user-attachments/files/21330624/threads_dump.txt)
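To compare the sampler's numbers with what the JVM actually has running (on both a poisoned and a healthy pod), something along these lines can be run in-process. It assumes the samplers are registered under the `cats.effect.unsafe.metrics` JMX domain, which should be confirmed in jconsole first; `ThreadCountSanityCheck` and `isImpossible` are hypothetical helpers:

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName
import scala.jdk.CollectionConverters._
import scala.util.Try

object ThreadCountSanityCheck {
  // Assumption: the JMX domain cats-effect uses for its samplers; verify in jconsole.
  val SamplerPattern: ObjectName = new ObjectName("cats.effect.unsafe.metrics:*")

  /** (objectName, attribute, value) for every numeric attribute ending in "ThreadCount". */
  def reportedThreadCounts(): Seq[(String, String, Long)] = {
    val server = ManagementFactory.getPlatformMBeanServer
    for {
      name  <- server.queryNames(SamplerPattern, null).asScala.toSeq
      attrs <- Try(server.getMBeanInfo(name).getAttributes.toSeq).toOption.toSeq
      attr  <- attrs.map(_.getName) if attr.endsWith("ThreadCount")
      value <- Try(server.getAttribute(name, attr)).toOption.toSeq
      count <- Seq(value).collect { case num: java.lang.Number => num.longValue }
    } yield (name.toString, attr, count)
  }

  /** A reported count above the number of live JVM threads cannot be a real thread count. */
  def isImpossible(reported: Long, liveThreads: Int): Boolean =
    reported > liveThreads.toLong

  def main(args: Array[String]): Unit = {
    val live = ManagementFactory.getThreadMXBean.getThreadCount
    println(s"live JVM threads: $live")
    reportedThreadCounts().foreach { case (name, attr, value) =>
      val flag = if (isImpossible(value, live)) "  <-- exceeds live JVM threads" else ""
      println(s"$name $attr=$value$flag")
    }
  }
}
```

Since the pods already expose JMX on port 1099, the same comparison can be done by hand in jconsole or VisualVM; the in-process version is mainly easier to log periodically.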

```
scala 2.13.6
cats-effect: 3.6.2
doobie 1.0.0-RC9
sttp client with armeria backend 3.11.0
tapir + http4s 1.11.36
```

```
JVM: 17.0.15 Eclipse Adoptium (also observed on 21)

JVM options:
 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.rmi.port=1099 -Djava.rmi.server.hostname=127.0.0.1 -XX:+UseG1GC -XX:G1PeriodicGCInterval=30000 -XX:+PrintCommandLineFlags -Dsun.net.inetaddr.ttl=30 -Dpidfile.path=/dev/null -XX:+CrashOnOutOfMemoryError -XX:MaxMetaspaceSize=320m -XX:ReservedCodeCacheSize=64m -XX:CompressedClassSpaceSize=64m -XX:InitialHeapSize=519m -XX:MaxHeapSize=1038m
Resolved flags (printed by -XX:+PrintCommandLineFlags):
-XX:CompressedClassSpaceSize=67108864 -XX:ConcGCThreads=1 -XX:+CrashOnOutOfMemoryError -XX:G1ConcRefinementThreads=2 -XX:G1PeriodicGCInterval=30000 -XX:GCDrainStackTargetSize=64 -XX:InitialHeapSize=544210944 -XX:+ManagementServer -XX:MarkStackSize=4194304 -XX:MaxHeapSize=1088421888 -XX:MaxMetaspaceSize=335544320 -XX:MinHeapSize=6815736 -XX:+PrintCommandLineFlags -XX:ReservedCodeCacheSize=67108864 -XX:-THPStackMitigation -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
```


If we kill the pod, the replacement works fine. But roughly once a day this happens again, eventually on multiple pods across multiple services. The only commonality is cats-effect 3.6.2, latest Scala 2, tapir, sttp client and doobie RC9 (though not necessarily the same client/server backends).

Kubernetes resource settings:
```
limits:
  cpu: 1100m
  memory: 1200Mi
requests:
  cpu: 600m
  memory: 1200Mi
```
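For context, the `WorkerThreadCount` of 2 matches what this CPU limit should produce. As far as we understand, JDK container support sets `availableProcessors()` to the cgroup CPU quota divided by the period, rounded up, and cats-effect sizes its default compute pool from that value. A sketch of the arithmetic, assuming the common 100 ms CFS period for a `cpu: 1100m` limit (`ExpectedComputeThreads` is a hypothetical helper):

```scala
object ExpectedComputeThreads {
  // Assumption: mirrors the JDK's container CPU detection, ceil(quota / period).
  def containerCpus(quotaMicros: Long, periodMicros: Long): Int =
    math.ceil(quotaMicros.toDouble / periodMicros.toDouble).toInt

  def main(args: Array[String]): Unit = {
    // cpu: 1100m with a 100 ms (100000 us) period -> quota of 110000 us
    val expected = containerCpus(quotaMicros = 110000L, periodMicros = 100000L)
    println(s"expected availableProcessors for a 1100m limit: $expected") // 2
    println(s"this JVM reports: ${Runtime.getRuntime.availableProcessors()}")
  }
}
```

So the worker count itself looks normal; it is the active and searching counters that are out of range.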



Some pods never get poisoned. We haven't observed any pattern yet, except that it happens semi-regularly, and once a pod is poisoned it doesn't recover.


