
Consider using TTL more extensively in k8s executor #6452

@BioWilko

Description

New feature

Currently ttlSecondsAfterFinished is an optional config option for the k8s executor which adds metadata to jobs/pods telling the k8s control plane to delete those resources a given number of seconds after completion or failure.
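
For context, this is roughly the Job manifest fragment that option produces, written here as the Groovy map that would be serialised to JSON. It is an illustration only (the names and values are made up), but the spec fields are standard Kubernetes batch/v1 Job fields:

    // Illustration only -- not Nextflow's internal request builder.
    // ttlSecondsAfterFinished is a standard batch/v1 Job field: once the Job
    // finishes (success or failure), the control plane deletes it and its pods
    // after the given number of seconds, with no client-side polling needed.
    def jobSpec = [
        apiVersion: 'batch/v1',
        kind      : 'Job',
        metadata  : [ name: 'nf-example-task', namespace: 'my-namespace' ],
        spec      : [
            ttlSecondsAfterFinished: 300,   // clean up 5 minutes after the Job finishes
            backoffLimit           : 0,
            template               : [ /* pod template omitted */ ]
        ]
    ]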

However, the Nextflow task monitor pool loop also actively deletes resources itself, some (apparently variable) amount of time after completion. If you have lots of fast-running processes this adds up to a lot of resources for the loop to monitor, slowing the main process down. Worse, if the control plane deletes a job before Nextflow tries to, the deletion fails with an error like this:

Oct-06 18:32:43.421 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Unexpected error in tasks monitor pool loop
org.codehaus.groovy.runtime.InvokerInvocationException: nextflow.k8s.client.K8sResponseException: Request DELETE /apis/batch/v1/namespaces/<NAMESPACE>/jobs/nf-62e16f1931488513c9732bfd1e7d5718-f9506 returned an error code=404

  {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {
          
      },
      "status": "Failure",
      "message": "jobs.batch \"nf-62e16f1931488513c9732bfd1e7d5718-f9506\" not found",
      "reason": "NotFound",
      "details": {
          "name": "nf-62e16f1931488513c9732bfd1e7d5718-f9506",
          "group": "batch",
          "kind": "jobs"
      },
      "code": 404
  }

        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:348)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
        at groovy.lang.MetaClassImpl.doInvokeMethod(MetaClassImpl.java:1333)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1088)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
        at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:645)
        at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:628)
        at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:82)
        at nextflow.processor.TaskPollingMonitor$_start_closure2.doCall(TaskPollingMonitor.groovy:323)
        at nextflow.processor.TaskPollingMonitor$_start_closure2.call(TaskPollingMonitor.groovy)
        at groovy.lang.Closure.run(Closure.java:505)
        at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: nextflow.k8s.client.K8sResponseException: Request DELETE /apis/batch/v1/namespaces/<NAMESPACE>/jobs/nf-62e16f1931488513c9732bfd1e7d5718-f9506 returned an error code=404

This appears in the log and puts the pipeline into a state where it no longer spawns new jobs/pods, but also never finishes with an error.

I suggest refactoring some of the k8s functionality to make better use of Kubernetes features such as TTL, letting the control plane cheaply provide much of the same cleanup, or at the very least handling the case where a resource no longer exists when Nextflow tries to delete it, as in the sketch below.
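
For the last point, a minimal sketch of what tolerating an already-deleted resource could look like. The wrapper name is made up, and matching on the exception message is only an assumption based on the log above; the actual delete call is passed in as a closure rather than guessing at the client API:

    import nextflow.k8s.client.K8sResponseException

    // Sketch only -- not Nextflow's actual cleanup code. It treats "NotFound"
    // on delete as success: if the control plane already removed the resource
    // (e.g. via ttlSecondsAfterFinished) there is nothing left to clean up.
    void deleteIgnoringNotFound(String name, Closure deleteAction) {
        try {
            deleteAction.call(name)
        }
        catch( K8sResponseException e ) {
            if( e.message?.contains('code=404') )
                return              // already gone -- do not crash the monitor loop
            throw e                 // any other API error should still surface
        }
    }

Called from the task cleanup path in place of the bare delete, something like this would keep a TTL-expired job from taking down the whole task monitor loop.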
