Failed to call ‘uploadInputFiles’ if the pod is in different node than kestra worker in EKS #148

Open
@surya9teja

Expected Behavior

I have an SQS trigger. When a new message arrives in the queue, the flow converts it to .jsonl and passes the file URI as inputFiles to kubernetes.PodCreate. The file should then be accessible inside the pod for processing.

Actual Behaviour

When I pass nodeSelector and tolerations to the Kubernetes pod so that it is scheduled on a different node from the one running the kestra-worker, the busybox image fails to upload the file that I pass via the flow.

When I remove the node selector and tolerations, the inputFiles upload works as intended. From my observation, it fails only when Kestra and the newly created task pod are not on the same node. I use Karpenter to scale the EKS nodes up and down dynamically, in case that is related.
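
The comparison described here (with and without node placement constraints) could be narrowed to a smaller flow, independent of SQS and the ETL image. The following is only a sketch under assumptions: the flow id, task id, container name, and file content are hypothetical, and it assumes that PodCreate accepts inline content for inputFiles the way other Kestra tasks do. The nodeSelector and tolerations values are copied from the example flow in this issue.

```yaml
# Hypothetical minimal flow for isolating the upload failure.
# Ids and file content are illustrative, not taken from the real pipeline.
id: podcreate-inputfiles-repro
namespace: microform.boa

tasks:
  - id: cat_input
    type: io.kestra.plugin.kubernetes.PodCreate
    namespace: dev
    inputFiles:
      # Assumption: inline content is accepted here instead of a storage URI.
      data.jsonl: |
        {"hello": "world"}
    spec:
      containers:
        - name: main
          image: busybox
          command:
            - cat
            - "{{workingDir}}/data.jsonl"
      # Remove nodeSelector and tolerations to compare against the working case.
      nodeSelector:
        resource-type: private-cpu
      tolerations:
        - key: private/cpu
          operator: Exists
          effect: NoSchedule
      restartPolicy: Never
```

If this smaller flow fails in the same way only when the two placement blocks are present, that would suggest the problem is not tied to the SQS trigger or the ETL image.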

Steps To Reproduce

The error log from the task that creates the pod and then fails:

2024-09-17T19:19:27.693Z INFO Pod 'microformboa-dev-boa-etl-part-1-async-textract-job-qjyzk' is created 
2024-09-17T19:19:27.708Z DEBUG Received action 'ADDED' on [Type: Pod, Namespace: dev, Name: microformboa-dev-boa-etl-part-1-async-textract-job-qjyzk, Uid: c19ebc8e-e92d-4791-a60e-f3dfc61f18e8, Phase: Pending]
2024-09-17T19:20:06.043Z DEBUG Received action 'MODIFIED' on [Type: Pod, Namespace: dev, Name: microformboa-dev-boa-etl-part-1-async-textract-job-qjyzk, Uid: c19ebc8e-e92d-4791-a60e-f3dfc61f18e8, Phase: Pending]
2024-09-17T19:20:06.094Z DEBUG Received action 'MODIFIED' on [Type: Pod, Namespace: dev, Name: microformboa-dev-boa-etl-part-1-async-textract-job-qjyzk, Uid: c19ebc8e-e92d-4791-a60e-f3dfc61f18e8, Phase: Pending]
2024-09-17T19:20:09.811Z DEBUG Received action 'MODIFIED' on [Type: Pod, Namespace: dev, Name: microformboa-dev-boa-etl-part-1-async-textract-job-qjyzk, Uid: c19ebc8e-e92d-4791-a60e-f3dfc61f18e8, Phase: Pending]
2024-09-17T19:20:10.128Z DEBUG Failed to call 'uploadInputFiles'
2024-09-17T19:20:11.684Z DEBUG Failed to call 'uploadMarker'
2024-09-17T19:20:13.034Z DEBUG Failed to call 'uploadMarker'
2024-09-17T19:20:15.318Z DEBUG Failed to call 'uploadMarker'
2024-09-17T19:20:15.331Z DEBUG Received close on [Type: PodWatcher]
2024-09-17T19:20:15.345Z INFO Pod 'microformboa-dev-boa-etl-part-1-async-textract-job-qjyzk' is deleted 
2024-09-17T19:20:15.350Z TRACE io.kestra.core.utils.RetryUtils$RetryFailed: Stop retry, attempts 3 elapsed after 3 seconds
	at io.kestra.core.utils.RetryUtils$Instance.lambda$exceptionFallback$4(RetryUtils.java:153)
	at dev.failsafe.internal.FallbackImpl.apply(FallbackImpl.java:58)
	at dev.failsafe.internal.FallbackExecutor.lambda$apply$0(FallbackExecutor.java:62)
	at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187)
	at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
	at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112)
	at io.kestra.core.utils.RetryUtils$Instance.wrap(RetryUtils.java:144)
	at io.kestra.core.utils.RetryUtils$Instance.run(RetryUtils.java:129)
	at io.kestra.plugin.kubernetes.services.PodService.withRetries(PodService.java:147)
	at io.kestra.plugin.kubernetes.services.PodService.uploadMarker(PodService.java:173)
	at io.kestra.plugin.kubernetes.AbstractPod.uploadInputFiles(AbstractPod.java:84)
	at io.kestra.plugin.kubernetes.PodCreate.run(PodCreate.java:193)
	at io.kestra.plugin.kubernetes.PodCreate.run(PodCreate.java:39)
	at io.kestra.core.runners.WorkerTaskThread.doRun(WorkerTaskThread.java:76)
	at io.kestra.core.runners.AbstractWorkerThread.run(AbstractWorkerThread.java:57)

2024-09-17T19:20:15.350Z ERROR Stop retry, attempts 3 elapsed after 3 seconds

Environment Information

  • Kestra Version: 0.187
  • Plugin version: latest
  • Operating System (OS / Docker / Kubernetes): Kubernetes (EKS)
  • Java Version (If not docker):

Example flow

id: dev-boa-etl-part-1
namespace: microform.boa

triggers:
  - id: trigger
    type: io.kestra.plugin.aws.sqs.Trigger
    accessKeyId: "{{kv(key='aws_access_key', errorOnMissing=true)}}"
    secretKeyId: "{{kv(key='aws_secret_key', errorOnMissing=true)}}"
    region: "eu-west-2"
    serdeType: STRING
    queueUrl: "queueurl"
    maxRecords: 10
    maxDuration: PT10S
  

tasks:
  - id: to_json
    type: io.kestra.plugin.serdes.json.IonToJson
    from: "{{ trigger.uri }}"
  
  - id: log
    type: io.kestra.plugin.core.log.Log
    message: "{{read(outputs.to_json.uri)}}"
      
  - id: async_textract_job
    type: io.kestra.plugin.kubernetes.PodCreate
    namespace: dev
    inputFiles: 
      data.jsonl: "{{outputs.to_json.uri}}"
    metadata:
      labels:
        company: microform.boa
        task: boa-etl-pipeline-part-1
    waitRunning: PT1H
    waitUntilRunning: PT15M
    spec:
      containers:
        - name: boa-etl-pipeline-part-1
          image: 1tw678125321685.dkr.ecr.eu-west-2.amazonaws.com/boaml/etl:v1.0
          command:
            - python 
            - textract.py
            - '--f'
            - "{{workingDir}}/data.jsonl"
          volumeMounts:
            - name: environment-vars
              mountPath: /app/.env
              subPath: .env
              readOnly: true
          imagePullPolicy: IfNotPresent
      nodeSelector:
        resource-type: private-cpu
      tolerations:
        - key: private/cpu
          operator: Exists
          effect: NoSchedule
      restartPolicy: OnFailure
      volumes:
        - name: environment-vars
          secret:
            secretName: boa-etl-env-vars

Metadata

Labels: area/backend (Needs backend code changes), area/plugin (Plugin-related issue or feature request), bug (Something isn't working)

Status: Backlog