Worker container crashes when python task passes outputs with unicode escape sequences #7221
Comments
hmm, this isn't a kestra issue, right? 😉 The FILE type doesn't have any expectations about the file extension. The input would happily accept a PNG file as well; it just expects any file, and it's up to your task to determine how to process that user's file. We have dedicated tasks to work with Excel. I'd recommend the following: use conditional logic to run either the CSV or the Excel task, depending on whether the user chooses to upload a CSV or an Excel file:
id: flow1
namespace: tutorial
inputs:
  - id: fileType
    type: SELECT
    values:
      - CSV
      - XLSX
  - id: file
    type: FILE
tasks:
  - id: process_csv
    runIf: "{{ inputs.fileType is 'CSV' }}"
    type: io.kestra.plugin.serdes.csv.CsvToIon
    from: "{{inputs.file}}"
    fieldSeparator: ;
  - id: process_excel
    runIf: "{{ inputs.fileType is 'XLSX' }}"
    type: io.kestra.plugin.serdes.excel.ExcelToIon
    from: "{{inputs.file}}"
  - id: op001
    type: io.kestra.plugin.core.debug.Return
    format: "{{ tasks.process_csv.state != 'SKIPPED' ? outputs.process_csv.uri : outputs.process_excel.uris.Sheet1 }}"
  - id: op01
    type: io.kestra.plugin.serdes.json.IonToJson
    from: "{{outputs.op001.uri}}"
  - id: op1
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    beforeCommands: []
    outputFiles:
      - output.jsonl
    containerImage: ghcr.io/kestra-io/kestrapy
    script: |-
      import csv, json
      from kestra import Kestra
      file_uri = '{{ outputs.op01.uri }}'
      output_url = 'output.jsonl'
      headers = None
      with open(file_uri, 'rb') as f:
          # Open output file
          with open(output_url, 'w') as outfile:
              # Iterates over input json
              for line in f:
                  data = json.loads(line)
                  if headers is None:
                      headers = data.keys()
                  new_data = dict(zip(headers, map(lambda x: ' ' + str(x) + ' ' if isinstance(x, (int, float)) else x, data.values())))
                  outfile.write(json.dumps(new_data)+"\n")
      Kestra.outputs({ 'headers': ','.join(headers), 'headers_expression': ','.join(len(headers) * ['?']) })
I'll close the issue for now, as the suggested solution above is preferable to trying to read and parse an incorrect file type. @Skraye feel free to reopen if you have suggestions on how to handle such user errors, which lead to encoding issues due to a mismatched file type.
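On handling a mismatched upload: one option, sketched here rather than taken from this thread, is for a script task to inspect the file's leading bytes before parsing it. An .xlsx file is a ZIP archive, so it normally begins with the bytes PK\x03\x04. The function names below are hypothetical, and the check is only a heuristic (a text file could in principle start with the same characters).

```python
# ZIP local file headers start with these four bytes; .xlsx files are ZIP archives.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_zip(path: str) -> bool:
    """Return True if the file starts with the ZIP signature (as .xlsx files do)."""
    with open(path, "rb") as f:
        return f.read(len(ZIP_MAGIC)) == ZIP_MAGIC


def check_declared_type(path: str, declared: str) -> None:
    """Raise ValueError when the declared type disagrees with the file's signature.

    `declared` is assumed to be one of the SELECT values, 'CSV' or 'XLSX'.
    """
    is_zip = looks_like_zip(path)
    if declared == "CSV" and is_zip:
        raise ValueError("file was declared as CSV but looks like a ZIP/XLSX archive")
    if declared == "XLSX" and not is_zip:
        raise ValueError("file was declared as XLSX but does not start with a ZIP signature")
```

Failing fast with a clear error in the task would surface the user mistake before any garbled data reaches later steps.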
It crashes the Worker itself, which is not a good idea.
Describe the issue
We have a problem with the following flow:
We expect a .csv file as input, but we managed to upload an .xlsx file instead. It was processed by the CsvToIon task and serialized to a broken .ion output.
The problem happens in the 3rd task, which calls a Python script to extract headers from the JSON output. It happens at the line
Kestra.outputs({ 'headers': ','.join(headers), 'headers_expression': ','.join(len(headers) * ['?']) })
because it sends the output headers to Kestra with unicode escape sequences, which causes the worker container to crash with the following error message:
Environment
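To illustrate the header problem described in this issue: when binary bytes end up in a string, Python's json.dumps writes the control characters as \uXXXX escapes. The sketch below shows that behaviour and one possible defensive step, dropping non-printable characters from header names before handing them to Kestra.outputs. The helpers clean_header and safe_headers are hypothetical, and nothing in this thread establishes that cleaning the headers is enough to avoid the Worker crash.

```python
import json
import unicodedata


def clean_header(name: str) -> str:
    """Drop characters in Unicode 'C' categories (control, format, unassigned, ...)."""
    return "".join(ch for ch in name if unicodedata.category(ch)[0] != "C").strip()


def safe_headers(raw_headers):
    """Return cleaned header names, or raise if a header has no printable characters."""
    cleaned = [clean_header(str(h)) for h in raw_headers]
    if not cleaned or any(h == "" for h in cleaned):
        raise ValueError("input does not look like tabular text; refusing to emit headers")
    return cleaned


# A string built from binary content: json.dumps escapes its control characters.
garbled = "PK\x03\x04\x14\x00"
escaped = json.dumps({"headers": garbled})

# The same kind of damage inside an otherwise readable header is stripped out.
cleaned = ",".join(safe_headers(["id", "na\x00me"]))
```

In the script from the flow, the cleaned list would be passed to Kestra.outputs in place of the raw headers.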