ProcessPoolExecutor in apply endpoint causes tesseract run to hang indefinitely #356

@nmheim

Description

See title.

Steps to reproduce

Build a tesseract with the following setup:

# tesseract_config.yaml
name: "reproducer"
version: "0.0.1"

# tesseract_api.py
from concurrent.futures import ProcessPoolExecutor
from pydantic import BaseModel


class InputSchema(BaseModel):
    pass


class OutputSchema(BaseModel):
    pass


def preprocess_fn(data_id: int):
    return data_id


def apply(inputs: InputSchema):

    data_ids = list(range(10))

    pool = ProcessPoolExecutor()
    futures = []

    for idx in data_ids:
        x = pool.submit(preprocess_fn, idx)
        futures.append(x)
        print(idx, "submitted")

    for f in futures:
        res = f.result()
        print(res, "done")

    return OutputSchema()

Running it via the command below never finishes:

tesseract run reproducer apply '{"inputs":{}}' --output-path outputs
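
To help narrow down where the hang occurs, the result loop in the reproducer could be bounded with a timeout, so that a stuck future raises an error instead of blocking forever. This is only a diagnostic sketch, not a fix: the helper name `collect_with_timeout` and the timeout value are hypothetical and are not part of Tesseract or of the original reproducer.

```python
from concurrent.futures import Future
from concurrent.futures import TimeoutError as FutureTimeoutError


def collect_with_timeout(futures, timeout_s):
    """Collect results in submission order, failing loudly on a stuck future.

    Hypothetical helper for debugging; `timeout_s` applies to each future
    separately, not to the whole batch.
    """
    results = []
    for i, f in enumerate(futures):
        try:
            # Future.result(timeout=...) raises instead of blocking forever.
            results.append(f.result(timeout=timeout_s))
        except FutureTimeoutError:
            raise RuntimeError(
                f"future {i} did not finish within {timeout_s}s"
            ) from None
    return results
```

In the reproducer's `apply`, the `for f in futures:` loop would become something like `collect_with_timeout(futures, timeout_s=30)`. If the first future already times out, that suggests the worker processes never start or never pick up work, rather than a single task being slow.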

Logs

OS

Mac

Tesseract version

1.0.0
