Description
What's the intended vision for a Pydantic I/O script that needs to output a directory artifact (e.g. to be tarred)?
Hybrid
I ended up using an odd hybrid. Here's a simplified example:
from pathlib import Path

_DOWNLOAD_PATH = Path("/tmp/repo")

@script(outputs=Artifact(name="repo", path=_DOWNLOAD_PATH.as_posix(), archive=TarArchiveStrategy()))
def download_repository(input: DownloadRepositoryInput) -> DownloadRepositoryOutput:
    extract_repo(input.repo, _DOWNLOAD_PATH)  # Download to the right place to end up in the artifact
    details = extract_repo_details(_DOWNLOAD_PATH)
    return DownloadRepositoryOutput(details=details)
Output in input
I believe there is supposed to be a way to mark an artifact as an output in the input class to get handed a Path, but I couldn't get that to work.
class DownloadRepositoryInput:
    ...
    repo: Annotated[Path, Artifact(name="repo", archive=TarArchiveStrategy(), output=True)]
I tried this, but I can't create the workflow template any more:
  File "workflow_template.py", line 90, in get_workflow_template
    repo_artifact = download_output.get_artifact("repo")
  File "hera/workflows/_mixins.py", line 1001, in get_artifact
    return self._get_artifact(name=name, subtype=self._subtype)
  File "hera/workflows/_mixins.py", line 974, in _get_artifact
    raise ValueError(f"Cannot get output artifacts when the template has no outputs: {template}")
ValueError: Cannot get output artifacts when the template has no outputs:
It's also surprising to make an output in an input object, and I'm not sure how this would work with decorator syntax, where there would not be any way to access the output artifact.
Path in output
Ideally I'd write something like this:
class DownloadRepositoryOutput(Output):
    repo: Annotated[Path, Artifact(name="repo", archive=TarArchiveStrategy())]
    ...

@script()
def download_repository(input: DownloadRepositoryInput) -> DownloadRepositoryOutput:
    repo_path = Path("./repo")
    extract_repo(input.repo, repo_path)
    details = extract_repo_details(repo_path)
    return DownloadRepositoryOutput(repo=repo_path, details=details)
However, the artifact path has to be written into the generated YAML, and unsurprisingly it's not the path the function actually downloads to:
outputs:
  artifacts:
    - name: repo
      path: /tmp/hera-outputs/artifacts/repo
      archive:
        tar: {}
Also, the code fails at runtime:
  File "hera/workflows/_runner/util.py", line 259, in _runner
    output = _save_annotated_return_outputs(function(**function_kwargs), output_annotations)
  File "hera/workflows/_runner/script_annotations_util.py", line 234, in _save_annotated_return_outputs
    _write_to_path(path, value, _get_dumper_function(matching_output))
  File "hera/workflows/_runner/script_annotations_util.py", line 303, in _write_to_path
    dumped_output = dumper(output_value)
  ...
  File "hera/shared/serialization.py", line 47, in serialize
    return json.dumps(value, cls=PydanticEncoder)  # None serialized as `null`
  ...
TypeError: Object of type PosixPath is not JSON serializable
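The final error should be reproducible without Hera. Here is a minimal sketch using only the standard library. It uses the default `json` encoder rather than Hera's `PydanticEncoder`, on the assumption (suggested by the traceback, not verified) that the custom encoder falls through to the same error for `Path` values. The helper name `try_dump` is made up for illustration.

```python
import json
from pathlib import Path


def try_dump(value):
    """Return the JSON string, or the TypeError message if encoding fails."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


repo_path = Path("./repo")

# The default encoder has no rule for Path objects, so this fails.
failure = try_dump(repo_path)

# Converting to a plain string first serializes without error.
success = try_dump(repo_path.as_posix())
```

So the runner appears to treat the returned `Path` as a value to serialize into the artifact file, rather than as a location to collect the artifact from.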
Would it be reasonable to get this solution working? It seems like the right approach to support decorator syntax. We'd need to ensure Argo finds the right directory -- a symlink might work? If not, we can do a recursive copy. (A move sounds sensible, except a user might expect to be able to provide the same Path multiple times, or even nested Paths.)
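To make the symlink-or-copy idea concrete, here's a rough sketch of what the runner could do with a returned `Path`. It is not existing Hera code: `stage_directory_artifact` and its parameters are hypothetical names. It assumes the artifact path does not exist yet, and I haven't confirmed whether Argo follows a symlink when archiving the artifact.

```python
import shutil
from pathlib import Path


def stage_directory_artifact(source: Path, artifact_path: Path, prefer_symlink: bool = True) -> str:
    """Make `source` available at `artifact_path` and return the strategy used.

    Hypothetical helper: tries a symlink first, then falls back to a
    recursive copy. The source is never moved, so the same Path can be
    provided for several artifacts.
    """
    source = source.resolve()
    artifact_path.parent.mkdir(parents=True, exist_ok=True)
    if prefer_symlink:
        try:
            artifact_path.symlink_to(source, target_is_directory=True)
            return "symlink"
        except OSError:
            pass  # symlinks unavailable here; fall back to copying
    # Recursive copy leaves the original directory intact.
    shutil.copytree(source, artifact_path, dirs_exist_ok=True)
    return "copy"
```

Called as `stage_directory_artifact(repo_path, Path("/tmp/hera-outputs/artifacts/repo"))`, this would leave the user's directory where it is. A copy costs time and disk space on large repositories, and that's the trade-off against a move.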