@salotz salotz commented May 14, 2025

Why are the changes needed?

Motivating use case

I was attempting to use the Python SDK in Python code to run tasks/workflows. When dealing with a case like this:

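from dataclasses import dataclass
from pathlib import Path

from flytekit import task
from flytekit.types.file import FlyteFile


@dataclass
class FlyteBlobComplex:
    blob: FlyteFile
    param: float


@task()
def smoke_blob_complex_inputs(val: FlyteBlobComplex) -> bool:
    # Fetch the remote object, then validate the local copy.
    val.blob.download()
    check_file(Path(val.blob.path))  # check_file is my own helper (not shown)

    return True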

I ran into problems with supplying the proper inputs to FlyteRemote.execute for the FlyteFile attributes.

After some troubleshooting, I traced this to a set of bugs in how the DataclassTransformer class handles types and unions.

At a high level I should be able to supply inputs like this:

flyte_remote.execute(
    smoke_blob_complex_inputs,
    inputs={
        "val" : {
            "blob" : {"path" : "gs://mybucket/path/to/obj"},
            "param" : 1.0,
        },
    },
)

However, this yielded the following error:

flytekit.core.type_engine.TypeTransformerFailedError: The original fields are missing the following keys from the dataclass fields: ['metadata']

What changes were proposed in this pull request?

In this PR I fixed two issues related to handling of union types in the DataclassTransformer class that were manifesting in the behavior above.

First, when the types of dataclass fields were retrieved with the dataclasses.fields function (on the FlyteFile object, which I suppose is also a dataclass, but probably a special case?), f.type came back as a string, so the metadata field was reported as 'Optional[dict[str,str]]'. The check for whether this was an optional type then failed, because the checker only understands real type objects, not strings. (NOTE: this is a type mismatch that could probably be caught statically.)

Replacing dataclasses.fields with typing.get_type_hints, which resolves the annotations to real type objects, solves this issue.
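
A minimal sketch of this first issue, using only the standard library. `FileLike` is a hypothetical stand-in for `FlyteFile` (not flytekit's actual class), `is_optional` is an illustrative checker rather than the one in `DataclassTransformer`, and writing the annotations as strings is an assumption about how the field types ended up as strings:

```python
import dataclasses
import typing
from typing import Optional


@dataclasses.dataclass
class FileLike:
    # Hypothetical stand-in for FlyteFile. The annotations are written as
    # strings, which is also what `from __future__ import annotations` produces.
    path: "str"
    metadata: "Optional[dict[str, str]]" = None


def is_optional(t) -> bool:
    """Illustrative check: True if `t` is a Union that includes NoneType."""
    return typing.get_origin(t) is typing.Union and type(None) in typing.get_args(t)


# dataclasses.fields reports each annotation exactly as it was stored.
raw_types = {f.name: f.type for f in dataclasses.fields(FileLike)}

# typing.get_type_hints evaluates string annotations into real type objects.
# globalns is passed explicitly so the lookup of `Optional` does not depend
# on how this module was loaded.
resolved_types = typing.get_type_hints(FileLike, globalns=globals())

print(repr(raw_types["metadata"]))               # 'Optional[dict[str, str]]'
print(is_optional(raw_types["metadata"]))        # False: it is only a string
print(is_optional(resolved_types["metadata"]))   # True
```

With the raw string, the optional check has nothing to inspect, which is consistent with the "missing keys: ['metadata']" error above.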

Second, further down in the code, when dealing with non-Optional Unions, values are only checked for being Optional types and for type(value) == expected_type, where expected_type might be Union[str, os.PathLike], as in the case of the path attribute of FlyteFile. A concrete type such as str never equals the union itself, so this check always failed.

To address this, I added a check for whether the value's type is one of the types in the union.
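
A minimal sketch of the second issue, again standard library only. `in_union` mirrors the one-line helper from this PR's diff; `matches` is a hypothetical wrapper added here for illustration and is not flytekit code:

```python
import os
import typing
from pathlib import Path
from typing import Any, Union


def in_union(t: type, union: Any) -> bool:
    """Return True if `t` is exactly one of the member types of `union`."""
    return t in typing.get_args(union)


def matches(value: Any, expected_type: Any) -> bool:
    """Hypothetical helper: exact type match, or exact membership in a union."""
    original_type = type(value)
    return original_type == expected_type or in_union(original_type, expected_type)


PathType = Union[str, os.PathLike]

# Comparing a concrete type with the union object itself never succeeds.
print(type("gs://mybucket/path/to/obj") == PathType)   # False

# Checking membership in the union's arguments does.
print(matches("gs://mybucket/path/to/obj", PathType))  # True
print(matches(1.0, PathType))                          # False

# Caveat of exact membership: subclasses of a member are not accepted.
print(matches(Path("/tmp/obj"), PathType))             # False
```

The last line shows a limitation of this style of check: a `pathlib.Path` value implements `os.PathLike` but its concrete type is not literally listed in the union, so an `isinstance`-based check may be preferable if such values should be accepted.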

These fixes solve my immediate issue but I still get the feeling that there is some bigger picture issue that I'm not seeing leading to these problems. I would appreciate some perspective from the team.

How was this patch tested?

See the use case above.

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

This pull request enhances the DataclassTransformer class by improving the handling of union types. It replaces dataclasses.fields with typing.get_type_hints for better type resolution and refines type checking logic, addressing input errors in Flyte tasks with dataclass inputs.

Unit tests added: False

Estimated effort to review (1-5, lower is better): 2 - The changes are straightforward and primarily focus on type handling, making the review process relatively simple.

welcome bot commented May 14, 2025

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

@machichima
Member

Hi @salotz ,
Would you mind adding the DCO sign-off and fixing the CI error?
Thanks!

@salotz
Author

salotz commented May 16, 2025

@machichima what is DCO?

@machichima
Member

@machichima what is DCO?

DCO (Developer Certificate of Origin) means signing off your commits. You can have a look here to see how to add it: https://github.com/src-d/guide/blob/master/developer-community/fix-DCO.md

Also, to fix the lint CI, you can run make fmt.

@salotz salotz force-pushed the fix-dataclass-inputs branch 3 times, most recently from 904962f to 2b3edf9 Compare May 16, 2025 17:30
The `dataclasses.fields` function returns type annotations as strings,
which are then not handled properly during optional detection.

By using the `typing.get_type_hints` function the actual `typing`
objects are fetched. This function then correctly detects optional
types.

For instance when writing out a `FlyteFile` struct the `metadata`
field should be optional, however it was not being detected as such.

Signed-off-by: Samuel Lotz <[email protected]>
@salotz salotz force-pushed the fix-dataclass-inputs branch from 2b3edf9 to de5a137 Compare May 16, 2025 17:32
In the dataclass conversion code, a type that is a member of a union
was not checked for membership in that union, so there would always
be an error.

For instance `FlyteFile.path` is `Union[str, os.PathLike]`, and
`str != Union[str, os.PathLike]`.

This patch adds a check that a type is a member of the union and is
therefore a satisfactory type.

Signed-off-by: Samuel Lotz <[email protected]>
@salotz salotz force-pushed the fix-dataclass-inputs branch from de5a137 to 3d543ed Compare May 16, 2025 17:43
@salotz
Author

salotz commented May 16, 2025

@machichima I've got DCO sorted out.

I don't see any CI running from my end until approval, but I did the formatting.


@staticmethod
def in_union(t: Type[Any], union: types.UnionType) -> bool:
    return t in typing.get_args(union)
Member

@machichima machichima May 24, 2025

Can we use get_args here instead of typing.get_args? It is already imported.

return _is_union_type(t)

@staticmethod
def in_union(t: Type[Any], union: types.UnionType) -> bool:
Member

Can we use typing.Union here, so that we do not need to import types?

super().__init__("Typed Union", typing.Union)

@staticmethod
def is_union(t: Type[Any] | types.UnionType) -> bool:
Member

Can we use typing.Union here, so that we do not need to import types?

Author

If you are maintaining strict type checking, unfortunately not: you need to check for UnionType and Union separately, and UnionType doesn't fall under Type[Any] (union types are weird in Python...).

I noticed a couple other problems that were due to type mismatches so I'll defer to your preferences here as I don't want to overhaul things.

@machichima
Member

machichima commented May 24, 2025

Could you please also merge the master branch? It contains a fix for the currently failing CI test: build-with-pandas

Also, I saw that a lint check failed. Please run make lint locally and fix the linter error.

Thanks!

):
pass

elif original_type != expected_type:
Member

Can we do something like this instead, to improve the code structure?

is_in_union = (
    UnionTransformer.is_union(expected_type)
    and UnionTransformer.in_union(original_type, expected_type)
)

if not is_in_union and original_type != expected_type:
    raise TypeTransformerFailedError(
        f"Type of Val '{original_type}' is not an instance of {expected_type}"
    )

2 participants