-
Notifications
You must be signed in to change notification settings - Fork 333
Fixes to dataclass transformer union handling #3248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thank you for opening this pull request! 🙌 These tips will help get your PR across the finish line:
|
Hi @salotz , |
@machichima what is DCO? |
DCO is like signing up your commit. You can have a look here to see how to add it: https://github.com/src-d/guide/blob/master/developer-community/fix-DCO.md Moreover, to fix the lint CI, you can do |
904962f
to
2b3edf9
Compare
The `dataclasses.fields` function returns type annotations as strings, which then is not handled properly during optional detection. By using the `typing.get_type_hints` function the actual `typing` objects are fetched. This function then correctly detects optional types. For instance when writing out a `FlyteFile` struct the `metadata` field should be optional, however it was not being detected as such. Signed-off-by: Samuel Lotz <[email protected]>
2b3edf9
to
de5a137
Compare
In the dataclasses conversion code type that is a member of a union was not properly checked for if it was a member and so there would always be an error. For instance `FlyteFile.path` is `Union[str,Pathlike]` and so `str != Union[str,Pathlike]`. This patch adds support for checking that a type is part of a union and a satisfactory type. Signed-off-by: Samuel Lotz <[email protected]>
de5a137
to
3d543ed
Compare
@machichima I've got DCO sorted out. I don't see any CI running from my end until approval, but I did the formatting. |
|
||
@staticmethod | ||
def in_union(t: Type[Any], union: types.UnionType) -> bool: | ||
return t in typing.get_args(union) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use get_args
here instead of typing.get_args
? Which is already imported
return _is_union_type(t) | ||
|
||
@staticmethod | ||
def in_union(t: Type[Any], union: types.UnionType) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use typing.Union
here? So that we do not need to import types
super().__init__("Typed Union", typing.Union) | ||
|
||
@staticmethod | ||
def is_union(t: Type[Any] | types.UnionType) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use typing.Union
here? So that we do not need to import types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are maintaining strict type checking, unfortunately no. As you need to check for UnionType
and Union
separately and UnionType
doesn't fall under Type[Any]
(union types are weird in Python...).
I noticed a couple other problems that were due to type mismatches so I'll defer to your preferences here as I don't want to overhaul things.
Could you please also merge the Also, I saw that there's a lint failed. Please run Thanks! |
): | ||
pass | ||
|
||
elif original_type != expected_type: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do something like this instead to make code structure better?
is_in_union = (
UnionTransformer.is_union(expected_type)
and UnionTransformer.in_union(original_type, expected_type)
)
if not is_in_union and original_type != expected_type:
raise TypeTransformerFailedError(
f"Type of Val '{original_type}' is not an instance of {expected_type}"
)
Why are the changes needed?
Motivating use case
I was attempting to use the Python SDK in Python code to run tasks/workflows. When dealing with a case like this:
I ran into problems with supplying the proper inputs to
FlyteRemote.execute
for theFlyteFile
attributes.After some troubleshooting I found a set of bugs surrounding how types and unions are handled in the
DataclassTransformer
class to be the culprit.At a high level I should be able to supply inputs like this:
However this yielded this error:
What changes were proposed in this pull request?
In this PR I fixed two issues related to handling of union types in the
DataclassTransformer
class that were manifesting in the behavior above.First, when types of dataclass fields were being retrieved with
dataclasses.fields
method (on theFlyteFile
object, which I suppose is also a dataclass but probably a special case (?)) thef.type
was somehow being cast to a string such that you would get'Optional[dict[str,str]]
for themetadata
field. When this was checked for if it was an optional type this would fail as the checker doesn't understand strings, only real type objects. (NOTE that this is a type mismatch that could probably be figured out statically).So replacing
dataclasses.fields
withtyping.get_type_hints
properly resolves the types as type objects solves this issue.Second, further down in the code when dealing with non-Optional Unions values are only checked if they are optional types and if
type(value) == expected_type
where expected type might beUnion[str, Pathlike]
like in the case of thepath
attribute of theFlyteFile
.To address this I added some code to check if the value is any of the types in the union.
These fixes solve my immediate issue but I still get the feeling that there is some bigger picture issue that I'm not seeing leading to these problems. I would appreciate some perspective from the team.
How was this patch tested?
See the use case above.
Check all the applicable boxes
Summary by Bito
This pull request enhances the DataclassTransformer class by improving the handling of union types. It replaces dataclasses.fields with typing.get_type_hints for better type resolution and refines type checking logic, addressing input errors in Flyte tasks with dataclass inputs.Unit tests added: False
Estimated effort to review (1-5, lower is better): 2 - The changes are straightforward and primarily focus on type handling, making the review process relatively simple.