[BUG] Fix StructuredDataset empty-str file_format in dc attr access #3027
Signed-off-by: JiaWei Jiang <[email protected]>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3027       +/-   ##
===========================================
+ Coverage   51.81%   76.84%   +25.02%
===========================================
  Files         202      202
  Lines       21469    21472        +3
  Branches     2766     2767        +1
===========================================
+ Hits        11125    16500     +5375
+ Misses       9735     4211     -5524
- Partials      609      761      +152
Signed-off-by: JiangJiaWei1103 <[email protected]>
…1103/flytekit into fix-sd-empty-str-file-format
Final testing:
from flytekit import task, workflow, ImageSpec
from flytekit.types.structured.structured_dataset import StructuredDataset
import pandas as pd

# Build the task image from the flytekit commit under test.
flytekit_hash = "f1a2ba3a1836983ffb9bb45276d8aa9b01665600"
flytekit = f"git+https://github.com/flyteorg/flytekit.git@{flytekit_hash}"
custom_image = ImageSpec(
    packages=[flytekit, "pandas", "pyarrow"],
    apt_packages=["git"],
    registry="localhost:30000",
    env={"FLYTE_SDK_LOGGING_LEVEL": 10},
)


@task(container_image=custom_image)
def create_pd_sd() -> StructuredDataset:
    # Create a StructuredDataset with an explicitly specified file_format.
    return StructuredDataset(
        dataframe=pd.DataFrame(
            {
                "calories": [420, 380, 390],
                "duration": [50, 40, 45],
            }
        ),
        file_format="parquet",
    )


@task(container_image=custom_image)
def return_pd_sd(sd: StructuredDataset) -> StructuredDataset:
    # Pass the dataset through unchanged.
    return sd


@workflow
def wf() -> StructuredDataset:
    sd = create_pd_sd()
    sd = return_pd_sd(sd)
    return sd


if __name__ == "__main__":
    print(wf())
This works!
Nice bro, thanks for your time! Let's move on to the next challenge!
…flyteorg#3027)
* fix: Retain user-specified file format info
  Signed-off-by: JiaWei Jiang <[email protected]>
* fix: Set sdt format based on user-specified file_format
  Signed-off-by: JiaWei Jiang <[email protected]>
* Remove redundant modification
  Signed-off-by: JiaWei Jiang <[email protected]>
* test: Test file_format attribute alignment in dc.sd
  Signed-off-by: JiaWei Jiang <[email protected]>
* Merge master and support pqt file upload
  Signed-off-by: JiaWei Jiang <[email protected]>
* Remove redundant condition to always copy file_format over
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* Prioritize file_format in type hint over the user-specified one
  Signed-off-by: JiangJiaWei1103 <[email protected]>
---------
Signed-off-by: JiaWei Jiang <[email protected]>
Signed-off-by: JiangJiaWei1103 <[email protected]>
Co-authored-by: Future-Outlier <[email protected]>
Signed-off-by: Umer Ahmad <[email protected]>
…flyteorg#3027)
* fix: Retain user-specified file format info
  Signed-off-by: JiaWei Jiang <[email protected]>
* fix: Set sdt format based on user-specified file_format
  Signed-off-by: JiaWei Jiang <[email protected]>
* Remove redundant modification
  Signed-off-by: JiaWei Jiang <[email protected]>
* test: Test file_format attribute alignment in dc.sd
  Signed-off-by: JiaWei Jiang <[email protected]>
* Merge master and support pqt file upload
  Signed-off-by: JiaWei Jiang <[email protected]>
* Remove redundant condition to always copy file_format over
  Signed-off-by: JiangJiaWei1103 <[email protected]>
* Prioritize file_format in type hint over the user-specified one
  Signed-off-by: JiangJiaWei1103 <[email protected]>
---------
Signed-off-by: JiaWei Jiang <[email protected]>
Signed-off-by: JiangJiaWei1103 <[email protected]>
Co-authored-by: Future-Outlier <[email protected]>
Signed-off-by: Atharva <[email protected]>
Tracking issue
Closes flyteorg/flyte#6096.
Why are the changes needed?
When users create a StructuredDataset with a specified file_format (e.g., parquet), the file_format information is accidentally discarded during the async_to_literal call. Concretely, StructuredDatasetType's format attribute is set to GENERIC_FORMAT, which is an empty string "".
What changes were proposed in this pull request?
Override StructuredDatasetType's format attribute when users explicitly set file_format on the Python-native StructuredDataset.
How was this patch tested?
This patch was tested with the newly added integration test and double-checked by inspecting the Flyte console I/O and the task pod stdout.
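For readers without a Flyte cluster at hand, here is a minimal, flytekit-free sketch of the behavior the test is meant to pin down. Everything in it is a hypothetical stand-in: SketchDataset, SketchDatasetType, and resolve_format are illustrative names, not flytekit's actual classes or code path. The priority order (type-hint format first, then the user-specified file_format, then GENERIC_FORMAT) is an assumption inferred from the commit titles in this PR.

```python
from dataclasses import dataclass
from typing import Optional

# The PR describes GENERIC_FORMAT as an empty string.
GENERIC_FORMAT = ""


@dataclass
class SketchDatasetType:
    """Hypothetical stand-in for StructuredDatasetType."""
    format: str = GENERIC_FORMAT


@dataclass
class SketchDataset:
    """Hypothetical stand-in for the Python-native StructuredDataset."""
    uri: Optional[str] = None
    file_format: Optional[str] = None


def resolve_format(sd: SketchDataset, hinted_format: str = GENERIC_FORMAT) -> SketchDatasetType:
    """Pick the format to record on the dataset type (assumed priority order)."""
    # 1. A format given in the type hint wins (assumption from the commit titles).
    if hinted_format != GENERIC_FORMAT:
        return SketchDatasetType(format=hinted_format)
    # 2. Otherwise keep the user-specified file_format instead of discarding it.
    if sd.file_format:
        return SketchDatasetType(format=sd.file_format)
    # 3. Nothing specified: fall back to the generic (empty-string) format.
    return SketchDatasetType(format=GENERIC_FORMAT)


if __name__ == "__main__":
    sd = SketchDataset(file_format="parquet")
    print(repr(resolve_format(sd).format))
    print(repr(resolve_format(SketchDataset()).format))
    print(repr(resolve_format(sd, hinted_format="csv").format))
```

The bug reported in the tracking issue corresponds to step 2 being skipped, so that a dataset created with file_format="parquet" ended up with the empty-string format.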
Setup process
For a local run, the setup process is summarized as follows:
After installation, run the following command:
Screenshots
The following results are expected:
Flyte console input (screenshot)
Flyte console output (screenshot)
Task pod stdout (screenshot)
Check all the applicable boxes
Related PRs
Docs link
Summary by Bito
This PR implements comprehensive Flytekit improvements including: enhanced execution engine with improved error handling, fixed StructuredDataset file format handling, K8s Data Service plugin implementation, and Ray plugin enhancements. Key features include configurable cloud storage writes, improved secret handling, Optuna plugin for hyperparameter optimization, and enhanced thread safety mechanisms.
Unit tests added: True
Estimated effort to review (1-5, lower is better): 5