-
Notifications
You must be signed in to change notification settings - Fork 333
Fix: FlyteDirectory fails for multiple files with the same name #3325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
dcbfe6a
to
dadc642
Compare
Bito Automatic Review Failed - Technical Failure |
b824f2d
to
8241a82
Compare
Bito Automatic Review Failed - Technical Failure |
Signed-off-by: Martin Simonovsky <[email protected]>
Signed-off-by: Martin Simonovsky <[email protected]>
Signed-off-by: Martin Simonovsky <[email protected]>
Signed-off-by: Martin Simonovsky <[email protected]>
bde858a
to
3d4e564
Compare
Bito Automatic Review Failed - Technical Failure |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3325 +/- ##
==========================================
- Coverage 85.26% 75.76% -9.50%
==========================================
Files 386 215 -171
Lines 30276 22504 -7772
Branches 2969 2954 -15
==========================================
- Hits 25814 17051 -8763
- Misses 3615 4583 +968
- Partials 847 870 +23 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
I'm not sure why the single test of |
Tracking issue
Why are the changes needed?
Persisting
FlyteDirectory
doesn't work correctly when there are multiple files with the same name in the directory.Let's have two files:
And a trivial workflow:
Running with
pyflyte run ... --fd /home/user/my_dir
results in onlyfile.txt
being uploaded in the flyte S3 bucket. The filesubdir/file.txt
is skipped. This is of course not as expected.In addition, note that if the files have a different content, this raises an exception in
get_upload_signed_url
!Our current workaround is to pack a directory into an archive and use
FlyteFile
.What changes were proposed in this pull request?
This bugfix is based on the observation that
get_upload_signed_url
is only provided file basename and file content hash to determine the URL. This leads to the both files ending up in the same location (the directory is flatten). To fix that, I add the path relative to the local root to the filename. This seems to work and both files are uploaded.Disclaimer: I have no idea if there are any side effects of this fix. It can be that some other functionality gets broken. Or there might be a better fix. I would appreciate a review from the maintainers:).
How was this patch tested?
Empirical test: For the minimal example above this works, the call to
super()._put
inFlyteFS._put()
returns two objects:['s3://flyte/flytesnacks/development/QLGCX23Y7HX3FI2LIEGVF5OW7Q======/subdir/file.txt', 's3://flyte/flytesnacks/development/QLGCX23Y7HX3FI2LIEGVF5OW7Q======/file.txt']
. This works both when the file content is identical and when it's different.Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link
Summary by Bito
This pull request fixes a bug in the FlyteDirectory persistence mechanism that prevented the upload of multiple files with the same name and content. By including the relative path of files in the upload process, it enhances the Flyte framework's ability to manage directories with duplicate files effectively.