-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add atomic write of creating json cache #3544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3544 +/- ##
===========================================
- Coverage 93.13% 93.09% -0.05%
===========================================
Files 68 68
Lines 15336 15432 +96
===========================================
+ Hits 14283 14366 +83
- Misses 1053 1066 +13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
botocore/utils.py
Outdated
| suffix='.tmp' | ||
| ) | ||
|
|
||
| os.fchmod(temp_fd, 0o600) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only works in unix based systems. You need to add a condition such that this line only runs when in unix based systems. For windows, the file permissions are set to read and write by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it with adding a condition to check unix based function.
tests/unit/test_utils.py
Outdated
|
|
||
| call_args = mock_replace.call_args[0] | ||
| temp_path = call_args[0] | ||
| final_path = call_args[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this line doing and why is it needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the mistake of unused var, fix it to try to check final_path of json file created by os.replace.
|
|
||
| assert '.tmp' in temp_path | ||
|
|
||
| def test_concurrent_writes_same_key(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't accurately validate your implementation. Your code creates a separate temporary file for each process attempting to write, but this test is only checking for data corruption during concurrent writes to the same key. The test should align with how your implementation actually handles file operations else it needs to be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for submitting your contribution. I've reviewed your changes and added some comments for your consideration. Please address the feedback when possible. If you are unable to make the requested modifications, please let us know and our team will create a pull request with the necessary changes.
|
Thank you for the detailed reviews, and apologies for the delayed feedback. |
| def write_worker(thread_id): | ||
| try: | ||
| for i in range(3): | ||
| self.cache[key] = {'thread': thread_id, 'iteration': i} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned this in my previous review as well, could you clarify what this line is doing? It looks like you’re writing to a single key, but your implementation creates several temporary files. Can you explain the reasoning behind that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am trying to make sure the isolation of each write (even to the same key) via separated temp files and the last-write-wins behavior. The test validates that all 9 write operations target the same final file, but os.replace ensures the atomicity and no corruption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Just trying to understand this a bit better. What’s the difference between “writing to the same key via separate temp files” and simply “writing to separate temp files”? In this case, what’s the reason for tying all writes to the same key?
Also, if you look at the checks, you will see that some tests are failing on Windows because it doesn’t allow multiple processes to write to the same file concurrently. On Windows, file permissions are stricter than on Unix systems, so concurrent writes to the same target file can fail. Please, check and validate that all the tests are passing after you publish a revision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing to different keys actually works well even without this implementation. However in real-world scenario, we are using automated scripts that trigger aws-cli to simultaneously refresh the same cached credential, and that often occurs corruption JSON file. The test of writing to the same key via separated temp files is to simulate this sort of race condition and to validate the last-write-wins.
About the winsdow permission problem, I've made the modifications of adding some windows condition in the latest commit.
Add atomic file writing using
os.replace()to prevent JSON file corruption that can occur during concurrent access to the file cache.Fix:
#3106
#3213
(aws-cli)
#9632