Skip to content

Conversation

@ranlz77
Copy link

@ranlz77 ranlz77 commented Aug 24, 2025

Add atomic file writing using os.replace() to prevent JSON file corruption that can occur during concurrent access to the file cache.

Fix:
#3106
#3213

(aws-cli)
#9632

@codecov-commenter
Copy link

codecov-commenter commented Sep 9, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 47.82609% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.09%. Comparing base (bbed2c0) to head (2aa811e).
⚠️ Report is 247 commits behind head on develop.

Files with missing lines Patch % Lines
botocore/utils.py 47.82% 12 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3544      +/-   ##
===========================================
- Coverage    93.13%   93.09%   -0.05%     
===========================================
  Files           68       68              
  Lines        15336    15432      +96     
===========================================
+ Hits         14283    14366      +83     
- Misses        1053     1066      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ashovlin ashovlin self-assigned this Sep 22, 2025
suffix='.tmp'
)

os.fchmod(temp_fd, 0o600)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works in unix based systems. You need to add a condition such that this line only runs when in unix based systems. For windows, the file permissions are set to read and write by default.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it with adding a condition to check unix based function.


call_args = mock_replace.call_args[0]
temp_path = call_args[0]
final_path = call_args[1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this line doing and why is it needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the mistake of unused var, fix it to try to check final_path of json file created by os.replace.


assert '.tmp' in temp_path

def test_concurrent_writes_same_key(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't accurately validate your implementation. Your code creates a separate temporary file for each process attempting to write, but this test is only checking for data corruption during concurrent writes to the same key. The test should align with how your implementation actually handles file operations else it needs to be removed.

Copy link
Contributor

@ubaskota ubaskota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for submitting your contribution. I've reviewed your changes and added some comments for your consideration. Please address the feedback when possible. If you are unable to make the requested modifications, please let us know and our team will create a pull request with the necessary changes.

@ranlz77
Copy link
Author

ranlz77 commented Oct 5, 2025

Thank you for the detailed reviews, and apologies for the delayed feedback.
I've made the suggested modifications in the latest commits.

def write_worker(thread_id):
try:
for i in range(3):
self.cache[key] = {'thread': thread_id, 'iteration': i}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned this in my previous review as well, could you clarify what this line is doing? It looks like you’re writing to a single key, but your implementation creates several temporary files. Can you explain the reasoning behind that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am trying to make sure the isolation of each write (even to the same key) via separated temp files and the last-write-wins behavior. The test validates that all 9 write operations target the same final file, but os.replace ensures the atomicity and no corruption.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Just trying to understand this a bit better. What’s the difference between “writing to the same key via separate temp files” and simply “writing to separate temp files”? In this case, what’s the reason for tying all writes to the same key?

Also, if you look at the checks, you will see that some tests are failing on Windows because it doesn’t allow multiple processes to write to the same file concurrently. On Windows, file permissions are stricter than on Unix systems, so concurrent writes to the same target file can fail. Please, check and validate that all the tests are passing after you publish a revision.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing to different keys actually works well even without this implementation. However in real-world scenario, we are using automated scripts that trigger aws-cli to simultaneously refresh the same cached credential, and that often occurs corruption JSON file. The test of writing to the same key via separated temp files is to simulate this sort of race condition and to validate the last-write-wins.

About the winsdow permission problem, I've made the modifications of adding some windows condition in the latest commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants