Skip to content

Conversation

@akx
Copy link
Contributor

@akx akx commented Jul 23, 2025

None of these arguments are referenced by the ruleset, but the set of arguments is used for caching endpoint resolution results via

@lru_cache_weakref(maxsize=CACHE_SIZE)
def resolve_endpoint(self, **input_parameters):
.

This means that the entire endpoint ruleset is evaluated (only to end up with the same result) for every S3-related request with a different Key, IOW for every file stored in S3. Since evaluating the rule tree is pretty slow, this causes significant slowdowns in mass operations, not to mention memory bloat for the (useless) cache.

The three removed parameters are not in the equivalent JS SDK code either.

Refs #3528.

None of these arguments are referenced by the ruleset, but set of arguments is used for caching endpoint resolution results.

This means that the entire endpoint ruleset is evaluated (only to end up with the same result) for every S3-related request with a different Key.

Refs boto#3528
@codecov-commenter
Copy link

codecov-commenter commented Jul 23, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.13%. Comparing base (6300f6b) to head (0f6e3cb).
Report is 460 commits behind head on develop.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3529      +/-   ##
===========================================
+ Coverage    92.94%   93.13%   +0.18%     
===========================================
  Files           66       68       +2     
  Lines        14956    15324     +368     
===========================================
+ Hits         13901    14272     +371     
+ Misses        1055     1052       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@SamRemis
Copy link
Contributor

SamRemis commented Aug 1, 2025

Hi @akx,

Thank you for the PR. Unfortunately those models are owned upstream by the service team and can be updated at any time. Even if we were to merge this today, if the service team makes any unrelated changes to their endpoints tomorrow, this file would just be overwritten and the keys would come back.

While these aren't used in ruleset evaluation for botocore, my current understanding is that they are used by other AWS products and cannot be removed from the upstream ruleset. I am still working on gathering more specifics.

Thank you for calling this out - it doesn't make sense for us to slow down evaluation for botocore users based on parameters we don't use. We should either come up with an alternate solution here that causes unused keys to be ignored for caching purposes or we should find a way to remove these upstream.

@akx
Copy link
Contributor Author

akx commented Aug 4, 2025

@SamRemis Hi, thanks for looking into this!

Yeah, I figured this isn't the correct way to fix this given these lines had been added by an automated bot commit anyway, but I wanted to get the ball rolling.

We should either come up with an alternate solution here that causes unused keys to be ignored for caching purposes [...]

As mentioned above, the JavaScript SDK has a better set of cache keys (clients/client-s3/src/endpoint/endpointResolver.ts added in aws/aws-sdk-js-v3@4be028b, "chore: codegen sync").

It looks like the smithy-typescript-codegen codegen code checks for this exact eventuality and elides parameters that aren't used by the ruleset for purposes of cache keying. (As it happens, I wrote basically similar code earlier to prepare for this PR and based on it, it looks like that these 3 are apparently the only parameters unused across all of Botocore.)

One somewhat simple option could be to add some tribal knowledge to the resolver in botocore for this particular use: iff the ruleset is for S3, and the original set of parameters in the ruleset is this known one, then remove the three unused args before passing to caching/evaluation? Also probably a test that would fail if the S3 ruleset's parameters change, so maintainers would know to take action to adjust the code.

Another more labor-intensive option I can think of that would probably be for the best in the longer run is to also codegen ruleset evaluation into Python, instead of having some Python code evaluate the JSON at runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants