-
Notifications
You must be signed in to change notification settings - Fork 246
S3 Traverser: Reusing S3Client in traverser as fix for sync orchestrator scale issue #3153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: mover/stage2
Are you sure you want to change the base?
Conversation
…tor scale issue In sync orchestrator, we create traversers for each path prefix which were creating their own S3Client. This led to failures with scale runs. Using sync.once to make sure that S3Client is created once and reused thereafter.
…hestrator and double counting files
@@ -90,7 +90,7 @@ func CreateS3Client(ctx context.Context, credInfo CredentialInfo, option Credent | |||
} | |||
//support custom credential provider | |||
if credInfo.S3CredentialInfo.Provider != nil { | |||
fmt.Println("Using custom credentials") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? I remember we saw this in logs to understand the issue and if i make it info, we may not see it anymore as we set error. We also should not see this a lot with the fix. I can leave this as is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to remove. @pbanakar-microsoft any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can leave it as is for now, to validate the fix is working fine. This should get logged only twice.
// This is particularly useful for sync orchestrator which creates many traversers for different path prefixes | ||
// This allows us to avoid creating a new S3 client for each traverser, improving performance and reducing resource usage. | ||
// This is a singleton instance, so it can be shared across multiple traversers. | ||
// It uses sync.Once to ensure that the client is created only once, even if multiple traversers are created concurrently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we plan to use this client manager in STE as well, or will it default to using clientFactory solution. Are there any issues with that?
…es, S2sPreserveAccessTier, S2sInvalidMetadataHandleOption, S2sSourceChangeValidation, S2sGetPropertiesInBackend
…ination folder or prefix is not found
1. Performing only indexer map operations under lock instead of the whole comparison. Since this lock blocks all sync orchestrator go routines, this isolation helps with contention 2. Replacing RW locks with R locks whereever possible 3. Cleaning up comparator logic to simplify and exit sooner
1. Added different parallelism for source and target traversal to reduce waiting for target traversals. 2. Cleaned up throttler code and added safe limits based on testing 3. Handled indexer cleanup in case of target traversal failure
…timeouts for traversals.
…s and slow directory traversals
428a321
to
15b6763
Compare
5ff8477
to
15b6763
Compare
Description
S3 Traverser: Reusing S3Client in traverser as fix for sync orchestrator scale issue
Feature / Bug Fix: (Brief description of the feature or issue being addressed)
Related Links:
Issues
Team thread
Documents
[Email Subject]
Type of Change
How Has This Been Tested?
Thank you for your contribution to AzCopy!