
shnayak-msft
Collaborator

Description

S3 Traverser: Reusing S3Client in traverser as fix for sync orchestrator scale issue

  1. In the sync orchestrator, we create a traverser for each path prefix, and each traverser was creating its own S3Client. This led to failures in scale runs. We now use sync.Once to ensure that the S3Client is created once and reused thereafter.
  2. Bug fix: the S3 traverser did not enumerate directory prefixes for the sync orchestrator and double-counted files.
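The shared-client approach in item 1 can be sketched as follows. This is a minimal, self-contained illustration, not the AzCopy code: `s3Client`, `newS3Client`, `getSharedS3Client`, and the `clientsCreated` counter are hypothetical stand-ins (the real constructor is `CreateS3Client`, which also takes a context and credential info). It needs Go 1.19+ for `atomic.Int32`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// s3Client is a hypothetical stand-in for the real S3 client type.
type s3Client struct {
	endpoint string
}

// clientsCreated counts how many times the (expensive) constructor actually ran.
var clientsCreated atomic.Int32

// newS3Client is a hypothetical constructor standing in for CreateS3Client.
func newS3Client(endpoint string) *s3Client {
	clientsCreated.Add(1)
	return &s3Client{endpoint: endpoint}
}

var (
	sharedClient     *s3Client
	sharedClientOnce sync.Once
)

// getSharedS3Client returns the process-wide client, creating it on first use.
// sync.Once runs the constructor exactly once even under concurrent calls, and
// every caller sees the fully initialized client once Do returns.
func getSharedS3Client(endpoint string) *s3Client {
	sharedClientOnce.Do(func() {
		sharedClient = newS3Client(endpoint)
	})
	return sharedClient
}

func main() {
	// Simulate the orchestrator creating one traverser per path prefix concurrently.
	prefixes := []string{"a/", "b/", "c/", "d/", "e/"}
	clients := make([]*s3Client, len(prefixes))

	var wg sync.WaitGroup
	for i := range prefixes {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			clients[i] = getSharedS3Client("s3.amazonaws.com")
		}(i)
	}
	wg.Wait()

	fmt.Println("clients created:", clientsCreated.Load())
	fmt.Println("all traversers share one client:", clients[0] == clients[len(clients)-1])
}
```

Note that arguments are honored only on the first call: a later caller passing a different endpoint still receives the first client. That is fine when every traverser talks to the same account, but a keyed cache would be needed if traversers ever targeted different endpoints or credentials.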

Type of Change

  • Bug fix
  • New feature
  • Documentation update required
  • Code quality improvement
  • Other (describe):

How Has This Been Tested?


@@ -90,7 +90,7 @@ func CreateS3Client(ctx context.Context, credInfo CredentialInfo, option Credent
}
//support custom credential provider
if credInfo.S3CredentialInfo.Provider != nil {
fmt.Println("Using custom credentials")
Collaborator Author

Is this needed? I remember we relied on this message in the logs to understand the issue, and if I make it an info-level log, we may no longer see it because the log level is set to error. With the fix, we also should not see it often. I can leave this as is for now.

Collaborator

I think it's fine to remove. @pbanakar-microsoft any thoughts?

Collaborator

I think you can leave it as is for now, to validate the fix is working fine. This should get logged only twice.

// This is particularly useful for the sync orchestrator, which creates many traversers for different path prefixes.
// It avoids creating a new S3 client for each traverser, improving performance and reducing resource usage.
// This is a singleton instance, so it can be shared across multiple traversers.
// It uses sync.Once to ensure that the client is created only once, even if multiple traversers are created concurrently.
Collaborator

Do we plan to use this client manager in STE as well, or will STE continue to use the clientFactory solution? Are there any issues with that?
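The code comments above describe a singleton guarded by sync.Once. One property of that pattern is worth keeping in mind, especially with custom credential providers: whatever the first initialization produces, including a failure, is what every later caller receives, and there is no retry. This sketch makes that explicit with `sync.OnceValues` (Go 1.21+), which caches both return values. It is illustrative only; `createClient`, `sharedClient`, and the simulated failure are hypothetical and not taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// s3Client is a hypothetical stand-in for the real S3 client type.
type s3Client struct{ region string }

// attempts counts how many times the constructor ran.
var attempts int

// createClient is a hypothetical constructor that can fail, like CreateS3Client.
func createClient(fail bool) (*s3Client, error) {
	attempts++
	if fail {
		return nil, errors.New("credential provider unavailable")
	}
	return &s3Client{region: "us-east-1"}, nil
}

// sharedClient wraps the constructor with sync.OnceValues: the first call runs
// createClient, and every later call returns the same (client, error) pair
// without running it again.
var sharedClient = sync.OnceValues(func() (*s3Client, error) {
	return createClient(true) // simulate a failure on the first (and only) attempt
})

func main() {
	for i := 0; i < 3; i++ {
		_, err := sharedClient()
		fmt.Println("call", i, "error:", err)
	}
	fmt.Println("constructor attempts:", attempts)
}
```

All three calls report the same cached error and the constructor runs once. If retry on failure were wanted, the singleton would need a mutex-guarded "create if nil" check instead of sync.Once.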

otendolkar and others added 13 commits August 11, 2025 12:29
…es, S2sPreserveAccessTier, S2sInvalidMetadataHandleOption, S2sSourceChangeValidation, S2sGetPropertiesInBackend
1. Performing only the indexer map operations under the lock instead of the whole comparison. Since this lock blocks all sync orchestrator goroutines, this isolation reduces contention.
2. Replacing RW locks with R locks wherever possible
3. Cleaning up comparator logic to simplify it and exit sooner
1. Added separate parallelism limits for source and target traversal to reduce time spent waiting on target traversals.
2. Cleaned up throttler code and added safe limits based on testing
3. Handled indexer cleanup in case of target traversal failure
otendolkar force-pushed the users/shnayak/c2c-sync-scale-fix branch from 428a321 to 15b6763 August 14, 2025 20:31
shnayak-msft force-pushed the users/shnayak/c2c-sync-scale-fix branch from 5ff8477 to 15b6763 August 14, 2025 21:21