Skip to content

Implement S3 beatmap storage #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 26, 2024
Merged

Implement S3 beatmap storage #13

merged 7 commits into from
Dec 26, 2024

Conversation

bdach
Copy link
Contributor

@bdach bdach commented Dec 23, 2024

This one probably deserves a bit more scrutiny because there are a few details that stand out. I'll be adding a few self-review comments on those after I open.

S3 is being a bit too slow for my liking without this.

Also includes not re-initialising the S3 client on every request which I
am starting to suspect is just wrong. It's what spectator server does,
but doing that causes the 32-client pool to be recreated every time
which... given .NET guidelines on `HttpClient` usage, does... not seem
optimal...
@bdach bdach self-assigned this Dec 23, 2024
Comment on lines 26 to 38
client = new AmazonS3Client(
new BasicAWSCredentials(AppSettings.S3AccessKey, AppSettings.S3SecretKey),
new AmazonS3Config
{
CacheHttpClient = true,
HttpClientCacheSize = 32,
RegionEndpoint = RegionEndpoint.USWest1,
UseHttp = true,
ForcePathStyle = true,
RetryMode = RequestRetryMode.Legacy,
MaxErrorRetry = 5,
Timeout = TimeSpan.FromMinutes(1),
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that contrary to osu-server-spectator (and more?) this is created in the ctor. This service also has a singleton lifetime. This means that this object is shared.

I suspect that osu-server-spectator is very wrong in instantiating AmazonS3Client on every call. The reason why is that HttpClientCacheSize is supposed to specify how many HttpClient instances the amazon client can use for requests. Which would mean that every instantiation of AmazonS3Client spawns 32 HttpClients. The guidance on HttpClient usage is that it is supposed to be shared, as in you're not supposed to create one every call.

(All of that would be nullified if it turned out AmazonS3Client is doing some black magic hackery with static to ensure the http client cache is created once ever, but on a quick source inspection I do not believe that to be true.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also there's an argument here that rather than doing its own HttpClient pooling this should be feeding into the ASP.NET-provided HttpClient DI factory machinery. I would be OK with making it do that, but am lacking context as to why this HttpClientCacheSize was being set to what it was (probably originates from 5 projects back and kept getting copy-pasted around as convenient).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds very suspicious. I do recall that this should be a singleton by design, so I think we probably want to fix that in server-spectator as well.

Comment on lines 51 to 53
await Task.WhenAll(
uploadBeatmapPackage(beatmapSetId, beatmapPackage, stream),
uploadBeatmapFiles(files));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some parallelisation here because in testing S3 was being pretty slow for me.

Note that due to the structure of everything every beatmap upload basically incurs a full S3 roundtrip, e.g. a download of the previous package and an upload of the new package, plus the upload of the individual files.

The files upload is a bit wasteful, since there is a possibility of the files already existing. However, eliminating that is annoying structurally; I'd have to pass in stuff to indicate which files to upload and which to not which gets a bit dicey for me. It can probably be done, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always good to have parallel uploads for S3, because yeah, the round trip time can fluctuate (especially based around server region). I think this stems from the upload performing quite a few requests to get to the final state.

@peppy peppy self-requested a review December 24, 2024 06:22
- `CacheHttpClient` defaults to true on .NET (as in past .NET Core)
  anyway.
- The `HttpClientCacheSize` default of 1 is likely the better one these
  days, what with how `HttpClient` is designed to be used.
@peppy peppy merged commit db04619 into ppy:master Dec 26, 2024
4 checks passed
@bdach bdach deleted the s3 branch December 26, 2024 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants