-
Notifications
You must be signed in to change notification settings - Fork 4
Implement S3 beatmap storage #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
S3 is being a bit too slow for my liking without this. Also includes not re-initialising the S3 client on every request which I am starting to suspect is just wrong. It's what spectator server does, but doing that causes the 32-client pool to be recreated every time which... given .NET guidelines on `HttpClient` usage, does... not seem optimal...
client = new AmazonS3Client( | ||
new BasicAWSCredentials(AppSettings.S3AccessKey, AppSettings.S3SecretKey), | ||
new AmazonS3Config | ||
{ | ||
CacheHttpClient = true, | ||
HttpClientCacheSize = 32, | ||
RegionEndpoint = RegionEndpoint.USWest1, | ||
UseHttp = true, | ||
ForcePathStyle = true, | ||
RetryMode = RequestRetryMode.Legacy, | ||
MaxErrorRetry = 5, | ||
Timeout = TimeSpan.FromMinutes(1), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that contrary to osu-server-spectator
(and more?) this is created in the ctor. This service also has a singleton lifetime. This means that this object is shared.
I suspect that osu-server-spectator
is very wrong in instantiating AmazonS3Client
on every call. The reason why is that HttpClientCacheSize
is supposed to specify how many HttpClient
instances the amazon client can use for requests. Which would mean that every instantiation of AmazonS3Client
spawns 32 HttpClient
s. The guidance on HttpClient
usage is that it is supposed to be shared, as in you're not supposed to create one every call.
(All of that would be nullified if it turned out AmazonS3Client
is doing some black magic hackery with static
to ensure the http client cache is created once ever, but on a quick source inspection I do not believe that to be true.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also there's an argument here that rather than doing its own HttpClient
pooling this should be feeding into the ASP.NET-provided HttpClient
DI factory machinery. I would be OK with making it do that, but am lacking context as to why this HttpClientCacheSize
was being set to what it was (probably originates from 5 projects back and kept getting copy-pasted around as convenient).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds very suspicious. I do recall that this should be a singleton by design, so I think we probably want to fix that in server-spectator
as well.
await Task.WhenAll( | ||
uploadBeatmapPackage(beatmapSetId, beatmapPackage, stream), | ||
uploadBeatmapFiles(files)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some parallelisation here because in testing S3 was being pretty slow for me.
Note that due to the structure of everything every beatmap upload basically incurs a full S3 roundtrip, e.g. a download of the previous package and an upload of the new package, plus the upload of the individual files.
The files upload is a bit wasteful, since there is a possibility of the files already existing. However, eliminating that is annoying structurally; I'd have to pass in stuff to indicate which files to upload and which to not which gets a bit dicey for me. It can probably be done, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's always good to have parallel uploads for S3, because yeah, the round trip time can fluctuate (especially based around server region). I think this stems from the upload performing quite a few requests to get to the final state.
- `CacheHttpClient` defaults to true on .NET (as in past .NET Core) anyway. - The `HttpClientCacheSize` default of 1 is likely the better one these days, what with how `HttpClient` is designed to be used.
This one probably deserves a bit more scrutiny because there are a few details that stand out. I'll be adding a few self-review comments on those after I open.