
Conversation

@quyykk
Contributor

@quyykk quyykk commented Apr 28, 2023

This fixes microsoft/vcpkg#31072 and microsoft/vcpkg#31132.

It changes the upload to the cache so that it happens in 450 MB chunks instead of all at once, because GitHub rejects uploads to its cache bigger than roughly 500 MB (an educated guess).

The implementation is a bit hacky, but I haven't found a better solution: it splits the file into multiple 450 MB chunk files on disk.

Update: it now reads 450 MB chunks from the file at a time instead of splitting it on disk.
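
Roughly, the chunked read/upload loop looks like this (a simplified sketch, not the exact code from this PR; upload_chunk is a hypothetical stand-in for the actual curl invocation):

    // Simplified sketch of the chunked upload loop; upload_chunk is a
    // hypothetical stand-in for the real curl invocation.
    const std::size_t chunk_size = 450ull * 1024 * 1024; // ~450 MB per request
    auto file_ptr = fs.open_for_read(file, VCPKG_LINE_INFO);
    std::vector<char> buffer(chunk_size);
    std::size_t bytes_read = 0;
    for (std::size_t offset = 0; offset < file_size; offset += bytes_read)
    {
        bytes_read = file_ptr.read(buffer.data(), 1, chunk_size);
        // Each chunk becomes its own request, so no single upload
        // exceeds GitHub's ~500 MB cache limit.
        upload_chunk(url, buffer.data(), bytes_read, offset);
    }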

@BillyONeal
Member

Sorry for the noise, trying to verify that the transition over to GitHub Actions for the PR bot is working...

@quyykk quyykk requested a review from BillyONeal May 23, 2023 19:59
@autoantwort
Contributor

I haven't found a better solution: it splits the file into multiple 450 MB chunk files on disk.

You could pass the data via stdin.
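
For example, curl can read the request body from standard input via -T -, so something along these lines would avoid the temporary chunk files (a rough sketch using POSIX popen rather than vcpkg's process API; the URL is a placeholder):

    // Rough sketch (POSIX popen, not vcpkg's process API): "curl -T -"
    // reads the upload body from stdin, so no chunk files hit the disk.
    #include <cstdio>
    #include <vector>

    void upload_chunk_via_stdin(const std::vector<char>& chunk)
    {
        // Placeholder URL; the real cache upload URL comes from the runner.
        FILE* curl = popen("curl -sS -T - https://cache.example/upload", "w");
        if (!curl) return;
        std::fwrite(chunk.data(), 1, chunk.size(), curl);
        pclose(curl);
    }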

@quyykk
Contributor Author

quyykk commented Jun 4, 2023

You could pass the data via stdin.

I could, but I honestly have no idea how.

@autoantwort
Contributor

I could, but I honestly have no idea how.

There is now #1134 :)

@quyykk
Contributor Author

quyykk commented Aug 30, 2023

I've switched it to use stdin instead, and it still works 😄. This PR is ready for review again.

@quyykk
Contributor Author

quyykk commented Sep 7, 2023

Seems like the formatting errors are caused by GitHub upgrading their clang-format, and not my fault. 😄

@DerZade

DerZade commented Jan 22, 2024

What is the status of this? 🤔

@quyykk
Contributor Author

quyykk commented Mar 11, 2024

I fixed the merge conflicts that accumulated. Just needs someone from the team to review 😄

base_cmd.string_arg(url);

auto file_ptr = fs.open_for_read(file, VCPKG_LINE_INFO);
std::vector<char> buffer(chunk_size);
Contributor

I see that the default size is 450 MB, with no limit.
Is there an alternative to reading it into a buffer first just to forward it to another command's stdin?
(Remembering all those Raspi people who struggle to build vcpkg due to low memory...)

Contributor Author

The original way was to split the file on disk, but that's pretty hacky, I think.

But I can decrease the buffer size. I'm not sure what you mean by a limit.

Are people really running a GitHub Runner server on a Raspi lmao 😄

Contributor

Are people really running a GitHub Runner server on a Raspi lmao

Well, this is only the tool uploading the artifacts. Caching large artifacts is more important when build machine power is low.

Contributor Author

Ah okay. What buffer size do you think I should use? I can't make it really small or else the upload will be way slower than it would otherwise be.

Contributor

I don't know. I see the trade-offs and barriers.

  • Can't make curl read chunks directly from (within) a large file.
  • Can't feed (the vcpkg function running) curl with piecewise input. (IO buffer smaller than network chunks.)

Changing curl (the tool) is out of scope here.
If the interface remains running curl instead of calling into libcurl, then it would be best to fix the second point.
If this is too intrusive, it might be helpful to give the user a way to change the buffer size, or at least to turn off the buffering in case of trouble.
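
For the second point, a sketch of what decoupling the two sizes could look like (assuming curl's stdin is available as a writable FILE*, e.g. from popen as in the earlier suggestion; io_buffer_size and curl_stdin are illustrative names):

    // Sketch: keep peak memory at io_buffer_size while each curl launch
    // still receives a full chunk_size worth of data on its stdin.
    const std::size_t io_buffer_size = 1024 * 1024; // 1 MiB; could be user-configurable
    std::vector<char> io_buffer(io_buffer_size);
    std::size_t sent = 0;
    while (sent < chunk_size)
    {
        const std::size_t want = std::min(io_buffer_size, chunk_size - sent);
        const std::size_t n = file_ptr.read(io_buffer.data(), 1, want);
        if (n == 0) break; // end of file
        std::fwrite(io_buffer.data(), 1, n, curl_stdin); // curl_stdin: the pipe from popen
        sent += n;
    }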

std::size_t bytes_read = 0;
for (std::size_t i = 0; i < file_size; i += bytes_read)
{
    bytes_read = file_ptr.read(buffer.data(), sizeof(decltype(buffer)::value_type), chunk_size);
Member

I think a whole curl process launch per chunk like this is kind of a problem. I don't see reasonable ways to achieve the effect this PR wants without linking with libcurl.

Member

I suppose it could still be done like this, but the chunk sizes would have to be bigger than makes sense to hold in a single contiguous memory buffer; there would need to be more than one read/write per curl launch, etc.

and that sounds like a lot more work than linking with libcurl.

@Neumann-A
Contributor

#1422 pulls in libcurl.

@Neumann-A Neumann-A mentioned this pull request Jun 10, 2024
@JavierMatosD JavierMatosD added the requires:vcpkg-team-review This PR or issue requires someone from the vcpkg team to take a further look. label Oct 9, 2024
@pedroraimundo-ao

@quyykk Is there any way to start using this in GitHub Actions at the moment? It is sorely needed to make CI builds that depend on qtbase bearable.

(maybe checking out a specific vcpkg branch in-tree? applying a patch and working dirty?)

@TheWillard

@pedroraimundo-ao Probably not the answer you were looking for, but I switched to Conan for my dependencies.

@rainman110

@BillyONeal What is the status of this PR? I think many struggle to cache dependencies (in particular Qt), which

  • take a very long time to build
  • and are very large.

Without caching these large dependencies, vcpkg is unusable in GitHub Actions. Conan thus might be a good alternative.

@talregev

talregev commented Apr 4, 2025

This should also fix these errors in the CI:
for vcpkg's CI, the maximum block size is 4000 MiB.

Check my issue as well:
microsoft/vcpkg#44060

@talregev

talregev commented Apr 6, 2025

@quyykk
I started looking at your code to see whether I can rebase your changes onto the latest vcpkg.
Are you still working on this PR?
I think that if you make your changes optional, the vcpkg team will consider them even if there is some problem with allocating the file in memory, because the user would have to activate it with a command-line option.
Later on, people can change it to a more correct approach with libcurl linked in.

Let me know what you think.

@talregev

talregev commented Apr 8, 2025

Hi all,
This is a high-demand feature.
I started a new PR at #1643.
The code is taken from here. I made a very initial version to see whether I am in the right direction, and set it up on the CI.
I want to do CI-driven development, meaning I want to test in the CI that it uploads to the cloud correctly in chunks.
Currently I lack the knowledge of how to do it; help will be appreciated.

Success for me means that this will be a feature that many people need and want.

@talregev

I took this code and changed it so that it can now upload a very large binary cache to the vcpkg CI in chunks. Currently I set the chunk size to 500 MiB, but it can vary up to 4000 MiB.

I tested it on the vcpkg CI and it is working.

It uses standard input to upload the file, which is not ideal.
I offered to add it as experimental behind an x- flag, meaning it would only work if the user asks for it, but it was not accepted by the vcpkg team.

With my experience now, I can also develop and try the same on GitHub Actions, but I am waiting for some progress from the vcpkg team on the standard-input approach.

You are welcome to test and review my PR.

Thank you all.

@vicroms
Member

vicroms commented Apr 26, 2025

Closing this PR as per #1662

@vicroms vicroms closed this Apr 26, 2025
