Add the utility function to clear page cache #741
base: branch-25.08
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
/ok to test 7051979
cpp/src/file_utils.cpp (outdated)
if (clear_dirty_pages) { sync(); }
std::string param = reclaim_dentries_and_inodes ? "3" : "1";
std::stringstream ss;
ss << "echo " << param << " | sudo tee /proc/sys/vm/drop_caches 1>/dev/null";
Can we also try without `sudo`? Something like: https://github.com/rapidsai/cudf/blob/6bc515d8219538b104861b48f2b8822a53be841f/cpp/benchmarks/io/cuio_common.cpp#L230
I think cuDF's implementation still relies on `sudo` or superuser privilege. The logic is:
1. Run the cache clearing command in a shell.
2. Run the `sudo`-ed command in a shell.
3. If both of them fail, throw an exception.
Consequently:
- General case
  - For a superuser, 1 succeeds.
  - For a non-superuser, 1 fails, and 2 succeeds once the right password is provided.
- Docker container
  - Without the `--privileged` argument, both fail regardless of whether the user is a superuser.
  - With `--privileged`, the behavior is the same as in the general case (except that non-superusers get no password prompt).
So this PR simplifies 1 and 2 into a single `sudo` command, and also returns a boolean to indicate success/failure instead of throwing on failure.
PS: I actually had an earlier implementation that does not use `sudo` at all: 606b233#diff-e00c8157cb029cd5327cfe9b8b834601b40dcfef0b847bc3d6272e3c7b5c3c1cR216
This method avoids the problem with `system()` and `popen()`, which spawn a separate process to run the shell command and thereby increase the attack surface. But it is also less convenient: users need to run the entire process with `sudo` in order to clear the cache, as opposed to running only the cache-clearing shell command with `sudo`.
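For comparison, a shell-free approach could look roughly like the sketch below. It is not the code from the linked commit; it assumes that implementation simply wrote the value to `/proc/sys/vm/drop_caches` from the calling process. The function name `drop_caches_direct` and its `path` parameter (added only so the logic can be exercised against an ordinary file) are hypothetical.

```cpp
#include <unistd.h>

#include <fstream>
#include <string>

// Hypothetical helper: writes the drop_caches value directly, with no child process.
// Returns false if the file cannot be opened or written (e.g. the caller is not root,
// or /proc/sys is mounted read-only inside an unprivileged container).
bool drop_caches_direct(bool reclaim_dentries_and_inodes,
                        bool clear_dirty_pages,
                        std::string const& path = "/proc/sys/vm/drop_caches")
{
  // Flush dirty pages to disk first so that more pages become droppable.
  if (clear_dirty_pages) { sync(); }

  std::ofstream file(path);
  if (!file) { return false; }

  // "1" frees the page cache; "3" additionally reclaims dentries and inodes.
  file << (reclaim_dentries_and_inodes ? "3" : "1");
  file.flush();  // surface write errors now rather than in the destructor
  return static_cast<bool>(file);
}
```

Because no child process is involved, this only succeeds when the calling process itself holds the required privilege, which is the convenience trade-off described above.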
PPS: I also had an experimental snippet like the one below, hoping to achieve temporary privilege elevation for non-superusers:
// perform tasks
{
  // Temporarily elevate privilege, stashing the real uid and real gid.
  SuperUserRaiiContext ctx;
  // perform tasks
}  // The destructor restores the original real uid and real gid.
// perform tasks
But I scrapped this idea too, because it still requires users to run the process with elevated privilege, so the code before `SuperUserRaiiContext` is still privileged.
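To make that trade-off concrete, here is a minimal sketch of what a guard like `SuperUserRaiiContext` might do. The class name `ScopedEffectiveRoot` is hypothetical, and unlike the pseudocode it stashes the effective uid/gid, since those are what `seteuid()`/`setegid()` change. `seteuid(0)` only succeeds if root is already the process's real or saved set-user-ID, which is exactly why the process has to start out privileged.

```cpp
#include <sys/types.h>
#include <unistd.h>

// Hypothetical RAII guard illustrating the SuperUserRaiiContext idea.
// Elevation only works if root is already the real or saved set-user-ID,
// e.g. the process was started via sudo and later lowered its effective uid.
class ScopedEffectiveRoot {
 public:
  ScopedEffectiveRoot() : saved_euid_{geteuid()}, saved_egid_{getegid()}
  {
    // Raise the uid first: changing the effective gid itself needs privilege.
    if (seteuid(0) != 0) { return; }  // typically EPERM for ordinary users
    if (setegid(0) != 0) {
      // Undo the partial elevation if the gid change failed.
      if (seteuid(saved_euid_) != 0) { /* nothing more we can do here */ }
      return;
    }
    elevated_ = true;
  }

  ~ScopedEffectiveRoot()
  {
    if (!elevated_) { return; }
    // Restore the gid while the effective uid is still 0, then the uid.
    if (setegid(saved_egid_) != 0) { /* best effort in a destructor */ }
    if (seteuid(saved_euid_) != 0) { /* best effort in a destructor */ }
  }

  ScopedEffectiveRoot(ScopedEffectiveRoot const&)            = delete;
  ScopedEffectiveRoot& operator=(ScopedEffectiveRoot const&) = delete;

  bool elevated() const noexcept { return elevated_; }

 private:
  uid_t saved_euid_;
  gid_t saved_egid_;
  bool elevated_{false};
};
```

An ordinary, unprivileged process simply ends up with `elevated() == false`; the guard cannot grant privileges the process never had.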
@vuule explained that the non-sudo command is used on specially configured machines where unprivileged users have no access to the `sudo` executable but can run `sysctl`. This complicates things: the `sysctl` command does not return a non-zero exit code when it fails due to insufficient permissions, unlike the other approach, `echo 3 | tee /proc/sys/vm/drop_caches`, which does. So I switched to cuDF's implementation, which checks stderr for failure instead.
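A minimal sketch of such a stderr-based check, assuming a `popen()`-based helper. The names `run_and_check_stderr` and `clear_page_cache_sketch` are hypothetical, and cuDF's actual code (linked above) may differ in detail. This sketch treats a command as failed if it exits non-zero or writes anything to stderr, then tries `sysctl` without `sudo` first and falls back to `sudo tee`.

```cpp
#include <unistd.h>

#include <array>
#include <cstdio>
#include <string>

// Hypothetical helper: runs `command` via /bin/sh and reports success only if it
// exits with status 0 AND writes nothing to stderr. stdout is discarded.
bool run_and_check_stderr(std::string const& command)
{
  // For the subshell, stderr is redirected into the pipe and stdout to /dev/null.
  std::string const wrapped = "(" + command + ") 2>&1 1>/dev/null";
  FILE* pipe = popen(wrapped.c_str(), "r");
  if (pipe == nullptr) { return false; }

  std::string stderr_output;
  std::array<char, 256> buffer{};
  while (std::fgets(buffer.data(), static_cast<int>(buffer.size()), pipe) != nullptr) {
    stderr_output += buffer.data();
  }
  int const status = pclose(pipe);
  return status == 0 && stderr_output.empty();
}

// Hypothetical sketch of the "try without sudo first" order discussed in this PR.
bool clear_page_cache_sketch(bool reclaim_dentries_and_inodes, bool clear_dirty_pages)
{
  if (clear_dirty_pages) { sync(); }
  std::string const value = reclaim_dentries_and_inodes ? "3" : "1";

  // First attempt: no sudo, for machines where sysctl is permitted but sudo is not.
  if (run_and_check_stderr("sysctl -w vm.drop_caches=" + value)) { return true; }

  // Fallback: elevate only the cache-clearing command.
  return run_and_check_stderr("echo " + value + " | sudo tee /proc/sys/vm/drop_caches");
}
```

Checking stderr in addition to the exit status catches the `sysctl` permission failure even though it exits with code 0.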
@@ -209,4 +212,16 @@ std::pair<std::size_t, std::size_t> get_page_cache_info(int fd)
SYSCALL_CHECK(munmap(addr, file_size));
return {num_pages_in_page_cache, num_pages};
}

bool clear_page_cache(bool reclaim_dentries_and_inodes, bool clear_dirty_pages)
When would we want to call this without clearing the dirty pages?
I'd say these options are for curious minds who wonder about their respective effects on the page cache 😃
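For curious minds, one way to observe those effects is to count a file's resident pages before and after clearing. The sketch below is a hypothetical stand-in for `get_page_cache_info()` built on `mmap()` + `mincore()`; the name `count_cached_pages` is an assumption and this is not KvikIO's implementation, although the diff context above suggests a similar `mmap`/`munmap` pattern.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: returns {pages resident in the page cache, total pages}
// for an open file descriptor.
std::pair<std::size_t, std::size_t> count_cached_pages(int fd)
{
  struct stat st {};
  if (fstat(fd, &st) != 0) { throw std::runtime_error("fstat failed"); }
  auto const file_size = static_cast<std::size_t>(st.st_size);
  if (file_size == 0) { return {0, 0}; }  // mmap() rejects zero-length mappings

  void* addr = mmap(nullptr, file_size, PROT_READ, MAP_SHARED, fd, 0);
  if (addr == MAP_FAILED) { throw std::runtime_error("mmap failed"); }

  auto const page_size       = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  std::size_t const num_pages = (file_size + page_size - 1) / page_size;
  std::vector<unsigned char> residency(num_pages);  // one byte per page
  int const rc = mincore(addr, file_size, residency.data());
  munmap(addr, file_size);
  if (rc != 0) { throw std::runtime_error("mincore failed"); }

  std::size_t cached = 0;
  for (auto const flag : residency) {
    if (flag & 1) { ++cached; }  // lowest bit set => page is resident
  }
  return {cached, num_pages};
}
```

Per the kernel's `drop_caches` documentation, writing to that file does not free dirty pages, so calling `sync()` first (the `clear_dirty_pages` option) allows more pages to be dropped; `1` frees the page cache, while `3` also reclaims dentries and inodes.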
python/kvikio/kvikio/__init__.py (outdated)
from kvikio.remote_file import RemoteFile, is_remote_file_available

__all__ = [
    "__git_commit__",
    "__version__",
    "CuFile",
    "get_page_cache_info",
    "clear_page_cache",
Typically we try to keep `__all__` alphabetized. Could you sort these lines?
Thanks. Sorted.
cpp/include/kvikio/file_utils.hpp (outdated)
* @note This function creates a child process and executes the cache clearing shell command with
* `sudo`. Superuser privilege is therefore generally needed for the function to return `true`.
Update to say it first tries without `sudo`?
Thanks. The comments have been updated.
This PR introduces a utility function to clear the page cache in C++ and Python.