Add the utility function to clear page cache #741

Merged: 16 commits merged on Jun 16, 2025
Changes from 8 commits
19 changes: 19 additions & 0 deletions cpp/include/kvikio/file_utils.hpp
@@ -180,4 +180,23 @@ std::pair<std::size_t, std::size_t> get_page_cache_info(std::string const& file_
* @sa `get_page_cache_info(std::string const&)` overload.
*/
std::pair<std::size_t, std::size_t> get_page_cache_info(int fd);

/**
* @brief Clear the page cache
*
* @param reclaim_dentries_and_inodes Whether to free reclaimable slab objects which include
* dentries and inodes.
* - If `true`, equivalent to executing `echo 3 > /proc/sys/vm/drop_caches`;
* - If `false`, equivalent to executing `echo 1 > /proc/sys/vm/drop_caches`.
* @param clear_dirty_pages Whether to trigger the writeback process to clear the dirty pages. If
* `true`, `sync` will be called prior to cache clearing.
* @return Whether the page cache has been successfully cleared
*
* @note This function creates a child process and executes the cache clearing shell command with
* `sudo`. Superuser privilege is therefore needed for the function to return `true`.
*
* @throws kvikio::GenericSystemError if somehow the child process could not be created, or its
* status could not be retrieved
*/
bool clear_page_cache(bool reclaim_dentries_and_inodes = true, bool clear_dirty_pages = true);
} // namespace kvikio
15 changes: 15 additions & 0 deletions cpp/src/file_utils.cpp
@@ -18,7 +18,10 @@
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <sstream>
#include <stdexcept>
#include <string>
#include <system_error>
#include <utility>
#include <vector>
@@ -209,4 +212,16 @@ std::pair<std::size_t, std::size_t> get_page_cache_info(int fd)
  SYSCALL_CHECK(munmap(addr, file_size));
  return {num_pages_in_page_cache, num_pages};
}

bool clear_page_cache(bool reclaim_dentries_and_inodes, bool clear_dirty_pages)
Contributor

When would we want to call this without clearing the dirty pages?

Contributor Author

I'd say these options are for curious minds who wonder about their respective effects on the page cache 😃

{
  KVIKIO_NVTX_FUNC_RANGE();
  if (clear_dirty_pages) { sync(); }
  std::string param = reclaim_dentries_and_inodes ? "3" : "1";
  std::stringstream ss;
  ss << "echo " << param << " | sudo tee /proc/sys/vm/drop_caches 1>/dev/null";
Member

Contributor Author

@kingcrimsontianyu Jun 4, 2025

I think cuDF's implementation still relies on sudo or superuser privilege. The logic is:

  1. run the cache clearing command in a shell
  2. run the sudo-ed command in a shell
  3. if both of them fail, throw an exception

Consequently:

  • General case
    • For a superuser, 1 succeeds.
    • For a non-superuser, 1 fails and 2 succeeds once the right password is provided.
  • Docker container
    • Without the --privileged argument, both fail, superuser or not.
    • With --privileged, the behavior is the same as in the general case (except that non-superusers get no password prompt).

So this PR simplifies 1 and 2 into a single sudo command, and also returns a boolean to indicate success or failure instead of throwing on failure.
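
The two-step fallback described above can be sketched roughly as follows. This is an illustration only, not cuDF's or KvikIO's actual code: the names `CommandRunner`, `run_with_system`, and `drop_caches_with_fallback` are hypothetical, and the exact command strings are assumptions. The runner is injected so that the control flow can be exercised without root access.

```cpp
#include <cstdlib>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical callback type: runs a shell command and reports success.
using CommandRunner = std::function<bool(std::string const&)>;

// Default runner (assumption): a zero status from std::system() counts as success.
inline bool run_with_system(std::string const& cmd) { return std::system(cmd.c_str()) == 0; }

// Step 1: try the plain command. Step 2: try the sudo-prefixed command.
// Step 3: throw if both attempts fail.
inline void drop_caches_with_fallback(CommandRunner const& run = run_with_system)
{
  // Command strings are illustrative assumptions, not taken from cuDF.
  std::string const plain     = "echo 3 | tee /proc/sys/vm/drop_caches 1>/dev/null";
  std::string const with_sudo = "echo 3 | sudo tee /proc/sys/vm/drop_caches 1>/dev/null";
  if (run(plain)) { return; }
  if (run(with_sudo)) { return; }
  throw std::runtime_error("Unable to drop the page cache with or without sudo");
}
```

Passing a lambda as `run` makes it possible to check which commands are attempted, and in what order, on a machine where neither command would succeed.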

PS: I actually had another implementation earlier that does not use sudo at all: 606b233#diff-e00c8157cb029cd5327cfe9b8b834601b40dcfef0b847bc3d6272e3c7b5c3c1cR216
This method avoids the problem with system() and popen(), which create a separate process to run the shell command and thereby widen the attack surface. But it is also less convenient: users need to run the entire process with sudo in order to clear the cache, as opposed to using sudo only for the cache-clearing step in the shell process.
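
A sudo-free variant along these lines might look like the sketch below. It is not the code from the linked commit; `write_drop_caches` is a hypothetical name, and the `path` parameter exists only so the sketch can be tried against an ordinary file. Against the real procfs entry, the write is expected to fail unless the whole process runs with superuser privilege.

```cpp
#include <fstream>
#include <string>
#include <unistd.h>

// Write the drop_caches value directly, without spawning a shell or using sudo.
// `path` defaults to the real procfs entry; overriding it is for experimentation only.
inline bool write_drop_caches(bool reclaim_dentries_and_inodes,
                              bool clear_dirty_pages,
                              std::string const& path = "/proc/sys/vm/drop_caches")
{
  // drop_caches only frees clean pages, so flush dirty pages first if requested.
  if (clear_dirty_pages) { sync(); }
  std::ofstream out(path);
  if (!out.is_open()) { return false; }  // e.g. insufficient permission
  // std::endl flushes, so a failed write is reflected in the stream state.
  out << (reclaim_dentries_and_inodes ? "3" : "1") << std::endl;
  return out.good();
}
```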

PPS: I also had an experimental snippet that does something like the following, in the hope of achieving temporary privilege elevation for non-superusers:

// perform tasks
{
    // temporarily elevate privilege;
    // stash the real uid and real gid
    SuperUserRaiiContext ctx;
    // perform privileged tasks
} // restore the original real uid and real gid
// perform tasks

But I scrapped this idea too, because it still requires users to run the process with elevated privilege, and as a result the code prior to SuperUserRaiiContext is still privileged.
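
For illustration, an RAII guard of this kind could be sketched as below. The class name is taken from the comment above, but the body is an assumption: it toggles only the effective uid via `seteuid()` (the comment mentions real uid and gid, which are not handled here). As noted above, elevation can only succeed if the process was already started with privilege, for example as a setuid-root binary that lowered its effective uid earlier.

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical guard: raise the effective uid to root for the lifetime of the
// object, then restore the previous effective uid.
class SuperUserRaiiContext {
 public:
  // Members are initialized in declaration order, so the current euid is
  // saved before the elevation attempt.
  SuperUserRaiiContext() : _saved_euid(geteuid()), _elevated(seteuid(0) == 0) {}

  ~SuperUserRaiiContext()
  {
    // Failing to drop privilege again is treated as fatal.
    if (_elevated && seteuid(_saved_euid) != 0) { std::abort(); }
  }

  SuperUserRaiiContext(SuperUserRaiiContext const&)            = delete;
  SuperUserRaiiContext& operator=(SuperUserRaiiContext const&) = delete;

  // Whether the elevation attempt succeeded.
  bool elevated() const { return _elevated; }

 private:
  uid_t _saved_euid;
  bool _elevated;
};
```

In an ordinary unprivileged process, `elevated()` returns `false` and the guard does nothing, which is exactly the limitation described above.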

Contributor Author

@vuule explained that the non-sudo command is used on specially configured machines where unprivileged users have no access to the sudo executable but can run sysctl. This complicates things, because the sysctl command does not return a non-zero exit code when it fails due to insufficient permission, unlike the other approach, echo 3 | tee /proc/sys/vm/drop_caches, which does. So I switched to cuDF's implementation, which checks stderr to detect failure instead.
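
A stderr-based check of this kind can be sketched as follows. This is not cuDF's actual code: `capture_stderr` and `succeeded_quietly` are hypothetical names, and treating any stderr output as failure is an assumption about how such a check might work.

```cpp
#include <array>
#include <cstdio>
#include <stdexcept>
#include <string>

// Run `cmd` in a shell and return whatever it printed to stderr.
inline std::string capture_stderr(std::string const& cmd)
{
  // Redirections apply left to right: stderr is pointed at the pipe read by
  // popen(), then stdout is discarded.
  std::string const wrapped = "( " + cmd + " ) 2>&1 1>/dev/null";
  FILE* pipe                = popen(wrapped.c_str(), "r");
  if (pipe == nullptr) { throw std::runtime_error("popen() failed"); }
  std::string output;
  std::array<char, 256> buf{};
  while (fgets(buf.data(), static_cast<int>(buf.size()), pipe) != nullptr) {
    output += buf.data();
  }
  pclose(pipe);
  return output;
}

// Assumption: a command that wrote nothing to stderr is considered successful,
// regardless of its exit code.
inline bool succeeded_quietly(std::string const& cmd) { return capture_stderr(cmd).empty(); }
```

With this approach, a command that exits with status 0 but prints an error message is still reported as a failure, which is the case the exit-code check misses.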

  auto ret = system(ss.str().c_str());
  SYSCALL_CHECK(ret);
  return ret == 0;
}
} // namespace kvikio
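
The value returned by system() is a wait status rather than a plain exit code. The snippet above only compares it with zero; if the actual exit code were needed, it could be decoded as in this sketch. `shell_exit_code` is a hypothetical helper, not part of KvikIO.

```cpp
#include <cstdlib>
#include <string>
#include <sys/wait.h>

// Return the exit code of the shell command, or -1 if the child process could
// not be created or did not exit normally (e.g. it was killed by a signal).
inline int shell_exit_code(std::string const& cmd)
{
  int const status = std::system(cmd.c_str());
  if (status == -1) { return -1; }
  if (!WIFEXITED(status)) { return -1; }
  return WEXITSTATUS(status);
}
```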
4 changes: 4 additions & 0 deletions docs/source/api.rst
@@ -11,6 +11,10 @@ CuFile
.. autoclass:: IOFuture
    :members:

.. autofunction:: get_page_cache_info

.. autofunction:: clear_page_cache

CuFile driver
-------------
.. currentmodule:: kvikio.cufile_driver
3 changes: 2 additions & 1 deletion python/kvikio/kvikio/__init__.py
@@ -14,14 +14,15 @@

from kvikio._lib.defaults import CompatMode # noqa: F401
from kvikio._version import __git_commit__, __version__
-from kvikio.cufile import CuFile, get_page_cache_info
+from kvikio.cufile import CuFile, clear_page_cache, get_page_cache_info
from kvikio.remote_file import RemoteFile, is_remote_file_available

__all__ = [
    "__git_commit__",
    "__version__",
    "CuFile",
    "get_page_cache_info",
    "clear_page_cache",
Contributor

Typically we try to keep __all__ alphabetized. Could you sort these lines?

Contributor Author

Thanks. Sorted.

    "RemoteFile",
    "is_remote_file_available",
]
8 changes: 8 additions & 0 deletions python/kvikio/kvikio/_lib/file_handle.pyx
@@ -185,6 +185,10 @@ cdef extern from "<kvikio/file_utils.hpp>" nogil:
    pair[size_t, size_t] cpp_get_page_cache_info_int \
        "kvikio::get_page_cache_info"(int fd) except +

    bool cpp_clear_page_cache "kvikio::clear_page_cache" \
        (bool reclaim_dentries_and_inodes, bool clear_dirty_pages) \
        except +


def get_page_cache_info(file: Union[os.PathLike, str, int, io.IOBase]) \
-> tuple[int, int]:
@@ -202,3 +206,7 @@ def get_page_cache_info(file: Union[os.PathLike, str, int, io.IOBase]) \
    else:
        raise ValueError("The type of `file` must be `os.PathLike`, `str`, `int`, "
                         "or `io.IOBase`")


def clear_page_cache(reclaim_dentries_and_inodes: bool, clear_dirty_pages: bool):
    return cpp_clear_page_cache(reclaim_dentries_and_inodes, clear_dirty_pages)
24 changes: 24 additions & 0 deletions python/kvikio/kvikio/cufile.py
@@ -458,3 +458,27 @@ def get_page_cache_info(
    and the total number of pages.
    """
    return file_handle.get_page_cache_info(file)


def clear_page_cache(
    reclaim_dentries_and_inodes: bool = True, clear_dirty_pages: bool = True
) -> bool:
    """Clear the page cache

    Parameters
    ----------
    reclaim_dentries_and_inodes: bool, optional
        Whether to free reclaimable slab objects which include dentries and inodes.

        - If `True`, equivalent to executing `echo 3 > /proc/sys/vm/drop_caches`;
        - If `False`, equivalent to executing `echo 1 > /proc/sys/vm/drop_caches`.
    clear_dirty_pages: bool, optional
        Whether to trigger the writeback process to clear the dirty pages. If `True`,
        `sync` will be called prior to cache dropping.

    Returns
    -------
    bool
        Whether the page cache has been successfully cleared.
    """
    return file_handle.clear_page_cache(reclaim_dentries_and_inodes, clear_dirty_pages)