Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: add read_at/write_at/seek_read/seek_write for fs::File #6427

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

SteveLauC
Copy link
Contributor

@SteveLauC SteveLauC commented Mar 25, 2024

Motivation

See #1529

Solution

  1. Implemented using spawn_blocking(), see Feature request: read_at/write_at for tokio::fs::File #1529 (comment)
  2. This implementation has UNIX and Windows supported, WASM should be supported as well, but the corresponding standard interface is still nightly, so it is not included in this PR.

About the interface

These 2 methods are added directly to tokio::fs::File, the sync version APIs, are actually exposed through trait, FileExt, will change it to a trait, something like AsyncFileExt if you want.

@mox692 mox692 added A-tokio Area: The main tokio crate M-fs Module: tokio/fs labels Mar 25, 2024
@SteveLauC
Copy link
Contributor Author

SteveLauC commented Mar 25, 2024

Well, about the interface on Windows:

Originally, I thought seek_read()/seek_write() were the Windows version of POSIX pread()/pwrite(), but they are actually not, seek_read()/seek_write() will affect the file cursor (this is why the CI tests on Windows are failing)

Given the different behaviors, I think we should expose them by trait? I would like to hear your thoughts before making any further changes:)

@Darksonn
Copy link
Contributor

Darksonn commented Mar 26, 2024

You can provide a unix-specific method for read_at, and a different windows-specific method for seek_read. We usually don't use traits for this, and instead just use cfgs:

#[cfg(unix)]
#[cfg_attr(docsrs, doc(cfg(unix)))]
fn my_unix_specific_method() {}

The second method ensures that it is rendered properly in the documentation. Please double check the generated documentation for your code as well. You can do that either by generating it locally via the instructions in CONTRIBUTING.md, or by pushing to github and clicking "details" on the "netlify/tokio-rs/deploy-preview" ci job, which shows rendered documentation for your PR.

@SteveLauC
Copy link
Contributor Author

Thanks for that reminder of the doc cfg, I will do it soon:)

@SteveLauC SteveLauC changed the title fs: add read/write_at for fs::File fs: add read_at/write_at/seek_read/seek_write for fs::File Mar 27, 2024
@SteveLauC
Copy link
Contributor Author

SteveLauC commented Mar 27, 2024

Hi, this PR should be ready for review:

  • Interfaces:

    • read_at() write_at() for UNIX
    • seek_read() seek_write() for Windows
  • Documentation: copied from the corresponding standard library interfaces, i.e., these traits:

  • Examples added

  • Integration test added
    Though I am not sure if the file names are ok, should they be io_xxx.rs or fs_xxx.rs, they are added to fs::File, but they are actually I/O interfaces:<


Ping me if you guys need me to squash my commits

Comment on lines +580 to +584
let std = self.std.clone();
let n = buf.len();
let bytes_read = asyncify(move || _read_at(&std, n, offset)).await?;
let len = bytes_read.len();
buf[..len].copy_from_slice(&bytes_read);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Tokio file already has a bunch of logic for keeping track of and reusing a buffer, but none of these functions use that logic. Is there a reason you're doing it this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Tokio file already has a bunch of logic for keeping track of and reusing a buffer,

Well, I am not aware of this, do you mean the bytes crate? If not, would you like to show me some examples?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean the stuff related to these types:

tokio/tokio/src/fs/file.rs

Lines 96 to 118 in c98e22f

struct Inner {
state: State,
/// Errors from writes/flushes are returned in write/flush calls. If a write
/// error is observed while performing a read, it is saved until the next
/// write / flush call.
last_write_err: Option<io::ErrorKind>,
pos: u64,
}
#[derive(Debug)]
enum State {
Idle(Option<Buf>),
Busy(JoinHandle<(Operation, Buf)>),
}
#[derive(Debug)]
enum Operation {
Read(io::Result<usize>),
Write(io::Result<()>),
Seek(io::Result<u64>),
}

Copy link
Contributor Author

@SteveLauC SteveLauC Mar 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for showing me this!

I just took another look at other methods implemented with spawn_blocking(), seems that I should add:

        let mut inner = self.inner.lock().await;
        inner.complete_inflight().await;

before involving the actual logic, right?

Update: Well, seems more complicated than I thought, I originally thought we could simply implement it using asyncify()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could probably argue either way about whether it's even desireable. One advantage of read_at is that you can have several calls in parallel, but if we use the state logic, then they will run one-after-the-other.

I'm not sure what the best answer here is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One advantage of read_at is that you can have several calls in parallel, but if we use the state logic, then they will run one-after-the-other.

Thanks for pointing it out! The capability of enabling concurrent access is indeed the reason why pread/pwrite are added, so I slightly tend to use separate buffers

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could probably argue either way about whether it's even desireable. One advantage of read_at is that you can have several calls in parallel, but if we use the state logic, then they will run one-after-the-other.

I'm not sure what the best answer here is.

I have a tentative plan, can we use RwLock instead of Mutex to achieve sharing among multiple readers and allowing them run in parallel?

inner: Mutex<Inner>,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Chasing1020 A read/write lock doesn't help. Fundamentally, the shared buffer can only hold one piece of data at the time.

@asomers
Copy link
Contributor

asomers commented Apr 10, 2024

I think this would be a good feature. But what would make it really great would be if it can have platform-specific integration. For example, aio_read on FreeBSD. We need to be very careful with the initial implementation to ensure that acceleration will be possible. For example to allow for errors to be returned in the right places. Here is the existing FreeBSD version. https://docs.rs/tokio-file/latest/tokio_file/ .

@SteveLauC
Copy link
Contributor Author

But what would make it really great would be if it can have platform-specific integration.

I agree and I am ok to have real async I/O for platforms that are capable of doing it

@asomers
Copy link
Contributor

asomers commented Apr 27, 2024

I'm working on the platform-specific part right now. However, even though aio_read is nearly equivalent to an asynchronous read_at, it's error handling behavior is slightly different. For example, aio_read may return EAGAIN if it hits a system resource limitation. The caller must be prepared for that. Therefore, I think the AIO acceleration should be opt-in. How about moving these methods into an extension trait named something like AsyncFileExt and then creating a second extension trait named AioFileExt whose methods have the same signature? Then opting into AIO would just consist of importing the AioFileExt trait instead of the AsyncFileExt trait?

@Darksonn
Copy link
Contributor

@asomers Can you open a new feature request for an AIO implementation of tokio::fs::File. Then we can discuss that issue there.

Regarding this PR, I still don't know what to do about the question of buffer management. The tokio::fs::File type kind of has the assumption that it will only need one buffer, but these methods violate that assumption.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-fs Module: tokio/fs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants