
add: continue partial operations in uring engine #168

Draft
VasilyStepanov wants to merge 2 commits into main from add/partial

Conversation

@VasilyStepanov
Member

No description provided.

@VasilyStepanov VasilyStepanov self-assigned this Jan 7, 2026
@gemini-code-assist

Summary of Changes

Hello @VasilyStepanov, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural change to the io_uring queue implementation by introducing a new Event abstraction. This refactoring aims to streamline the handling of various io_uring operations (reads, writes, polls) and, crucially, to enable the continuation and proper accumulation of results for partial operations. By encapsulating the io_uring submission queue entry (SQE) preparation and completion dispatch within dedicated Event objects, the design enhances modularity, simplifies error handling, and lays the groundwork for more robust asynchronous I/O.

Highlights

  • New Event Abstraction: Introduced uring_event.hpp and uring_event.cpp to define a new Event base class and specialized derived classes (e.g., EventPoll, EventScalarRead, EventVectorWrite) for different io_uring operations.
  • Refactored uring_queue: The uring_queue methods (poll, read, write) have been refactored to utilize these new Event objects, delegating the io_uring_sqe preparation to the Event::prep() method.
  • Enhanced Completion Handling for Partial Operations: The uring_queue::wait method now retrieves the Event object associated with a completed io_uring request, allowing the Event to manage its own result accumulation (set_result) and dispatch its callback (dispatch). This change specifically enables the continuation of partial operations by accumulating results.
  • Improved SQE Acquisition: The uring_queue now provides a get_sqe() method that encapsulates the logic for acquiring a submission queue entry and handles potential ENOBUFS errors.
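Based on the highlights above, the new abstraction can be sketched as below. This is a hedged, illustrative sketch, not the PR's actual code: `prep()`, `set_result()` and `dispatch()` are the method names the summary mentions, while everything else (the accumulating `_result` member, the `DemoEvent` subclass, the callback signature) is assumed for demonstration and omits the io_uring plumbing entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Illustrative sketch of the Event interface described in the highlights.
// The real rawstor classes may differ in names and signatures.
class Event {
public:
    virtual ~Event() = default;

    // Would fill an io_uring SQE for this operation (elided here).
    virtual void prep() = 0;

    // Accumulate bytes transferred so far, enabling partial-op continuation.
    void set_result(size_t n) { _result += n; }
    size_t result() const { return _result; }

    // Invoke the completion callback with the accumulated result.
    virtual void dispatch() = 0;

protected:
    size_t _result = 0;  // bytes completed across (re)submissions
};

// Toy subclass used only to demonstrate result accumulation.
class DemoEvent : public Event {
public:
    explicit DemoEvent(std::function<void(size_t)> cb) : _cb(std::move(cb)) {}
    void prep() override {}  // the real engine would call io_uring_prep_* here
    void dispatch() override { _cb(result()); }

private:
    std::function<void(size_t)> _cb;
};
```

With two partial completions of 4096 and 1024 bytes, `dispatch()` would deliver the accumulated total of 5120 to the callback.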
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command               Description
Code Review           /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary       Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist   Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.

@VasilyStepanov VasilyStepanov linked an issue Jan 7, 2026 that may be closed by this pull request
@VasilyStepanov VasilyStepanov changed the title add: continue partial operations in uring queue implementation add: continue partial operations in uring engine Jan 7, 2026
@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces an Event abstraction layer for io_uring operations, with the goal of supporting continued partial operations. While the overall structure is a good step, the core logic for handling partial I/O is incomplete and incorrect. The prep methods for all read/write events do not adjust for partially completed operations, and the wait function in Queue does not handle the resubmission of incomplete events. These issues are critical as they defeat the purpose of the change and will lead to incorrect I/O behavior. My review provides specific feedback and suggestions to correctly implement the partial operation handling.

Comment on lines 40 to 49
void EventScalarRead::prep() {
    io_uring_sqe* sqe = _q.get_sqe();

    io_uring_prep_read(
        sqe, _t->fd(), static_cast<rawstor::io::TaskScalar*>(_t.get())->buf(),
        static_cast<rawstor::io::TaskScalar*>(_t.get())->size(), 0
    );

    io_uring_sqe_set_data(sqe, this);
}


critical

This prep method doesn't account for partial reads. To support continuing partial operations, you need to adjust the buffer pointer and the size of the read based on the number of bytes already read, which is stored in _result. The current implementation will always re-read from the beginning of the buffer, potentially overwriting data or reading the same data again.

This issue applies to all other scalar prep methods for read and write operations in this file.

void EventScalarRead::prep() {
    io_uring_sqe* sqe = _q.get_sqe();
    auto* task = static_cast<rawstor::io::TaskScalar*>(_t.get());

    io_uring_prep_read(
        sqe, task->fd(), static_cast<char*>(task->buf()) + _result,
        task->size() - _result, 0
    );

    io_uring_sqe_set_data(sqe, this);
}

Comment on lines 51 to 120
void EventVectorRead::prep() {
    io_uring_sqe* sqe = _q.get_sqe();

    io_uring_prep_readv(
        sqe, _t->fd(), static_cast<rawstor::io::TaskVector*>(_t.get())->iov(),
        static_cast<rawstor::io::TaskVector*>(_t.get())->niov(), 0
    );

    io_uring_sqe_set_data(sqe, this);
}


critical

Similar to EventScalarRead, this prep method for vectorized reads does not account for partial operations. It will always resubmit the entire iovec array from the beginning.

To fix this, you need to adjust the iovec array to skip completed vectors and modify the first partially read vector to point to the remaining data. Since io_uring_prep_readv does not take an offset into the first vector, you may need to modify the iovec array. One approach is to advance a pointer through the iovec array and adjust the iov_base and iov_len of the first non-completed vector.

For example:

auto* task = static_cast<rawstor::io::TaskVector*>(_t.get());
struct iovec* iov = task->iov();
unsigned int niov = task->niov();
size_t bytes_done = _result;

// Find the starting iovec and adjust it
while (bytes_done > 0 && niov > 0) {
    if (bytes_done < iov->iov_len) {
        iov->iov_base = static_cast<char*>(iov->iov_base) + bytes_done;
        iov->iov_len -= bytes_done;
        break;
    } else {
        bytes_done -= iov->iov_len;
        iov++;
        niov--;
    }
}

if (niov > 0) {
    io_uring_prep_readv(sqe, _t->fd(), iov, niov, 0);
}

This approach modifies the iovec array in the task, which might have side effects. Consider if this is acceptable or if a copy is needed. This same logic applies to all other vectorized and message-based prep methods.
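If mutating the task's iovec array is not acceptable, the adjustment can be done on a copy. The sketch below is a hypothetical helper (`advance_iov` is not part of the PR) that returns a trimmed copy of the array with `bytes_done` bytes skipped, leaving the task's own iovecs untouched:

```cpp
#include <cassert>
#include <cstddef>
#include <sys/uio.h>
#include <vector>

// Copy-based alternative to mutating the task's iovec array in place.
// Returns a new array covering only the bytes not yet transferred.
std::vector<iovec> advance_iov(const iovec* iov, unsigned int niov,
                               size_t bytes_done) {
    std::vector<iovec> out;
    for (unsigned int i = 0; i < niov; ++i) {
        size_t len = iov[i].iov_len;
        if (bytes_done >= len) {
            bytes_done -= len;  // this vector is fully consumed; skip it
            continue;
        }
        iovec v;
        v.iov_base = static_cast<char*>(iov[i].iov_base) + bytes_done;
        v.iov_len = len - bytes_done;
        bytes_done = 0;  // remaining vectors are copied unchanged
        out.push_back(v);
    }
    return out;  // pass out.data() and out.size() to io_uring_prep_readv
}
```

The trade-off is a small allocation per resubmission in exchange for keeping the Task's buffers immutable, which avoids the side effects noted above.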

Comment on lines 62 to 134
void EventScalarPositionalRead::prep() {
    io_uring_sqe* sqe = _q.get_sqe();

    io_uring_prep_read(
        sqe, _t->fd(),
        static_cast<rawstor::io::TaskScalarPositional*>(_t.get())->buf(),
        static_cast<rawstor::io::TaskScalarPositional*>(_t.get())->size(),
        static_cast<rawstor::io::TaskScalarPositional*>(_t.get())->offset()
    );

    io_uring_sqe_set_data(sqe, this);
}


critical

Similar to non-positional reads, this prep method for positional reads doesn't account for partial operations. In addition to adjusting the buffer pointer and size, you must also adjust the file offset based on the bytes already transferred (_result).

void EventScalarPositionalRead::prep() {
    io_uring_sqe* sqe = _q.get_sqe();
    auto* task = static_cast<rawstor::io::TaskScalarPositional*>(_t.get());

    io_uring_prep_read(
        sqe, task->fd(),
        static_cast<char*>(task->buf()) + _result,
        task->size() - _result,
        task->offset() + _result
    );

    io_uring_sqe_set_data(sqe, this);
}

Comment on lines 198 to 208
std::unique_ptr<Event> event(
    static_cast<Event*>(io_uring_cqe_get_data(cqe))
);

size_t result = cqe->res >= 0 ? cqe->res : 0;
int error = cqe->res < 0 ? -cqe->res : 0;
event->set_result(cqe->res);

io_uring_cqe_seen(&_ring, cqe);

--_events;

(*t)(result, error);
event->dispatch();


critical

The current implementation of wait assumes that every completed operation is final. It takes ownership of the Event object via std::unique_ptr and destroys it after dispatching. This loses the state (_result) required for continuing partial operations.

To correctly handle partial reads/writes, you should check if the operation is complete. If it is not, you should re-prepare the event for the next part of the operation and resubmit it, instead of dispatching the callback and destroying the event. This would involve:

  1. Adding a virtual is_complete() method to the Event class hierarchy to check against the total expected size from the Task.
  2. In this wait function, calling is_complete() after updating the result.
  3. If not complete, call event->prep() and then event.release() to prevent the unique_ptr from destroying it. The event will be picked up on a subsequent io_uring_submit.
  4. If complete, then decrement _events and call event->dispatch() as you do now.
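The decision in steps 1–4 can be sketched with the io_uring plumbing factored out. This is a hedged illustration, not the PR's code: `on_completion` and the `Next` enum are hypothetical names, and treating a zero result as final (EOF) is an assumption about the desired semantics.

```cpp
#include <cassert>
#include <cstddef>

// Outcome of processing one CQE for an event.
enum class Next { Dispatch, Resubmit, Fail };

// cqe_res: raw CQE result; done: bytes accumulated before this completion
// (updated in place); expected: total bytes the Task asked for.
Next on_completion(int cqe_res, size_t& done, size_t expected) {
    if (cqe_res < 0)
        return Next::Fail;      // dispatch the callback with error -cqe_res
    if (cqe_res == 0)
        return Next::Dispatch;  // EOF: a short result is final (assumption)
    done += static_cast<size_t>(cqe_res);
    return done < expected ? Next::Resubmit   // event->prep(); event.release()
                           : Next::Dispatch;  // --_events; event->dispatch()
}
```

On `Resubmit`, releasing the unique_ptr after re-preparing the SQE keeps the Event (and its accumulated `_result`) alive until the next completion, which is the ownership hand-off described in step 3.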

@VasilyStepanov VasilyStepanov force-pushed the add/partial branch 6 times, most recently from cd3b36c to 8e3e021 on January 10, 2026 at 14:49

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add multishot tasks

1 participant