Atomic LD writeback requirement is expensive for small cores #51

Wren6991 · 2024-10-19T17:30:22Z

This extension is attractive for embedded-class cores because it significantly improves code density, and for the most part can be executed on the existing load/store unit. The only real complication is this part in section 2.1:

The LD instruction must however write the loaded data to the pair of destination registers atomically
to ensure fault handling is possible.

Practically speaking, this means that when ld is issued as a back-to-back pair of lw accesses, the first load can't write back to the register file until the fault response for the second load comes back from the memory subsystem. Since the fault response usually arrives together with the load data coming back from the bus (i.e. in a later pipe stage), this requires either:

Delaying the writeback and inserting a pipe bubble after the second load (making ld slower than lw + lw), or
Supporting two-register writeback, and an additional 32-bit buffer to hold the first load's data whilst waiting for the second load to complete on the bus

Neither of these is desirable for a small core with a 2R1W register file. This could be completely avoided by relaxing the constraint to something like:

An LD instruction encountering a fault may write to at most one register in the pair rd, rd + 1. However, to make fault handling possible, an LD instruction which encounters a fault is guaranteed not to write back to rs1, even when this aliases a register in the pair rd, rd + 1.

This relaxed version can be implemented as a pair of lw, simply by swapping the order of the two loads based on the LSB of register number rs1 to ensure the first load in the pair can't clobber the base register.
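To make the ordering rule concrete, here is a minimal Python sketch of the relaxed behaviour. It is only a toy model under stated assumptions, not text from the spec or RTL from any core: the names `relaxed_ld`, `mem_read` and `LoadFault` are hypothetical, memory is a dict from word address to value, a missing address stands in for a faulting access, and x0 handling and alignment checks are left out.

```python
class LoadFault(Exception):
    """Raised by the toy memory model when an access faults (hypothetical)."""


def mem_read(mem, addr):
    # Assumption: an address missing from the dict models a faulting access.
    if addr not in mem:
        raise LoadFault(hex(addr))
    return mem[addr]


def relaxed_ld(regs, mem, rd, rs1, offset=0):
    """Model LD as two 32-bit loads, ordered so that rs1 is never written first."""
    if rd % 2 != 0:
        raise ValueError("rd must be even; the pair is rd, rd + 1")
    low = (rd, 0)        # low word  -> rd,     from base + offset
    high = (rd + 1, 4)   # high word -> rd + 1, from base + offset + 4
    # Only the pair register whose number shares its LSB with rs1 can alias
    # rs1, so that register is written by the second load.
    order = [high, low] if rs1 % 2 == 0 else [low, high]
    for dest, word_offset in order:
        # rs1 is re-read for each load, as two separate lw operations would do.
        addr = regs[rs1] + offset + word_offset
        regs[dest] = mem_read(mem, addr)  # a fault here leaves rs1 intact


if __name__ == "__main__":
    regs = [0] * 32
    regs[11] = 0x100                  # base register aliases rd + 1
    mem = {0x100: 0xAAAA0001}         # the high word at 0x104 is unmapped
    try:
        relaxed_ld(regs, mem, rd=10, rs1=11)
    except LoadFault:
        pass                          # x10 was written, x11 (the base) was not
    assert regs[11] == 0x100
    mem[0x104] = 0xBBBB0002           # "handler" maps the missing word
    relaxed_ld(regs, mem, rd=10, rs1=11)   # retrying the same ld now succeeds
    assert (regs[10], regs[11]) == (0xAAAA0001, 0xBBBB0002)
```

In this model a fault on the second load leaves exactly one register of the pair updated, and the base register still holds the original address, so the same ld can simply be re-executed after the fault is handled.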

tovine · 2024-10-19T17:38:08Z

I don't have a big problem with this proposal, as the main reason for this limitation was to make sure it's possible to retry the same ld instruction after handling an exception.

christian-herber-nxp · 2024-10-19T18:02:55Z

I really like the proposal. As the spec has passed ARC review, I would like to address this as part of the public review cycle, which hopefully starts in a few weeks.

christian-herber-nxp added the "public review" label (Issues received during public review) Nov 4, 2024

christian-herber-nxp linked a pull request Nov 7, 2024 that will close this issue

Relaxed the requirement regarding atomic override of destination registers for LD to allow more implementation options #53

Draft
