Atomic LD writeback requirement is expensive for small cores #51

Open
Wren6991 opened this issue Oct 19, 2024 · 2 comments · May be fixed by #53
Labels
public review Issues received during public review

Comments

@Wren6991

Wren6991 commented Oct 19, 2024

This extension is attractive for embedded-class cores because it significantly improves code density, and for the most part can be executed on the existing load/store unit. The only real complication is this part in section 2.1:

The LD instruction must however write the loaded data to the pair of destination registers atomically to ensure fault handling is possible.

Practically speaking, this means that when ld is issued as a back-to-back pair of lw accesses, the first load can't write back to the register file until the fault response for the second load comes back from the memory subsystem. Since the fault response usually arrives together with the load data coming back from the bus (i.e. in a later pipe stage), this requires either:

  • Delaying the writeback and inserting a pipe bubble after the second load (making ld slower than lw + lw), or
  • Supporting two-register writeback, and an additional 32-bit buffer to hold the first load's data whilst waiting for the second load to complete on the bus

Neither of these is desirable for a small core with a 2R1W register file. This could be completely avoided by relaxing the constraint to something like:

An LD instruction encountering a fault may write to at most one register in the pair rd, rd + 1. However, to make fault handling possible, an LD instruction which encounters a fault is guaranteed not to write back to rs1, even when this aliases a register in the pair rd, rd + 1.

This relaxed version can be implemented as a pair of lw, simply by swapping the order of the two loads based on the LSB of register number rs1 to ensure the first load in the pair can't clobber the base register.
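
To make the ordering rule concrete, here is a minimal behavioural sketch in Python. It is only a model under stated assumptions, not taken from the spec or from any core's RTL: the names `Fault`, `load_word` and `ld_relaxed` are hypothetical, memory is a dict keyed by word address, a missing key stands in for a faulting access, rd is assumed to be even and nonzero, and each lw is assumed to re-read rs1 from the register file.

```python
class Fault(Exception):
    """Hypothetical stand-in for a load access fault or page fault."""


def load_word(mem, addr):
    # Assumption: an address missing from the dict models a faulting access.
    if addr not in mem:
        raise Fault(hex(addr))
    return mem[addr]


def ld_relaxed(regs, mem, rd, rs1, imm=0):
    """Model ld as two lw accesses under the proposed relaxed rule.

    The pair is rd (low word, offset 0) and rd + 1 (high word, offset 4).
    """
    assert rd % 2 == 0 and rd != 0, "sketch assumes an even, nonzero rd"
    low = (rd, 0)
    high = (rd + 1, 4)
    # The first load must target the register whose number differs from rs1
    # in its LSB, so the first writeback can never hit the base register.
    order = [high, low] if rs1 % 2 == 0 else [low, high]
    for dest, offset in order:
        # rs1 is re-read for each lw. If load_word raises, regs[dest] is
        # left unwritten, so at most one register of the pair has changed.
        regs[dest] = load_word(mem, regs[rs1] + imm + offset)


# Worst case: rs1 aliases rd, and the second access faults.
regs = [0] * 32
regs[10] = 0x100
mem = {0x104: 0xBBBB}  # low word at 0x100 is not mapped yet
try:
    ld_relaxed(regs, mem, rd=10, rs1=10)
except Fault:
    mem[0x100] = 0xAAAA  # the "handler" fixes the fault
assert regs[10] == 0x100  # base register survived the faulting attempt
ld_relaxed(regs, mem, rd=10, rs1=10)  # retrying the same ld now succeeds
assert (regs[10], regs[11]) == (0xAAAA, 0xBBBB)
```

In this model the faulting attempt has already written x11, which the relaxed wording permits, while x10 still holds the base address, so the retry recomputes the same addresses.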

@tovine
Collaborator

tovine commented Oct 19, 2024

I don't have a big problem with this proposal, as the main reason for this limitation was to make sure it's possible to retry the same ld instruction after handling an exception.

@christian-herber-nxp
Collaborator

I really like the proposal. As the spec has passed ARC review, I would like to address this as part of the public review cycle, which hopefully starts in a few weeks.
