This extension is attractive for embedded-class cores because it significantly improves code density, and for the most part can be executed on the existing load/store unit. The only real complication is this part in section 2.1:
> The LD instruction must however write the loaded data to the pair of destination registers atomically to ensure fault handling is possible.
Practically speaking this means that, when issuing a back-to-back pair of `lw`s, the first load can't write back to the register file until the fault response for the second load comes back from the memory subsystem. Since the fault response is usually aligned with the load data coming back from the bus (i.e. is in a later pipe stage), this requires either:

- Delaying the writeback and inserting a pipe bubble after the second load (making `ld` slower than `lw` + `lw`), or
- Supporting two-register writeback, and an additional 32-bit buffer to hold the first load's data whilst waiting for the second load to complete on the bus
Neither of these is desirable for a small core with a 2R1W register file. This could be completely avoided by relaxing the constraint to something like:
> An LD instruction encountering a fault may write to at most one register in the pair `rd`, `rd + 1`. However, to make fault handling possible, an LD instruction which encounters a fault is guaranteed not to write back to `rs1`, even when this aliases a register in the pair `rd`, `rd + 1`.
This relaxed version can be implemented as a pair of `lw`, simply by swapping the order of the two loads based on the LSB of register number `rs1` to ensure the first load in the pair can't clobber the base register.
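To illustrate the ordering rule, here is a minimal behavioural sketch in Python. It is not taken from the spec or from any core: `exec_ld`, `load_word`, `LoadFault` and the dict-based memory are hypothetical names invented for this example. It assumes `rd` is even, so that the pair is `rd`, `rd + 1`, and it ignores the `x0` special case.

```python
class LoadFault(Exception):
    """Raised by the hypothetical memory model when an access faults."""


def load_word(mem, addr):
    # mem maps word addresses to 32-bit values; an unmapped address faults.
    if addr not in mem:
        raise LoadFault(addr)
    return mem[addr]


def exec_ld(regs, mem, rd, rs1, imm):
    """Model `ld rd, imm(rs1)` as two `lw` operations under the relaxed rule."""
    assert rd % 2 == 0, "sketch assumes an even-numbered destination pair"
    # The address is latched before any writeback, as it would be in the LSU.
    base = (regs[rs1] + imm) & 0xFFFFFFFF
    low = (rd, base)                            # low word goes to rd
    high = (rd + 1, (base + 4) & 0xFFFFFFFF)    # high word goes to rd + 1
    # An even rs1 can only alias rd, so rd is written last.
    # An odd rs1 can only alias rd + 1, so rd + 1 is written last.
    order = [high, low] if rs1 % 2 == 0 else [low, high]
    for reg, addr in order:
        # A fault on the second load leaves the second register untouched,
        # and the second register is the only one that can alias rs1.
        regs[reg] = load_word(mem, addr)
```

With this ordering, a fault on either load leaves `rs1` intact, so the handler can fix the fault and retry the same `ld`. When `rs1` aliases neither destination register, either order is safe, so keying the choice on the LSB alone avoids a full register-number comparison.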
I don't have a big problem with this proposal, as the main reason for this limitation was to make sure it's possible to retry the same ld instruction after handling an exception.
I really like the proposal. As the spec has passed ARC review, I would like to address this as part of the public review cycle, which hopefully starts in a few weeks.