instruction queue

There is a 128-bit buffer in the IQ stage that stores instructions loaded from IF. The PC loaded into the IQ is usually 64-bit aligned, except when a flush happens.
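A toy Python model of this buffer may help. It assumes, since the text does not say so, that IF delivers 64-bit packets from 8-byte-aligned addresses and that after a flush the IQ discards the bytes that precede the (possibly only 2-byte-aligned) target PC. The class and method names are hypothetical.

```python
class InstructionQueue:
    """Toy model of the IQ's 128-bit (16-byte) instruction buffer."""

    CAPACITY = 16  # bytes, i.e. 128 bits
    PACKET = 8     # bytes per fetch packet from IF (assumed 64-bit)

    def __init__(self) -> None:
        self.buf = bytearray()
        self.pc = 0  # PC of the oldest byte held in buf

    def flush(self, target_pc: int) -> None:
        # After a flush the new PC may be only 2-byte aligned.
        self.buf.clear()
        self.pc = target_pc

    def can_accept(self) -> bool:
        return len(self.buf) + self.PACKET <= self.CAPACITY

    def push_packet(self, fetch_pc: int, packet: bytes) -> None:
        # IF is assumed to fetch 8-byte packets from 8-byte-aligned addresses.
        assert fetch_pc % self.PACKET == 0 and len(packet) == self.PACKET
        assert self.can_accept()
        skip = self.pc + len(self.buf) - fetch_pc  # bytes before the PC we need
        assert 0 <= skip < self.PACKET
        self.buf += packet[skip:]

    def oldest(self):
        """Return (pc, raw, length) of the oldest complete instruction, or None."""
        if len(self.buf) < 2:
            return None
        low = int.from_bytes(self.buf[0:2], "little")
        length = 4 if (low & 0b11) == 0b11 else 2  # RVC when low bits != 0b11
        if len(self.buf) < length:
            return None
        return self.pc, int.from_bytes(self.buf[:length], "little"), length

    def pop(self, length: int) -> None:
        del self.buf[:length]
        self.pc += length


# Usage: redirect to 0x1002, then receive the aligned packet at 0x1000.
q = InstructionQueue()
q.flush(0x1002)
q.push_packet(0x1000, b"\xaa\xaa" + (0x0001).to_bytes(2, "little") + (0x13).to_bytes(4, "little"))
print(q.oldest())  # c.nop at 0x1002, 2 bytes long
```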

pre-decode

The oldest instruction is pre-decoded first to determine whether it is an RVC instruction, a function return, a function call, a branch, or a fence_i.
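The pre-decode step can be sketched as below. Opcode and field positions follow the RISC-V base and C-extension encodings; the `PreDecode` record and `predecode` function are hypothetical names, and the call/return flags apply the x1/x5 link-register convention described in the RAS section.

```python
from dataclasses import dataclass

LINK = {1, 5}  # x1 (ra) and x5 (t0) act as link registers


@dataclass
class PreDecode:
    rvc: bool = False
    kind: str = "other"  # "jal", "jalr", "branch", "fence_i" or "other"
    call: bool = False
    ret: bool = False


def predecode(raw: int) -> PreDecode:
    if (raw & 0b11) != 0b11:  # 16-bit RVC encoding
        op, funct3 = raw & 0b11, (raw >> 13) & 0b111
        rs1, rs2 = (raw >> 7) & 0x1F, (raw >> 2) & 0x1F
        d = PreDecode(rvc=True)
        if op == 0b01 and funct3 == 0b101:  # c.j (links x0)
            d.kind = "jal"
        elif op == 0b01 and funct3 in (0b110, 0b111):  # c.beqz / c.bnez
            d.kind = "branch"
        elif op == 0b10 and funct3 == 0b100 and rs1 != 0 and rs2 == 0:
            d.kind = "jalr"
            rd = 1 if (raw >> 12) & 1 else 0  # c.jalr links x1, c.jr links x0
            d.call = rd in LINK
            d.ret = rs1 in LINK and rd != rs1
        return d
    opcode, rd = raw & 0x7F, (raw >> 7) & 0x1F
    funct3, rs1 = (raw >> 12) & 0b111, (raw >> 15) & 0x1F
    d = PreDecode()
    if opcode == 0b1101111:  # jal
        d.kind, d.call = "jal", rd in LINK
    elif opcode == 0b1100111 and funct3 == 0:  # jalr
        d.kind = "jalr"
        d.call = rd in LINK
        d.ret = rs1 in LINK and rd != rs1
    elif opcode == 0b1100011:  # beq/bne/blt/bge/bltu/bgeu
        d.kind = "branch"
    elif opcode == 0b0001111 and funct3 == 0b001:  # fence.i
        d.kind = "fence_i"
    return d
```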

branch prediction

The next PC is predicted according to the last instruction:

When the last instruction is rv64i_jal or rv64c_j, the next PC can be calculated directly by adding the immediate to the PC.
When the last instruction is rv64i_jalr, rv64c_jr, or rv64c_jalr, the next PC may be popped from the RAS if the instruction is a function return; otherwise, the front-end is stalled until the jalr finishes executing.
When the last instruction is rv64i_bxx, rv64c_beqz, or rv64c_bnez, this version uses static prediction: if the branch target offset is greater than or equal to zero, the branch is predicted to "jump".
When the last instruction is fence_i, the pipeline is flushed after the instruction executes, and instructions are re-fetched from the next PC. fence_i is handled like a branch misprediction.
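The rules above can be condensed into a single next-PC function. This is a sketch that assumes pre-decode already supplies the instruction kind, length, sign-extended immediate, and a return flag; the `Inst` record and `predict_next_pc` are hypothetical, and the RAS is modeled as a plain Python list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Inst:
    kind: str             # hypothetical: "jal", "jalr", "branch", "fence_i", "other"
    length: int           # 2 for RVC, 4 for 32-bit instructions
    imm: int = 0          # sign-extended offset for jal / c.j / branches
    is_return: bool = False


def predict_next_pc(pc: int, inst: Inst, ras: List[int]) -> Optional[int]:
    """Return the predicted next PC, or None when the front-end must stall."""
    fall_through = pc + inst.length
    if inst.kind == "jal":        # rv64i_jal / rv64c_j: target known immediately
        return pc + inst.imm
    if inst.kind == "jalr":       # rv64i_jalr / rv64c_jr / rv64c_jalr
        if inst.is_return and ras:
            return ras.pop()      # predicted return address
        return None               # stall until the jalr executes
    if inst.kind == "branch":     # static rule stated above: offset >= 0 -> jump
        return pc + inst.imm if inst.imm >= 0 else fall_through
    # fence_i and all other instructions continue sequentially; for fence_i the
    # pipeline is later flushed and re-fetched from fall_through.
    return fall_through
```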

BHT

Because this kind of prediction can be wrong, the predicted result (jump or not jump) and the opposite address (the jump target or the next sequential PC) are pushed into the Branch History FIFO. Branch instructions are executed in order; when a branch result becomes valid, the Branch History FIFO pops the oldest entry and checks whether the prediction was correct. If a misprediction occurs, the front-end is flushed first, and the back-end is flushed after the branch instruction commits.

Figure: BHT
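A minimal sketch of the Branch History FIFO follows. The depth, stalling prediction while the FIFO is full, and dropping younger entries on a misprediction are assumptions, not details stated above.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class BranchHistoryFifo:
    """Sketch of the Branch History FIFO; depth is a hypothetical parameter."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self.entries: Deque[Tuple[bool, int]] = deque()

    def full(self) -> bool:
        # Assumed: the front-end stops predicting branches while this is True.
        return len(self.entries) >= self.depth

    def push(self, predicted_taken: bool, opposite_pc: int) -> None:
        """Record a prediction plus the PC to fetch if it turns out wrong."""
        assert not self.full()
        self.entries.append((predicted_taken, opposite_pc))

    def resolve(self, actual_taken: bool) -> Optional[int]:
        """Check the oldest entry (branches resolve in order).

        Returns None if the prediction was correct, otherwise the redirect PC.
        """
        predicted_taken, opposite_pc = self.entries.popleft()
        if predicted_taken == actual_taken:
            return None
        # Misprediction: younger entries were predicted on the wrong path and
        # are assumed to be discarded together with the front-end flush.
        self.entries.clear()
        return opposite_pc
```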

RAS

The Return Address Stack (RAS) is a module that saves return addresses. According to the RISC-V specification, a jal or jalr that uses the integer register x1 and/or x5 is a function call or a function return. Therefore, on a function call the return address can be pushed onto the stack and then popped directly on the matching function return, so the front-end does not need to wait for the jalr to finish executing.
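The x1/x5 convention corresponds to the RISC-V hint table for jal/jalr, sketched below as a hypothetical `ras_action` helper. Whether Rift distinguishes the "pop then push" case (both registers are link registers but differ) is not stated in the text.

```python
from typing import Optional

LINK_REGS = {1, 5}  # x1 (ra) and x5 (t0)


def ras_action(rd: int, rs1: Optional[int] = None) -> str:
    """RAS hint for jal (rs1=None) or jalr, following the RISC-V hint table."""
    rd_link = rd in LINK_REGS
    rs1_link = rs1 is not None and rs1 in LINK_REGS
    if rd_link and rs1_link:
        # e.g. jalr ra, 0(t0): return from one routine and call another
        return "push" if rd == rs1 else "pop_then_push"
    if rd_link:
        return "push"  # function call: push pc + instruction length
    if rs1_link:
        return "pop"   # function return: predicted target comes from the RAS
    return "none"
```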

In Rift, the RAS is implemented as a ring stack. It never becomes full, but it can become empty. When the stack is full and a new push arrives, the bottom pointer advances together with the top pointer, so the bottom entry is discarded. This way the pipeline never locks up when the RAS is full. When the RAS is empty and a function return arrives, the front-end waits as if the instruction were not a function return.

Figure: Ring Stack
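The ring-stack behavior can be modeled as follows. The depth is a hypothetical parameter, and an explicit entry count is assumed to tell a full stack from an empty one, which the text does not specify.

```python
from typing import List, Optional


class RingStack:
    """Sketch of a RAS that overwrites its oldest entry instead of filling up."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self.data: List[int] = [0] * depth
        self.top = 0     # index of the next free slot
        self.bottom = 0  # index of the oldest valid entry
        self.count = 0   # number of valid entries

    def push(self, addr: int) -> None:
        self.data[self.top] = addr
        self.top = (self.top + 1) % self.depth
        if self.count == self.depth:
            # Full: the bottom pointer moves with the top, dropping the oldest entry.
            self.bottom = (self.bottom + 1) % self.depth
        else:
            self.count += 1

    def pop(self) -> Optional[int]:
        if self.count == 0:
            return None  # empty: the front-end waits for the jalr instead
        self.top = (self.top - 1) % self.depth
        self.count -= 1
        return self.data[self.top]
```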

To ensure that an unrelated jalr finishing execution never releases a jalr stall, every result popped from the RAS also pushes an empty entry into a FIFO, and the FIFO pops one entry whenever a jalr handshake arrives. A jalr stall is resolved only when a jalr handshake arrives while the FIFO is empty.
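Because the FIFO entries carry no data, a counter is enough to sketch the mechanism. The sketch assumes jalr instructions complete in program order, so any outstanding tokens belong to older, RAS-predicted jalrs; the class and method names are hypothetical.

```python
from typing import Optional


class JalrStallTracker:
    """Counter-based sketch of the empty-entry FIFO guarding a jalr stall."""

    def __init__(self) -> None:
        self.pending = 0      # empty entries in the FIFO (one per RAS pop)
        self.stalled = False  # front-end waiting on an unpredicted jalr

    def on_ras_pop(self) -> None:
        self.pending += 1

    def on_jalr_stall(self) -> None:
        self.stalled = True

    def on_jalr_handshake(self, target_pc: int) -> Optional[int]:
        """Return the PC to resume fetching from, or None if nothing changes."""
        if self.pending:
            # This jalr was already predicted by the RAS: consume its entry.
            self.pending -= 1
            return None
        if self.stalled:
            self.stalled = False
            return target_pc
        return None
```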

