optimisations arm cond

ITotalJustice edited this page Sep 5, 2022 · 1 revision

the upper 4 bits of an arm opcode encode a condition, which is checked against the cpu flags to decide whether that instruction is executed or skipped.
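as a rough sketch of what "upper 4 bits" means in code: the condition is just the opcode shifted right by 28. `arm_cond` is a hypothetical helper name used only for this example, it is not part of the emulator.

```cpp
#include <cstdint>

// hypothetical helper (not from the emulator): returns the 4-bit
// condition field stored in bits 28-31 of a 32-bit arm opcode.
constexpr std::uint8_t arm_cond(std::uint32_t opcode)
{
    return static_cast<std::uint8_t>(opcode >> 28);
}

// 0xE3A00001 is `mov r0, #1`, which uses the "always" condition (0xE).
static_assert(arm_cond(0xE3A00001u) == 0xE, "mov r0, #1 is AL");
// 0x0A000000 is `beq` with a zero offset, which uses EQ (0x0).
static_assert(arm_cond(0x0A000000u) == 0x0, "beq is EQ");
```

since the field sits at the very top of the word, no mask is needed after the shift.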

looking at the check_cond function, it's a very simple switch on the condition.

auto check_cond(const Gba& gba, const u8 cond) -> bool
{
    switch (cond & 0xF)
    {
        case COND_EQ: return CPU.cpsr.Z;
        case COND_NE: return !CPU.cpsr.Z;
        case COND_CS: return CPU.cpsr.C;
        case COND_CC: return !CPU.cpsr.C;
        case COND_MI: return CPU.cpsr.N;
        case COND_PL: return !CPU.cpsr.N;
        case COND_VS: return CPU.cpsr.V;
        case COND_VC: return !CPU.cpsr.V;

        case COND_HI: return CPU.cpsr.C && !CPU.cpsr.Z;
        case COND_LS: return !CPU.cpsr.C || CPU.cpsr.Z;
        case COND_GE: return CPU.cpsr.N == CPU.cpsr.V;
        case COND_LT: return CPU.cpsr.N != CPU.cpsr.V;
        case COND_GT: return !CPU.cpsr.Z && (CPU.cpsr.N == CPU.cpsr.V);
        case COND_LE: return CPU.cpsr.Z || (CPU.cpsr.N != CPU.cpsr.V);
        case COND_AL: return true;

        default:
            assert(!"unreachable hit");
            return false;
    }
}

notice COND_AL is always true? well, most arm instructions are COND_AL. the other conditions mostly show up where the source code has an if statement (the same applies to for and while loops). keep this in mind for now!
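to make the link to if statements concrete, here is a standalone sketch of the same switch that can be compiled on its own. the `Flags` struct and the enum values are assumptions for this example: the values follow the standard arm condition encoding, but the emulator's own `COND_*` definitions and `CPU.cpsr` layout are not shown on this page.

```cpp
#include <cstdint>

// assumed values: standard arm condition encoding (0x0 to 0xE).
enum Cond : std::uint8_t {
    COND_EQ = 0x0, COND_NE = 0x1, COND_CS = 0x2, COND_CC = 0x3,
    COND_MI = 0x4, COND_PL = 0x5, COND_VS = 0x6, COND_VC = 0x7,
    COND_HI = 0x8, COND_LS = 0x9, COND_GE = 0xA, COND_LT = 0xB,
    COND_GT = 0xC, COND_LE = 0xD, COND_AL = 0xE,
};

// hypothetical stand-in for the cpsr flag bits.
struct Flags { bool N, Z, C, V; };

constexpr bool check_cond(const Flags& f, std::uint8_t cond)
{
    switch (cond & 0xF)
    {
        case COND_EQ: return f.Z;
        case COND_NE: return !f.Z;
        case COND_CS: return f.C;
        case COND_CC: return !f.C;
        case COND_MI: return f.N;
        case COND_PL: return !f.N;
        case COND_VS: return f.V;
        case COND_VC: return !f.V;
        case COND_HI: return f.C && !f.Z;
        case COND_LS: return !f.C || f.Z;
        case COND_GE: return f.N == f.V;
        case COND_LT: return f.N != f.V;
        case COND_GT: return !f.Z && (f.N == f.V);
        case COND_LE: return f.Z || (f.N != f.V);
        case COND_AL: return true;
        default:      return false; // 0xF: treated as "skip" in this sketch
    }
}

// flags after `cmp r0, r1` when r0 == r1: N=0, Z=1, C=1 (no borrow), V=0.
constexpr Flags after_equal_cmp{false, true, true, false};

static_assert(check_cond(after_equal_cmp, COND_EQ), "if (a == b) is taken");
static_assert(!check_cond(after_equal_cmp, COND_NE), "if (a != b) is not");
static_assert(check_cond(after_equal_cmp, COND_AL), "AL ignores the flags");
```

so a compare followed by an EQ or NE instruction is roughly what `if (a == b)` turns into, while everything outside such a test is usually encoded as AL.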

here is your basic arm fetch and dispatch code.

const auto opcode = fetch(gba);
const auto cond = bit::get_range<28, 31>(opcode);

if (check_cond(gba, cond))
{
    execute();
}

~1028 fps.

the above is fine and correct. the compiler will also inline check_cond if it's part of the same translation unit, or if lto is enabled.

now let's look at the slightly improved version.

const auto opcode = fetch(gba);
const auto cond = bit::get_range<28, 31>(opcode);

if (check_cond(gba, cond)) [[likely]]
{
    execute();
}

~1050 fps.

what's different? the [[likely]] attribute (c++20) was added, as it is far more common for an instruction to be executed than skipped.

we can go slightly further in optimising.

const auto opcode = fetch(gba);
const auto cond = bit::get_range<28, 31>(opcode);

if (cond == COND_AL || check_cond(gba, cond)) [[likely]]
{
    execute();
}

~1074 fps.

what's different? now i check if cond == COND_AL before the switch is entered. this seems to be faster than both of the previous versions.

while it makes sense for this to be faster, i was still a little surprised because i figured the branch predictor would be doing its magic when the case COND_AL: was constantly being hit.
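for anyone who wants to try the fast path outside the emulator, below is a small self-contained sketch of it. everything here is made up for the example: `check_cond_stub` only models the Z flag, `count_executed` stands in for the real fetch and execute loop, and `demo_opcodes` is a hand-picked list. the [[likely]] attribute needs c++20.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uint8_t COND_AL = 0xE; // arm encoding for "always"

// hypothetical stub: only EQ, NE and AL are modelled, using just the Z flag.
constexpr bool check_cond_stub(bool z, std::uint8_t cond)
{
    switch (cond & 0xF)
    {
        case 0x0:     return z;     // EQ
        case 0x1:     return !z;    // NE
        case COND_AL: return true;
        default:      return false; // other conditions left out of this sketch
    }
}

// counts how many opcodes would be executed for a given Z flag.
template <std::size_t N>
constexpr int count_executed(const std::uint32_t (&opcodes)[N], bool z)
{
    int executed = 0;
    for (const auto opcode : opcodes)
    {
        const auto cond = static_cast<std::uint8_t>(opcode >> 28);
        // the compare against COND_AL runs first, so the switch is only
        // reached for instructions that really are conditional.
        if (cond == COND_AL || check_cond_stub(z, cond)) [[likely]]
        {
            ++executed;
        }
    }
    return executed;
}

// mov r0, #1 (AL) / beq (EQ) / bne (NE) / moveq r0, #1 (EQ)
constexpr std::uint32_t demo_opcodes[] = {
    0xE3A00001u, 0x0A000000u, 0x1A000000u, 0x03A00001u,
};

static_assert(count_executed(demo_opcodes, true) == 3, "AL + both EQ");
static_assert(count_executed(demo_opcodes, false) == 2, "AL + NE");
```

the early compare does not change which instructions run, it only changes how the common AL case reaches `execute()`.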


tests were run using my emulator in release mode with asserts disabled, using OpenLara.gba as the rom.
