Concerns about amount of allocated opcode space #47
Let me start by stating two observations:
Here are some results on code size and "performance" using the Embench benchmark suite and a prototype compiler. Results will improve as the compiler matures (you can see some clearly bad usages of Zilsd and missed opportunities).
I agree this is consistent with the existing instructions, but this new instruction doesn't necessarily need to follow the same inefficient encoding. I'd imagine a 5-bit scaled immediate would be sufficient for the majority of cases? Can you run objdump on the code generated by your prototype compiler to create a histogram of the used immediates?
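Such a histogram could be sketched roughly like the following. This is a hypothetical helper, not part of the prototype toolchain: the regex, the mnemonics matched, and the sample disassembly are all assumptions for illustration. In practice you would feed it the text of `objdump -d` run on the generated ELF.

```python
import re
from collections import Counter

# Match GNU-style disassembly lines such as "ld a0,16(sp)" and capture
# the signed immediate offset. Mnemonics are illustrative; adjust for
# whatever Zilsd pair loads/stores the disassembler actually prints.
OFFSET_RE = re.compile(r"\b(?:ld|sd)\s+\w+,(-?\d+)\(\w+\)")

def immediate_histogram(disassembly: str) -> Counter:
    """Count how often each ld/sd immediate offset appears."""
    return Counter(int(m.group(1)) for m in OFFSET_RE.finditer(disassembly))

# Illustrative sample of objdump output (made up, not real benchmark data).
sample = """
   10074: ld a0,16(sp)
   10078: sd a2,-8(s0)
   1007c: ld a4,16(sp)
"""

print(immediate_histogram(sample))
```

A histogram like this over a large binary would show directly how many accesses fall outside the range a 5-bit scaled immediate could encode.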
These numbers look good for some of those benchmarks, but I'd like to know which ISA string this was built with. Since you use the entire Zcf opcode space, this extension is incompatible with Zcmp, and I would imagine push/pop has a larger impact on this benchmark overall? I am also very surprised by the cubic numbers: looking at the code, it only performs floating-point operations, so I assume you were building for soft-float?
I should have shared the ISA string. Baseline is
The benchmarks with very high code-size reduction are those that have a high exposure to double. I do not have statistics for the immediate distribution, but Embench would clearly not be representative here (in fact, most of its benchmarks are simply too small to have a realistic immediate distribution).
Please also understand that the specification is currently in the final phases of architecture review. I am happy to get questions and input in all phases, but it is best to provide them during the internal review period, which is long past.
I realize my feedback is most likely too late; I only happened to see the email about ratification of this extension last week, so apologies for that. Looking at the code generation for these benchmarks, I don't see any large immediates being used, and I only really see benefits for Zdinx. Based on this data I don't see a rationale for the full 12-bit immediate. I would like to point out that I do believe this extension is helpful for code size; I just don't think it needs a full 12-bit immediate, and once a full opcode has been allocated it becomes difficult to reclaim that space. Do you have any data for programs that are larger than just one small benchmark file? I imagine it helps on your internal programs; knowing how much benefit you get on some internal benchmarks would also be helpful. I will try building your toolchain and running this on a larger program.
Embench is far too small a benchmark to make use of the full immediate value range. The same is true for any other load/store in any ISA; still, 12 bits is the norm.
That would be great.
In general I think this extension makes a lot of sense, but I am slightly concerned about how much opcode space is being used here.
While I see that just reusing the "double-word" encoding makes a lot of sense from a simplicity standpoint, it burns a lot of opcode space: do we really need a 12-bit immediate for the offset?
Additionally, that immediate is unscaled even though it only really makes sense to use it for multiples of 8, wasting 3 bits of the encoding.
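To make that arithmetic concrete, here is a back-of-envelope sketch (my own numbers, not taken from the specification) comparing an unscaled signed 12-bit offset with a hypothetical 5-bit offset scaled by the 8-byte access size:

```python
# Illustrative comparison: unscaled 12-bit signed immediate vs. a
# 5-bit immediate scaled by 8 (the double-word access size).
unscaled_bits, scale_bits, access = 12, 5, 8

# Reach of a signed 12-bit immediate: -2048 .. +2047.
unscaled_range = (-(1 << (unscaled_bits - 1)), (1 << (unscaled_bits - 1)) - 1)

# Only 8-byte-aligned offsets are useful for a double-word access,
# so only 1 in 8 of the 4096 encodings does anything.
useful_encodings = (unscaled_range[1] - unscaled_range[0] + 1) // access

# A 5-bit immediate scaled by 8 reaches 0 .. 248 with no wasted encodings.
scaled_range = (0, ((1 << scale_bits) - 1) * access)

print(unscaled_range)    # (-2048, 2047)
print(useful_encodings)  # 512 of 4096 encodings are 8-byte aligned
print(scaled_range)      # (0, 248)
```

Whether 0..248 bytes of reach suffices is exactly what an immediate histogram over real programs would answer.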
Do you have any data showing which immediate values are being used when building some larger projects? Inside loops I'd imagine this to be a very small offset since the base register would be modified and for stack loads/stores the most common offsets would also be quite small (and there is push/pop which replaces lots of the ldp/stp you see in AArch64 function prologs/epilogs).
I am also not sure this extension needs compressed opcodes - is it really that common? I imagine you have a compiler prototype that can show how often it is being used?
For the compressed instructions we would end up using essentially all the remaining encodings freed up by disabling Zcf which seems quite a large impact for what I would expect to be a rather small code size improvement.