Description
Describe the bug
While working on a custom language definition, I noticed the dead-code analysis eliminating relevant code. I traced the problem to a left shift by 0 bits applied to a varnode larger than 8 bytes.
Specifically, in ActionDeadCode::propagateConsumed the following line of code is used to compute the new mask for CPUI_INT_LEFT:
a = (outc >> sa) ^ ( (~((uintb)0)) << (8*sizeof(uintb)-sa));
Here, a is the resulting consumption mask for the left operand of the shift, outc is the consumption mask of the shift's output, and sa is the constant shift amount on the right-hand side. For sa == 0, the expression shifts a 64-bit value left by 64 bits, which is undefined behavior in C++. On x64 this usually acts like a shift by 0 (a no-op), because the hardware masks the shift count to 6 bits. As a consequence, a ends up equal to outc ^ (~((uintb)0)), i.e. ~outc, instead of outc. Bits that are actually consumed are therefore marked as unconsumed, and dead-code elimination can remove them.
One possible fix is to check explicitly for left shifts by 0 bits. I plan to submit a PR as well.
To Reproduce
I couldn't find an instruction in the existing language definitions that performs left shifts on such large varnodes (the relevant instructions are implemented only via pcodeops). To reproduce the issue on a real binary, I implemented a version of the x64 pshuflw instruction in SLEIGH, since it is currently also implemented only via a pcodeop.
- In the Ghidra install directory, in Ghidra/Processors/x86/data/languages/ia.sinc, replace the definition of pshuflw with the following:
:PSHUFLW XmmReg1, XmmReg2_m128, imm8 is vexMode=0 & $(PRE_F2) & byte=0x0F; byte=0x70; XmmReg2_m128 & XmmReg1 ...; imm8 {
  local s1 = (XmmReg2_m128 >> imm8[0,2] * 16);
  local s2 = (XmmReg2_m128 >> imm8[2,2] * 16);
  local s3 = (XmmReg2_m128 >> imm8[4,2] * 16);
  local s4 = (XmmReg2_m128 >> imm8[6,2] * 16);
  # Low quadword: words selected by imm8; high quadword: copied from the source
  XmmReg1 = zext(s1[0,16]) << 0 |
            zext(s2[0,16]) << 16 |
            zext(s3[0,16]) << 32 |
            zext(s4[0,16]) << 48 |
            zext(XmmReg2_m128[64,64]) << 64;
}
- Compile the following test program with gcc -O3 -fno-stack-protector:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

void shuffle_words4(char *w4) {
    char tmp[8];
    for (int i = 0; i < 4; i++) {
        tmp[i * 2] = w4[6 - i * 2];
        tmp[i * 2 + 1] = w4[7 - i * 2];
    }
    memcpy(w4, tmp, 8);
}

int main(int argc, char** argv) {
    if(argc < 2) return 1;
    if(strlen(argv[1]) != 8) return 1;
    shuffle_words4(argv[1]);
    printf("%s", argv[1]);
}
- Decompile the function shuffle_words4. The resulting code is:
void shuffle_words4(ulong *param_1)
{
  ulong uVar1;
  uVar1 = *param_1;
  *param_1 = (uVar1 >> 0x20 & 0xffff) << 0x10 | (uVar1 >> 0x10 & 0xffff) << 0x20 | uVar1 << 0x30;
  return;
}
Expected behavior
I would expect the expression to also include the term uVar1 >> 0x30, which supplies the lowest 16-bit word of the result. However, dead-code elimination currently removes this part.
Environment:
- OS: Arch Linux
- Java Version: openjdk 21.0.9 2025-10-21
- Ghidra Version: 11.4.2
- Ghidra Origin: official GitHub distro