Decompiler: Wrong consumption mask computation for zero-bit left shift if varnode's size > sizeof(uintb) #8705

@fkil

Description

Describe the bug
While working with a custom language definition, I encountered the dead-code analysis eliminating some relevant code. I traced it down to a left shift by 0 bits when the varnode is larger than 8 bytes.

Specifically, in ActionDeadCode::propagateConsumed the following line of code is used to compute the new mask for CPUI_INT_LEFT:

a = (outc >> sa) ^ ( (~((uintb)0)) << (8*sizeof(uintb)-sa));

Here a is the resulting consumption mask for the shift's input, outc is the previous consumption mask of the output, and sa is the constant shift amount on the right-hand side. For sa==0, this performs a 64-bit left shift, which is undefined behavior. On x64 the shift count is usually masked, making the shift a no-op, so the result a ends up equal to outc ^ (~((uintb)0)), i.e. ~outc, instead of outc.
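
The effect can be reproduced in isolation with a small standalone program (a sketch, not decompiler code; uint64_t stands in for uintb, and the mask value is made up for illustration):

#include <cstdint>
#include <cstdio>

int main(int argc, char **argv) {
    (void)argv;
    uint64_t outc = 0xffff;   // example consumption mask of the shift output
    int sa = argc - 1;        // 0 when run without arguments; kept non-constant so the shift happens at run time
    // With sa == 0 the shift count is 8*sizeof(uint64_t) == 64, which is undefined behavior;
    // on x64 the hardware masks the count to 0, so the second term becomes ~0 instead of 0.
    uint64_t a = (outc >> sa) ^ ((~(uint64_t)0) << (8 * sizeof(uint64_t) - sa));
    printf("%016llx\n", (unsigned long long)a);  // typically prints ffffffffffff0000 (~outc) rather than 000000000000ffff
    return 0;
}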

A fix would be to specifically check for left shifts of 0 bits. I plan to submit a PR as well.
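
For illustration, a minimal sketch of such a check (variable names as in ActionDeadCode::propagateConsumed; the actual PR may differ):

// CPUI_INT_LEFT case: a shift by 0 bits just passes the consumption mask through.
if (sa == 0)
    a = outc;
else
    a = (outc >> sa) ^ ( (~((uintb)0)) << (8*sizeof(uintb)-sa));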

To Reproduce
I couldn't find an instruction in the existing language definitions that performs such left shifts on varnodes (the relevant instructions are only implemented via pcodeops). To reproduce on a real binary, I implemented a version of the x64 pshuflw instruction, which is currently also only implemented via a pcodeop.

  1. In the Ghidra install dir, in Ghidra/Processors/x86/data/languages/ia.sinc, replace the definition of pshuflw with the following:
:PSHUFLW        XmmReg1, XmmReg2_m128, imm8     is vexMode=0 & $(PRE_F2) & byte=0x0F; byte=0x70; XmmReg2_m128 & XmmReg1 ...; imm8 { 

local s1 = (XmmReg2_m128 >> imm8[0,2] * 16);
local s2 = (XmmReg2_m128 >> imm8[2,2] * 16);
local s3 = (XmmReg2_m128 >> imm8[4,2] * 16);
local s4 = (XmmReg2_m128 >> imm8[6,2] * 16);

XmmReg1 = zext(s1[0,16]) <<  0 | 
          zext(s2[0,16]) << 16 |
          zext(s3[0,16]) << 32 |
          zext(s4[0,16]) << 48 |
          zext(XmmReg2_m128[64,64]);

}
  2. Compile the following test program with gcc -O3 -fno-stack-protector:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

void shuffle_words4(char *w4) {
	char tmp[8];
	for (int i = 0; i < 4; i++) {
		tmp[i * 2] = w4[6 - i * 2];
		tmp[i * 2 + 1] = w4[7 - i * 2];
	}
	memcpy(w4, tmp, 8);
}

int main(int argc, char** argv) {
	if(argc < 2) return 1;
	if(strlen(argv[1]) != 8) return 1;
	shuffle_words4(argv[1]);
	printf("%s", argv[1]);
}
  3. Decompile the function shuffle_words4. The resulting code is:
void shuffle_words4(ulong *param_1)
{
  ulong uVar1;
  
  uVar1 = *param_1;
  *param_1 = (uVar1 >> 0x20 & 0xffff) << 0x10 | (uVar1 >> 0x10 & 0xffff) << 0x20 | uVar1 << 0x30;
  return;
}

Expected behavior
I would expect the expression to also include the term uVar1 >> 0x30; however, dead-code elimination currently removes this part.
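
For illustration only (hand-edited, not actual decompiler output), the store would then look roughly like:

*param_1 = (uVar1 >> 0x20 & 0xffff) << 0x10 | (uVar1 >> 0x10 & 0xffff) << 0x20 |
           uVar1 << 0x30 | uVar1 >> 0x30;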

Environment:

  • OS: Arch Linux
  • Java Version: openjdk 21.0.9 2025-10-21
  • Ghidra Version: 11.4.2
  • Ghidra Origin: official GitHub distro
