Description
What is the feature you'd like to have?
Improved handling of expressions involving 32-bit shifts on 64-bit values, particularly in cases like (_strlen(...) << 32) or where offsets are applied using s>> 32 after being derived from a << 32 shift. These patterns currently show up in a way that obscures the underlying logic, especially around string buffer manipulation and offset calculation. Ideally, these would be either decompiled more clearly or expressed in IL in a way that better reflects the original intent.
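For reference, here is a rough Python sketch of why the shifted form is presumably just a 32-bit add followed by a sign extension. The helper names are hypothetical, 64-bit wraparound is modelled by masking, and none of this is Binary Ninja API code:

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & (1 << 31) else value


def shifted_index(length: int, shifted_const: int) -> int:
    """Evaluate (shifted_const + (length << 32)) s>> 32 with 64-bit wraparound."""
    total = to_signed64(shifted_const + (length << 32))
    return total >> 32  # Python's >> on a negative int is an arithmetic shift


def plain_index(length: int, offset: int) -> int:
    """Evaluate the assumed source form: sign-extend((int32_t)length + offset)."""
    return to_signed32(length + offset)


if __name__ == "__main__":
    # Constants as they appear in the decompiled output, with the offsets they encode.
    for const, offset in ((-0x700000000, -7), (-0x400000000, -4), (-0x300000000, -3)):
        print(hex(const), shifted_index(16, const), plain_index(16, offset))
```

Under this model the two forms agree for any length, including lengths with bits set above bit 31, which is what would make a simplification safe.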
Is your feature request related to a problem?
Yes. Expressions like (strlen(...) << 32) and access patterns such as *(buffer + (offset s>> 32)) currently produce decompiled output that is hard to follow and can look nonsensical or malformed. They show up in real-world binaries (e.g. in CFURL or preset handling routines) and hinder analysis by obscuring control and data flow, especially when shifts are used to build 64-bit values across instructions.
Are any alternative solutions acceptable?
Even partial improvements, such as recognizing common shift idioms and providing annotations, simplifications, or heuristics that improve readability, would be useful. It doesn’t need to be perfect semantic recovery — clearer visual cues or better IL-level expression of intent would already be a big step forward.
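As a rough illustration of the kind of idiom recognition meant here, the sketch below rewrites one pattern on plain strings. The function name and the string-based representation are hypothetical, and the output only mimics HLIL-style notation (sx.q, .d); it does not use any real Binary Ninja API:

```python
def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def simplify_shifted_index(var: str, shifted_const: int, shift: int = 32) -> str:
    """Render (shifted_const + (var << 32)) s>> 32 as a 32-bit add plus sign extension.

    Only shift == 32 on a 64-bit value is handled; anything else is returned
    in its original shape so the rule never guesses.
    """
    original = f"({shifted_const:#x} + ({var} << {shift:#x})) s>> {shift:#x}"
    if shift != 32:
        return original
    # var << 32 has zero low bits, so no carry crosses bit 32 and only the
    # constant's upper half survives the s>> 32.
    offset = to_signed32(shifted_const >> 32)
    if offset == 0:
        return f"sx.q({var}.d)"
    sign = "+" if offset > 0 else "-"
    return f"sx.q({var}.d {sign} {abs(offset):#x})"


if __name__ == "__main__":
    print(simplify_shifted_index("len", -0x700000000))
```

Even if a full rewrite like this is not acceptable at the IL level, showing the folded offset as an annotation next to the original expression would already help.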
Additional Information:
The following was originally shared in #5489:
- This last screenshot shows another thing that would be nice to improve (which maybe is worth opening as a separate issue), where 32 bit shifts are also being used within the string operations
Originally posted by @0xdevalias in #5489 (comment)
Also, curious (and may be irrelevant now since the above likely eliminates this level of detail/view anyway), but have there been any additions/changes/improvements that would address/simplify this aside I mentioned:
This last screenshot shows another thing that would be nice to improve (which maybe is worth opening as a separate issue), where 32 bit shifts are also being used within the string operations
I'm not sure if that specific function is in the demo binary I uploaded though.. will check.
Edit: Unfortunately it seems to be in a different version of the binary, I can probably upload that one if I knew where that service was (I couldn't find it when I looked earlier)
To demonstrate it in a potentially more useful form than the original screenshot though; here is a copy/paste out of Binary Ninja:
_strlen/etc results being shifted
005cc360  if (_FSFindFolder(0xffff8005, 'pusa', 0, &foundRef, zx.o(0)) == 0)
005cc46a      int64_t foundRefUrl = _CFURLCreateFromFSRef(0, &foundRef)  // Creates a URL from a given directory or file.
005cc484      // CFURLGetFileSystemRepresentation(
005cc484      //   url: CFURL!,
005cc484      //   resolveAgainstBase: Bool,
005cc484      //   buffer: UnsafeMutablePointer<UInt8>!,
005cc484      //   maxBufLen: CFIndex
005cc484      // ) -> Bool
005cc484      //
005cc484      // Fills a buffer with the file system's native string representation of a given URL's path.
005cc484      _CFURLGetFileSystemRepresentation(foundRefUrl, 0, thisSerumDSP->presetFilename, 512)
005cc48c      // void CFRelease(CFTypeRef cf);
005cc48c      //
005cc48c      // Releases a Core Foundation object.
005cc48c      _CFRelease(foundRefUrl)
005cc491      char* presetFilename_4 = thisSerumDSP->presetFilename
🐛005cc49c      int64_t presetFilename_4Length = _strlen(presetFilename_4)
🐛005cc4ab      *(presetFilename_4 + presetFilename_4Length) = '/com.xfe'
🐛005cc4b9      *(presetFilename_4 + presetFilename_4Length + 8) = 'rrecords'
🐛005cc4c8      *(presetFilename_4 + presetFilename_4Length + 16) = '.serum/u'
🐛005cc4d7      *(presetFilename_4 + presetFilename_4Length + 24) = 'ser.dat'
005cc4eb      int64_t userDatFileHandle = _fopen(thisSerumDSP->presetFilename, "wb")
005cc4f6      _rewind(userDatFileHandle)
005cc503      __builtin_strncpy(dest: &tildeStr, src: "~~~~~~~~~~~~~~~~", n: 16)
005cc503
005cc522      if (_fwrite(&tildeStr, 0x10, 1, userDatFileHandle) u> 0xe)
005cc531          _fclose(userDatFileHandle)
005cc522      else
005cc527          SerumDSP::InitStuff(thisSerumDSP)
005cc360  else
005cc376      _strcpy(thisSerumDSP->presetFilename, thisSerumDSP->_maybeSerumPresetsPath)
005cc37b      char* presetFilename_2 = thisSerumDSP->presetFilename
005cc386      int64_t presetFilename_2Length = _strlen(presetFilename_2)
🐛005cc395      *(presetFilename_2 + presetFilename_2Length) = '/System/'
🐛005cc3a3      *(presetFilename_2 + presetFilename_2Length + 8) = 'utermmat'
🐛005cc3a8      presetFilename_2[presetFilename_2Length + 16] = '\x00'
005cc3ad      char* presetFilename_3 = thisSerumDSP->presetFilename
🐛005cc3bd      int64_t presetFilenameLength = _strlen(presetFilename_3) << 32  // A << 32 bit shift on a 64-bit integer moves the original 32-bit length value from the lower 32 bits to the upper 32 bits of the 64-bit integer
🐛005cc3d2      presetFilename_3[(-30064771072 + presetFilenameLength) s>> 32] = 's'  // -30064771072 >> 32 == -7
🐛005cc3d6      thisSerumDSP->presetFilename[(-17179869184 + presetFilenameLength) s>> 32] = '.'  // -17179869184 >> 32 == -4
🐛005cc3f3      thisSerumDSP->presetFilename[(-12884901888 + presetFilenameLength) s>> 32] = 'd'  // -12884901888 >> 32 == -3
005cc41f      // thisSerumDSP->presetFilename = /System/utermmat
005cc41f      //
005cc41f      // After replacing length(16)-7 with s: /System/usermmat
005cc41f      // After replacing length(16)-4 with .: /System/user.mat
005cc41f      // After replacing length(16)-3 with d: /System/user.dat
005cc41f      int64_t rax_40 = _fopen(thisSerumDSP->presetFilename, "wb")
005cc42f      __builtin_strncpy(dest: &tildeStr, src: "~~~~~~~~~~~~~~~~", n: 16)
005cc42f
005cc44e      if (_fwrite(&tildeStr, 16, 1, rax_40) u<= 14)
005cc457          SerumDSP::InitStuff(thisSerumDSP)
005cc457
005cc531      _fclose(rax_40)
Originally posted by @0xdevalias in #5489 (comment)
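To make the intent of that else branch easier to see, here is a small Python replay of it using plain end-relative offsets instead of the shifted constants. The function name is hypothetical, the chunk contents and the offsets -7, -4 and -3 are taken from the annotated paste above, and a bytearray stands in for the C buffer:

```python
def replay_else_branch(presets_path: bytes) -> bytes:
    """Replay the else branch on a Python bytearray instead of a C buffer."""
    # _strcpy(presetFilename, _maybeSerumPresetsPath)
    buf = bytearray(presets_path)

    # Two 8-byte stores at strlen + 0 and strlen + 8; the trailing '\x00'
    # store at strlen + 16 is implicit because bytearray tracks its length.
    for chunk in (b"/System/", b"utermmat"):
        buf += chunk

    # Second _strlen, then three single-byte stores relative to the end.
    length = len(buf)
    for offset, char in ((-7, b"s"), (-4, b"."), (-3, b"d")):
        buf[length + offset] = char[0]
    return bytes(buf)


if __name__ == "__main__":
    print(replay_else_branch(b""))
```

With an empty presets path this matches the length(16) walkthrough in the comments; with a non-empty path the same three bytes are patched relative to the new end of the string.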
Yeah I'm not sure if any updates I made would specifically help with the shifts. If I had an example binary I would be more than willing to spend some time looking into it.
Originally posted by @bpotchik in #5489 (comment)
Example binary uploaded as clever river explores happily, and here is the unmodified snippet of code from a freshly loaded version of that binary:
00622cdb else
00622cef _strcpy(__dst: arg1[0x72a4], __src: arg1[0x72a5])
00622cf4 char* __s1 = arg1[0x72a4]
00622d10 int32_t var_288
00622d10 __builtin_memcpy(dest: &var_288, src: "\x10\x00\x00\x00\x58\x2b\x00\x09\x0f\x19\x10\x51\x0a\xf4\xe4\xf0\xee\xe9\xe4\xf2\x87\x00\x00\x00", n: 0x18)
00622d25 __builtin_strncpy(dest: &__s2, src: "/System/utermmat", n: 0x11)
00622d38 _strncat(__s1, &__s2, __n: 0x10)
00622d3d char* __s_1 = arg1[0x72a4]
00622d4c uint64_t rax_44 = _strlen(__s: __s_1) << 0x20
00622d61 __s_1[(-0x700000000 + rax_44) s>> 0x20] = 0x73
00622d65 arg1[0x72a4][(-0x400000000 + rax_44) s>> 0x20] = 0x2e
00622d81 arg1[0x72a4][(-0x300000000 + rax_44) s>> 0x20] = 0x64
00622dab FILE* __stream_1 = _fopen(__filename: arg1[0x72a4], __mode: &data_6f7531)
00622dbb __builtin_strncpy(dest: &var_258, src: "~~~~~~~~~~~~~~~~", n: 0x10)