
Improve handling of 32-bit shifts on 64-bit values in analysis and display #7044

Open
@0xdevalias

Description


What is the feature you'd like to have?

Improved handling of expressions involving 32-bit shifts on 64-bit values. This applies particularly to cases like (_strlen(...) << 32), and to cases where an index is computed as (constant + x) s>> 32 from a value that was previously shifted left by 32. These patterns currently show up in a way that obscures the underlying logic, especially around string buffer manipulation and offset calculation. Ideally, they would either be decompiled more clearly or be expressed in IL in a way that better reflects the original intent.
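The shift pair is 32-bit arithmetic in disguise. For a constant C = k * 2**32, `(C + (x << 32)) s>> 32` equals the low 32 bits of x plus k, sign-extended back to 64 bits. That is what indexing a buffer with a 32-bit `int` would produce, although the claim that the source used an `int` length is an assumption. The following is a minimal Python sketch that checks the identity with explicit int64 wraparound. The helper names `to_int64`, `hlil_index` and `int32_index` are hypothetical and chosen for illustration.

```python
MASK64 = (1 << 64) - 1


def to_int64(v: int) -> int:
    """Wrap an unbounded Python int to a two's-complement int64."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v


def hlil_index(length: int, const: int) -> int:
    """Evaluate (const + (length << 32)) s>> 32 as the HLIL shows it."""
    shifted = to_int64(length << 32)   # _strlen(...) << 32, wrapped to 64 bits
    total = to_int64(const + shifted)  # the 64-bit add, also wrapped
    return total >> 32                 # Python's >> on ints is arithmetic, like s>>


def int32_index(length: int, k: int) -> int:
    """Hypothetical clearer form: (int32_t)length + k, sign-extended to 64 bits."""
    v = ((length & 0xFFFFFFFF) + k) & 0xFFFFFFFF  # 32-bit add with wraparound
    return v - (1 << 32) if v >> 31 else v        # sign-extend from bit 31


# The three constants in the listing correspond to k = -7, -4, -3.
for k in (-7, -4, -3):
    const = k << 32                    # e.g. -7 << 32 == -30064771072
    for length in (3, 16, 300, 2**31 + 5, 2**32 + 9):
        assert hlil_index(length, const) == int32_index(length, k)

print(hlil_index(16, -30064771072))    # 9, i.e. 16 - 7
```

In the `2**32 + 9` case, the high bits of the length are discarded. A clearer rendering would need to make that 32-bit truncation explicit, for example as a cast.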

Is your feature request related to a problem?

Yes: expressions like (strlen(...) << 32) and access patterns such as *(buffer + (offset s>> 32)) currently produce decompiled output that's hard to follow and can make the logic look nonsensical or malformed. These show up in real-world binaries (e.g. in CFURL or preset-handling routines). They can hinder analysis by obscuring control and data flow, especially when a 32-bit value is carried in the upper half of a 64-bit register across several instructions.

Are any alternative solutions acceptable?

Even partial improvements, such as recognizing common shift idioms and providing annotations, simplifications, or heuristics that improve readability, would be useful. It doesn’t need to be perfect semantic recovery — clearer visual cues or better IL-level expression of intent would already be a big step forward.
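As a rough illustration of the kind of idiom-recognition heuristic meant here, the sketch below rewrites `(C + (x << 32)) s>> 32` into a sign-extended 32-bit add whenever C is a multiple of 2**32. It runs on a toy expression tree. This is not Binary Ninja's IL API: the node classes (`Var`, `Const`, `Add`, `Shl`, `Sar`, `Int32Add`) and the `render` syntax are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Shl:
    operand: "Expr"
    amount: int


@dataclass(frozen=True)
class Sar:  # signed shift right, written s>> in HLIL
    operand: "Expr"
    amount: int


@dataclass(frozen=True)
class Int32Add:  # hypothetical simplified node: (int32_t)operand + addend, sign-extended
    operand: "Expr"
    addend: int


Expr = Union[Var, Const, Add, Shl, Sar, Int32Add]


def simplify(e: Expr) -> Expr:
    """Rewrite (C + (x << 32)) s>> 32 into Int32Add(x, C >> 32) when C is a multiple of 2**32."""
    if isinstance(e, Sar) and e.amount == 32 and isinstance(e.operand, Add):
        a, b = e.operand.left, e.operand.right
        for c, s in ((a, b), (b, a)):  # the constant may sit on either side of the +
            if (isinstance(c, Const) and c.value % (1 << 32) == 0
                    and isinstance(s, Shl) and s.amount == 32):
                return Int32Add(s.operand, c.value >> 32)
    return e  # anything else is left untouched


def render(e: Expr) -> str:
    """Print an expression in an HLIL-like syntax."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Const):
        return hex(e.value)
    if isinstance(e, Add):
        return f"({render(e.left)} + {render(e.right)})"
    if isinstance(e, Shl):
        return f"({render(e.operand)} << {e.amount})"
    if isinstance(e, Sar):
        return f"({render(e.operand)} s>> {e.amount})"
    if isinstance(e, Int32Add):
        op = "-" if e.addend < 0 else "+"
        return f"(int32_t){render(e.operand)} {op} {abs(e.addend)}"
    raise TypeError(f"unknown node: {e!r}")


index = Sar(Add(Const(-0x700000000), Shl(Var("_strlen(__s_1)"), 32)), 32)
print(render(index))            # ((-0x700000000 + (_strlen(__s_1) << 32)) s>> 32)
print(render(simplify(index)))  # (int32_t)_strlen(__s_1) - 7
```

A real implementation would also need to confirm the operand widths and that the add is a wrapping 64-bit add. The toy tree assumes both.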

Additional Information:

The following was originally shared in #5489:

[screenshot]

  • This last screenshot shows another thing that would be nice to improve (which maybe is worth opening as a separate issue), where 32-bit shifts are also being used within the string operations.

Originally posted by @0xdevalias in #5489 (comment)

Also, I'm curious (and this may be irrelevant now, since the above likely eliminates this level of detail anyway) whether there have been any additions or changes that would address or simplify this aside I mentioned:

This last screenshot shows another thing that would be nice to improve (which maybe is worth opening as a separate issue), where 32-bit shifts are also being used within the string operations.

I'm not sure whether that specific function is in the demo binary I uploaded, though; I'll check.

Edit: Unfortunately, it seems to be in a different version of the binary. I could probably upload that one if I knew where that service was (I couldn't find it when I looked earlier).

To demonstrate it in a potentially more useful form than the original screenshot, though, here is a copy/paste out of Binary Ninja:

_strlen/etc results being shifted
005cc360   if (_FSFindFolder(0xffff8005, 'pusa', 0, &foundRef, zx.o(0)) == 0)
005cc46a       int64_t foundRefUrl = _CFURLCreateFromFSRef(0, &foundRef)  // Creates a URL from a given directory or file.
005cc484       // CFURLGetFileSystemRepresentation(
005cc484       //   url: CFURL!,
005cc484       //   resolveAgainstBase: Bool,
005cc484       //   buffer: UnsafeMutablePointer<UInt8>!,
005cc484       //   maxBufLen: CFIndex
005cc484       // ) -> Bool
005cc484       // 
005cc484       // Fills a buffer with the file system's native string representation of a given URL's path.
005cc484       _CFURLGetFileSystemRepresentation(foundRefUrl, 0, thisSerumDSP->presetFilename, 512)
005cc48c       // void CFRelease(CFTypeRef cf);
005cc48c       // 
005cc48c       // Releases a Core Foundation object.
005cc48c       _CFRelease(foundRefUrl)
005cc491       char* presetFilename_4 = thisSerumDSP->presetFilename
🐛005cc49c      int64_t presetFilename_4Length = _strlen(presetFilename_4)
🐛005cc4ab      *(presetFilename_4 + presetFilename_4Length) = '/com.xfe'
🐛005cc4b9      *(presetFilename_4 + presetFilename_4Length + 8) = 'rrecords'
🐛005cc4c8      *(presetFilename_4 + presetFilename_4Length + 16) = '.serum/u'
🐛005cc4d7      *(presetFilename_4 + presetFilename_4Length + 24) = 'ser.dat'
005cc4eb       int64_t userDatFileHandle = _fopen(thisSerumDSP->presetFilename, "wb")
005cc4f6       _rewind(userDatFileHandle)
005cc503       __builtin_strncpy(dest: &tildeStr, src: "~~~~~~~~~~~~~~~~", n: 16)
005cc503       
005cc522       if (_fwrite(&tildeStr, 0x10, 1, userDatFileHandle) u> 0xe)
005cc531           _fclose(userDatFileHandle)
005cc522       else
005cc527           SerumDSP::InitStuff(thisSerumDSP)
005cc360   else
005cc376       _strcpy(thisSerumDSP->presetFilename, thisSerumDSP->_maybeSerumPresetsPath)
005cc37b       char* presetFilename_2 = thisSerumDSP->presetFilename
005cc386       int64_t presetFilename_2Length = _strlen(presetFilename_2)
🐛005cc395      *(presetFilename_2 + presetFilename_2Length) = '/System/'
🐛005cc3a3      *(presetFilename_2 + presetFilename_2Length + 8) = 'utermmat'
🐛005cc3a8      presetFilename_2[presetFilename_2Length + 16] = '\x00'
005cc3ad       char* presetFilename_3 = thisSerumDSP->presetFilename
🐛005cc3bd      int64_t presetFilenameLength = _strlen(presetFilename_3) << 32  // A << 32 shift on a 64-bit integer moves the original 32-bit length value from the lower 32 bits to the upper 32 bits of the 64-bit integer
🐛005cc3d2      presetFilename_3[(-30064771072 + presetFilenameLength) s>> 32] = 's'  // -30064771072 >> 32 == -7
🐛005cc3d6      thisSerumDSP->presetFilename[(-17179869184 + presetFilenameLength) s>> 32] = '.'  // -17179869184 >> 32 == -4
🐛005cc3f3      thisSerumDSP->presetFilename[(-12884901888 + presetFilenameLength) s>> 32] = 'd'  // -12884901888 >> 32 == -3
005cc41f       // thisSerumDSP->presetFilename = /System/utermmat
005cc41f       // 
005cc41f       // After replacing length(16)-7 with s: /System/usermmat
005cc41f       // After replacing length(16)-4 with .: /System/user.mat
005cc41f       // After replacing length(16)-3 with d: /System/user.dat
005cc41f       int64_t rax_40 = _fopen(thisSerumDSP->presetFilename, "wb")
005cc42f       __builtin_strncpy(dest: &tildeStr, src: "~~~~~~~~~~~~~~~~", n: 16)
005cc42f       
005cc44e       if (_fwrite(&tildeStr, 16, 1, rax_40) u<= 14)
005cc457           SerumDSP::InitStuff(thisSerumDSP)
005cc457       
005cc531       _fclose(rax_40)

Originally posted by @0xdevalias in #5489 (comment)

Yeah, I'm not sure whether any updates I made would specifically help with the shifts. If I had an example binary, I would be more than willing to spend some time looking into it.

Originally posted by @bpotchik in #5489 (comment)


Example binary uploaded as "clever river explores happily". Here is the unmodified snippet of code from a freshly loaded version of that binary:

00622cdb  else
00622cef      _strcpy(__dst: arg1[0x72a4], __src: arg1[0x72a5])
00622cf4      char* __s1 = arg1[0x72a4]
00622d10      int32_t var_288
00622d10      __builtin_memcpy(dest: &var_288, src: "\x10\x00\x00\x00\x58\x2b\x00\x09\x0f\x19\x10\x51\x0a\xf4\xe4\xf0\xee\xe9\xe4\xf2\x87\x00\x00\x00", n: 0x18)
00622d25      __builtin_strncpy(dest: &__s2, src: "/System/utermmat", n: 0x11)
00622d38      _strncat(__s1, &__s2, __n: 0x10)
00622d3d      char* __s_1 = arg1[0x72a4]
00622d4c      uint64_t rax_44 = _strlen(__s: __s_1) << 0x20
00622d61      __s_1[(-0x700000000 + rax_44) s>> 0x20] = 0x73
00622d65      arg1[0x72a4][(-0x400000000 + rax_44) s>> 0x20] = 0x2e
00622d81      arg1[0x72a4][(-0x300000000 + rax_44) s>> 0x20] = 0x64
00622dab      FILE* __stream_1 = _fopen(__filename: arg1[0x72a4], __mode: &data_6f7531)
00622dbb      __builtin_strncpy(dest: &var_258, src: "~~~~~~~~~~~~~~~~", n: 0x10)
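Replaying the three writes from this listing supports the annotation in the earlier copy/paste. Here `length` is the string length of the whole buffer, so the effective indices are length - 7, length - 4 and length - 3. As a result, the `utermmat` suffix becomes `user.dat` regardless of how long the presets-path prefix is. The following is a minimal Python sketch. The prefix path `/Users/example/Presets` is a made-up placeholder, and int64 wraparound is ignored because realistic path lengths are far below 2**31.

```python
# (constant, byte) pairs copied from the three s>> writes in the unmodified listing.
WRITES = [(-0x700000000, 0x73), (-0x400000000, 0x2E), (-0x300000000, 0x64)]


def replay(path: str) -> str:
    """Apply the decompiled writes to a copy of path and return the result."""
    buf = bytearray(path, "ascii")
    rax_44 = len(buf) << 0x20                 # stands in for _strlen(__s: __s_1) << 0x20
    for const, byte in WRITES:
        buf[(const + rax_44) >> 0x20] = byte  # Python's >> is arithmetic, like s>>
    return buf.decode("ascii")


print(replay("/System/utermmat"))                        # /System/user.dat
print(replay("/Users/example/Presets/System/utermmat"))  # /Users/example/Presets/System/user.dat
```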
