laurenlabell/checksum_finder

Checksum Finder

Trying to reverse engineer a checksum? We can help!

Reverse engineering unknown binary message formats is an important part of security research. Error-detecting codes such as checksums and cyclic redundancy checks (CRCs) are commonly added to messages as a guard against corrupt or untrusted input. Before an analyst can craft input for software that uses checksums, they must discover the algorithm that computes a valid checksum. To address this need, we have developed a program-synthesis-based approach for detecting and reverse engineering checksum algorithms automatically.

Our approach takes a small set of binary messages as input and automatically returns a Python implementation of the checksum algorithm, if one can be found. It first searches over the message space to identify the location of the checksum, and then uses program synthesis to identify the operations performed on the message to compute it. The user receives runnable code that both calculates a checksum from a message and validates a message against the checksum algorithm. We also generate unit tests, which let the user confirm that the synthesized algorithm is correct with respect to the input messages.
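
To make the idea concrete, here is a minimal sketch of the search in its simplest form. It is not the tool's implementation: plain brute-force enumeration stands in for program synthesis, the candidate operator sets and the names `FOLD_OPS`, `FINAL_OPS`, `fold`, and `search` are hypothetical, and the checksum is assumed to be an 8-bit value in the last byte covering every byte before it. The messages are the six from the example run further down.

```python
import operator
from itertools import product

# Hypothetical candidate sets for illustration; the tool's real search
# space may be larger or structured differently.
FOLD_OPS = {"add": operator.add, "xor": operator.xor}
FINAL_OPS = {"add": operator.add, "xor": operator.xor}
MASK = 0xFF  # assumption: 8-bit checksum


def fold(payload, op):
    """Combine all payload bytes with op, truncating to 8 bits at each step."""
    acc = 0
    for byte in payload:
        acc = op(acc, byte) & MASK
    return acc


def search(messages_hex):
    """Return every (fold_op, final_op, magic) that explains all messages.

    Assumes the checksum is the last byte and covers all preceding bytes.
    """
    msgs = [bytes.fromhex(m) for m in messages_hex]
    hits = []
    for (fold_name, fold_op), (final_name, final_op) in product(
        FOLD_OPS.items(), FINAL_OPS.items()
    ):
        for magic in range(MASK + 1):
            if all(
                (final_op(fold(m[:-1], fold_op), magic) & MASK) == m[-1]
                for m in msgs
            ):
                hits.append((fold_name, final_name, magic))
    return hits


messages = [
    "806FA30102B00818",
    "806FA30112800878",
    "1003A30001004006729E99940012120B",
    "1003A30001003007709C98940012121F",
    "1003A30001003806739C9B9400121202",
    "806FA30200800041",
]
hits = search(messages)
print(hits)
```

On these six messages the result includes the add / xor / 0x55 combination that the example run reports. Enumeration only works here because the search space is tiny; it does not attempt to locate the checksum field, which the real tool does.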

For all the details, read our paper: https://dl.acm.org/doi/10.1145/3411506.3417599

Usage: python3 sumeng_module.py width file

Where width is the bit width of the checksum field and file is a file of messages in ASCII hex format, one message per line.

Example: python3 sumeng_module.py 8 test1.txt
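
If your captures are raw bytes rather than hex text, a small conversion step can produce a file in the expected layout. The sketch below is only an illustration: `to_hex_lines` is a hypothetical helper, the two sample messages are made up, and uppercase hex is used because the messages in the example run are uppercase (whether lowercase is also accepted is not stated here).

```python
def to_hex_lines(messages):
    """Render each message as one line of uppercase ASCII hex."""
    return "".join(bytes(m).hex().upper() + "\n" for m in messages)


# Two made-up captures, for illustration only (not from the project's corpus).
captured = [b"\x01\x02\x03", bytes([0x10, 0xA0, 0xFF])]
text = to_hex_lines(captured)
print(text, end="")

# Writing `text` to a file, for example with
# pathlib.Path("my_messages.txt").write_text(text),
# gives something that can be passed as the `file` argument.
```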

Limitations:

  • sumeng_module.py expects big-endian data.
  • Our tool expects the width of the checksum and the stride through the data to match: both 8-bit or both 16-bit. In the future we expect to support 16-bit checksums with 8-bit data strides.
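
If your messages store 16-bit words in little-endian order, one possible workaround for the first limitation is to swap the bytes of each word before writing the input file. This is a sketch under assumptions, not part of the tool: `swap_words_16` is a hypothetical helper, and it assumes every message is a whole number of 16-bit words.

```python
def swap_words_16(hex_msg):
    """Swap the two bytes of every 16-bit word in a hex-encoded message."""
    raw = bytes.fromhex(hex_msg)
    if len(raw) % 2:
        # Assumption: refuse odd lengths rather than guess how to pad them.
        raise ValueError("message length must be a multiple of 2 bytes")
    swapped = bytearray()
    for i in range(0, len(raw), 2):
        swapped += raw[i:i + 2][::-1]
    return swapped.hex().upper()


print(swap_words_16("34127856"))  # 12345678
```

Swapping is its own inverse, so applying the same function to a generated message converts it back to the original byte order.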

Example Run:

python3 sumeng_module.py 8 test1.txt 
#	0 	entropy: 2.585	perc_used: 1.0	start: 0	end: 0	checksum_index: -1	fold_op: <built-in function add>	final_op: <built-in function xor>	magic: 0x55
#	1 	entropy: 2.585	perc_used: 0.92	start: 0	end: -1	checksum_index: -1	fold_op: <built-in function add>	final_op: <built-in function xor>	magic: 0x55
# Solution number to gen code for? :0
#  start: 0 end: 0 check: -1 foldOp: <built-in function add> finalOp: <built-in function xor> magicValue: 0x55
# ================================================================================
# Generated Code
# --------------------------------------------------------------------------------



import operator

def twosComp(n):
    return -n

def onesComp(n1, n2):
    mod = 1 << 8
    result = n1 + n2
    return result if result < mod else (result + 1) % mod  

def pad(xs,w):
	n = len(xs)
	target_n = (-(-n//w)) * w
	delta = target_n - n
	xs_padded = xs+[0]*delta
	return xs_padded

def chunk(xs,w):
	xs_chunked = [xs[i:i+w] for i in range(0,len(xs),w)]
	return xs_chunked

def to_int(x):
	return int.from_bytes(bytes(x),'big')


def preprocess(hex_str,w):
	xs = [x for x in bytes.fromhex(hex_str)]
	xs_padded = pad(xs,w)
	xs_chunked = chunk(xs_padded,w)
	xs_ints = [to_int(x) for x in xs_chunked]
	return xs_ints


def calculate_checksum(payload):
	magicValue = 0x55
	mask = 0xFF

	checksum = 0
	for element in payload:
		checksum = operator.add(checksum,element)
	checksum =  operator.xor(checksum,magicValue)
	return checksum & mask

def validate_message(rawmsg):
	msgStart = 0
	msgEnd = 0
	checksumPos = -1 
	width = 1

	msg = preprocess(rawmsg,width)
	payload = msg[msgStart:]
	checksum = msg[checksumPos]
	payload[checksumPos] = 0

	return calculate_checksum(payload) == checksum

# ================================================================================
# Unit Tests
# --------------------------------------------------------------------------------

print(validate_message('806FA30102B00818'),'806FA30102B00818')
print(validate_message('806FA30112800878'),'806FA30112800878')
print(validate_message('1003A30001004006729E99940012120B'),'1003A30001004006729E99940012120B')
print(validate_message('1003A30001003007709C98940012121F'),'1003A30001003007709C98940012121F')
print(validate_message('1003A30001003806739C9B9400121202'),'1003A30001003806739C9B9400121202')
print(validate_message('806FA30200800041'),'806FA30200800041')

# --------------------------------------------------------------------------------
# End Generated Code
# --------------------------------------------------------------------------------
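
Once an algorithm has been recovered, it can be used to build new valid messages. The sketch below restates the generated `calculate_checksum` from the run above in compact form so that it is self-contained; `append_checksum` is a hypothetical helper that is not produced by the tool, and it assumes the checksum is a single trailing byte, as in this example.

```python
MAGIC = 0x55  # magic value reported in the example run
MASK = 0xFF   # 8-bit checksum


def calculate_checksum(payload):
    """Sum the payload bytes, XOR with the magic value, keep the low 8 bits."""
    checksum = 0
    for element in payload:
        checksum += element
    return (checksum ^ MAGIC) & MASK


def append_checksum(payload_hex):
    """Hypothetical helper: return the payload plus its trailing checksum byte."""
    payload = bytes.fromhex(payload_hex)
    return (payload + bytes([calculate_checksum(payload)])).hex().upper()


print(append_checksum("806FA30102B008"))  # 806FA30102B00818
```

The printed message matches the first unit-test message above, which is a quick sanity check that the helper and the generated validator agree.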

To run the tool against corpus_summary_full.csv: python3 test_corpus.py
