A parallel checksum utility using a Merkle tree, designed for huge files.
With the latest PCIe 4.0 and 5.0 SSDs, a single processor thread cannot saturate the drive's read bandwidth. This utility uses multiple threads to compute a file's checksum in parallel over a Merkle tree, so that very large files can be checksummed in a reasonable time.
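To make the idea concrete, here is a minimal sketch of a Merkle-tree checksum with parallel leaf hashing. It is not mtsum's actual implementation. The names `fnv1a`, `combine`, and `merkle_root`, the fixed block size, and the rule for promoting an unpaired node are all assumptions made for illustration, and FNV-1a stands in for a real cryptographic hash only so the example has no external dependencies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <string_view>
#include <utility>
#include <vector>

// Stand-in 64-bit FNV-1a hash. ASSUMPTION: chosen only to keep the sketch
// dependency-free; it is not cryptographic and is not what mtsum uses.
std::uint64_t fnv1a(std::string_view bytes) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ull;  // FNV prime
    }
    return h;
}

// Hash two child digests (serialized little-endian) into a parent digest.
std::uint64_t combine(std::uint64_t left, std::uint64_t right) {
    char buf[16];
    for (int i = 0; i < 8; ++i) {
        buf[i] = static_cast<char>(left >> (8 * i));
        buf[8 + i] = static_cast<char>(right >> (8 * i));
    }
    return fnv1a(std::string_view(buf, sizeof buf));
}

// Hypothetical helper: Merkle root over fixed-size blocks of `data`.
std::uint64_t merkle_root(std::string_view data, std::size_t block_size,
                          unsigned workers) {
    assert(block_size > 0);
    const unsigned n_workers = std::max(1u, workers);
    const std::size_t n_blocks =
        std::max<std::size_t>(1, (data.size() + block_size - 1) / block_size);
    std::vector<std::uint64_t> level(n_blocks);

    // Leaves: worker w hashes blocks w, w + n_workers, w + 2 * n_workers, ...
    // Each block is independent, so this step parallelizes cleanly.
    std::vector<std::future<void>> tasks;
    for (unsigned w = 0; w < n_workers; ++w) {
        tasks.push_back(std::async(std::launch::async, [&, w] {
            for (std::size_t i = w; i < n_blocks; i += n_workers)
                level[i] = fnv1a(data.substr(i * block_size, block_size));
        }));
    }
    for (auto& t : tasks) t.get();

    // Inner nodes: hash adjacent pairs until a single root remains.
    // ASSUMPTION: an unpaired last node is promoted to the next level unchanged.
    while (level.size() > 1) {
        std::vector<std::uint64_t> parents;
        for (std::size_t i = 0; i + 1 < level.size(); i += 2)
            parents.push_back(combine(level[i], level[i + 1]));
        if (level.size() % 2 == 1) parents.push_back(level.back());
        level = std::move(parents);
    }
    return level[0];
}
```

For example, `merkle_root("abcdefghij", 4, 3)` splits the input into the blocks "abcd", "efgh", and "ij", and returns the same value for any worker count, because the tree shape depends only on the block size. A Merkle root is generally not equal to a flat hash of the same bytes, so a tree-based digest should not be expected to match the output of a tool such as sha256sum.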
Usage: mtsum [--help] [--version] [-p processors] [-a algorithm] path
Positional arguments:
path path to input file [required]
Optional arguments:
-h, --help shows help message and exits
-v, --version prints version information and exits
-p number of processors to use [nargs=0..1] [default: 8]
-a hashing algorithm to use, supported algorithms are md5, sha1, sha256, sha384, sha512 [nargs=0..1] [default: "sha256"]
-g output the merkle tree as DOT graph
Misc options (detailed usage):
--benchmark enable benchmark
--verbose enable verbose output
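As an illustration of what a Merkle tree looks like in DOT form, the sketch below serializes a tree stored level by level into a Graphviz `digraph`. The function name `to_dot`, the node naming scheme (`n<level>_<index>`), and the labels are assumptions for this example only; the exact graph that the -g option emits may differ.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: levels[0] holds the leaf digests and levels.back()
// holds the root. Node i on level d is assumed to be a child of node i / 2
// on level d + 1.
std::string to_dot(const std::vector<std::vector<std::string>>& levels) {
    std::ostringstream out;
    out << "digraph merkle {\n";
    for (std::size_t d = 0; d < levels.size(); ++d) {
        for (std::size_t i = 0; i < levels[d].size(); ++i) {
            // One labelled node per digest.
            out << "  n" << d << "_" << i
                << " [label=\"" << levels[d][i] << "\"];\n";
            // One edge from the parent down to this node (the root has none).
            if (d + 1 < levels.size())
                out << "  n" << d + 1 << "_" << i / 2
                    << " -> n" << d << "_" << i << ";\n";
        }
    }
    out << "}\n";
    return out.str();
}
```

The resulting text can be rendered with Graphviz, for example by saving it to a file and running `dot -Tsvg` on it.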
~4.2x faster than Get-FileHash on a ~183 GiB file.
- OS: Windows 11 Pro 24H2
- CPU: Intel i9-13900KF
- RAM: 64GB Dual-Channel DDR4-3200
- SSD: WD Black SN850X 4TB PCIe 4.0 (Max Seq. Read: 7,300 MB/s)
PS > Measure-Command { mtsum -v ... | Out-Default }
Algorithm: sha256
Number of processors: 8
File size: 196502093824 bytes
c5750c570206464ed6d9b2ef8d290a42fcb8121f97a803c6510ecca5b43ee699
32.99 s (5.96 GB/s)
...
TotalSeconds : 33.1166517
...
PS > Measure-Command { Get-FileHash ... | Out-Default }
...
TotalSeconds : 138.0812053
~4.4x faster than sha256sum on a ~165 GiB file.
- OS: Debian GNU/Linux trixie/sid
- CPU: AMD EPYC 7203P
- RAM: 512GB Eight-Channel DDR4-3200
- SSD: Micron 7450 Pro 7.68TB U.3 Enterprise SSD (Max Seq. Read: 6,800 MB/s)
$ time ./mtsum -v ...
Algorithm: sha256
Number of processors: 8
File size: 177652487485 bytes
26d9ced146e549ecb6848d421a9f4f483206c57a9428d9232af7984db84c4f3b
27.62 s (6.43 GB/s)
real 0m27.634s
user 1m49.710s
sys 0m3.348s
$ time sha256sum ...
5ce5b397d323cde668b77c08e17c48f6a5b6972671aa401d33e91faf1e366048 ...
real 2m2.146s
user 1m44.980s
sys 0m17.137s
- CMake 3.20 or higher, but lower than 4.0 (CMake 4.0 currently causes an issue with one of the libraries)
- make or ninja
- Any C++ compiler that supports C++20 or higher
- (Optional) vcpkg
The following libraries are required to build the project:
- Taskflow >= 3.9.0
- LLFIO
- OpenSSL
- argparse
Note: If you have vcpkg installed, it will automatically download and install the dependencies for you.
- Run cmake --preset release-ninja or cmake --preset release-make to generate the build files.
- Run cd cmake-build-release && make or cd cmake-build-release && ninja to build the project.
Add -DMTSUM_STATIC=ON to the cmake command to generate build files for static linking.
Add -DMTSUM_VCPKG=ON to the cmake command to use vcpkg for dependency management.
For a debug build, use the preset debug-ninja or debug-make instead of the release preset.
This project is developed under the direction of Dr. Jaroslaw Zola.