@tamilari
Contributor

This series of commits significantly refactors and improves our checksum handling. Key changes include:

  • Centralization of checksum logic: Moving various checksum parsing, calculation, and verification functionalities into common utility functions. This enhances maintainability and reduces code duplication.
  • "Best Digest" approach for hash checks: Implementing a standardized approach to use only the best matching digest for hash comparisons across various components. This ensures strong integrity checks by focusing on the most reliable checksums and prevents mismatches that could arise from differing lower-priority algorithms.

The `to_hashlib` method now operates directly on the `ChecksumAlgo` enum instance,
simplifying its usage and aligning better with object-oriented principles.

Signed-off-by: Tamino Larisch <[email protected]>
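
To illustrate the commit above, here is a minimal sketch of an enum that carries its own `to_hashlib` method. The member names, their numeric values, and the mapping to hashlib names are assumptions made for this example, not the project's actual definitions.

```python
import hashlib
from enum import IntEnum


class ChecksumAlgo(IntEnum):
    # Hypothetical members; a higher value is assumed to mean a stronger digest.
    MD5SUM = 1
    SHA1SUM = 2
    SHA256SUM = 3

    def to_hashlib(self) -> str:
        """Return the algorithm name that hashlib.new() understands."""
        names = {
            ChecksumAlgo.MD5SUM: "md5",
            ChecksumAlgo.SHA1SUM: "sha1",
            ChecksumAlgo.SHA256SUM: "sha256",
        }
        return names[self]


# The method is called on the enum instance itself, not on a helper or class.
algo = ChecksumAlgo.SHA256SUM
digest = hashlib.new(algo.to_hashlib(), b"hello").hexdigest()
```

Calling the method on the instance keeps the enum and its hashlib name in one place, so callers do not need a separate lookup table.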
Move `ChecksumMismatchError` from `debsbom/merge/merge.py` to
`debsbom/util/checksum.py` to centralize checksum-related logic and
enhance reusability across the codebase.

Signed-off-by: Tamino Larisch <[email protected]>
Introduced `verify_best_matching_digest` to compare two sets of digests
and `check_hash_from_path` to verify a file's hash against provided
checksums. The `best_matching_digest` function was also renamed to
`_best_matching_digest`, signifying its new role as an internal helper
not intended for direct external use.

Signed-off-by: Tamino Larisch <[email protected]>
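
The "best digest" comparison described above could look roughly like the following sketch. Only the function and exception names come from the commit message; the signatures, the enum members, and the behaviour when no common algorithm exists (returning `False`) are assumptions for illustration.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple


class ChecksumAlgo(IntEnum):
    # Hypothetical members; a higher value is assumed to mean a stronger digest.
    MD5SUM = 1
    SHA1SUM = 2
    SHA256SUM = 3


class ChecksumMismatchError(RuntimeError):
    """Raised when the strongest shared digest differs."""


def _best_matching_digest(
    a: Dict[ChecksumAlgo, str], b: Dict[ChecksumAlgo, str]
) -> Optional[Tuple[ChecksumAlgo, str, str]]:
    """Pick the strongest algorithm present in both digest sets."""
    common = a.keys() & b.keys()
    if not common:
        return None
    best = max(common)
    return best, a[best], b[best]


def verify_best_matching_digest(
    a: Dict[ChecksumAlgo, str], b: Dict[ChecksumAlgo, str]
) -> bool:
    """Compare only the strongest shared digest; weaker ones are ignored."""
    match = _best_matching_digest(a, b)
    if match is None:
        return False  # assumption: nothing comparable means "not verified"
    algo, digest_a, digest_b = match
    if digest_a.lower() != digest_b.lower():
        raise ChecksumMismatchError(f"{algo.name} digests differ")
    return True
```

With this shape, two entries whose SHA-256 digests agree are accepted even if their MD5 digests differ, which is the behaviour the series describes for lower-priority checksums.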
Replaced inline hash comparison logic in both `CdxSbomMerger` and
`SpdxSbomMerger` with a call to the `verify_best_matching_digest`
utility function. This improves readability and maintainability by
abstracting complex logic. Only checking the best matching digest
prevents mismatches due to differing lower-priority checksums while
still ensuring strong integrity checks on the most reliable digest.

Signed-off-by: Tamino Larisch <[email protected]>
Replace inline and repetitive checksum calculation logic with a new,
dedicated `calculate_checksums` utility function. This new function
processes input data (file paths or raw bytes) in a single pass,
updating all required hash algorithms concurrently. This reduces I/O
operations and improves performance compared to the previous method of
reading the stream multiple times for each checksum algorithm.

Signed-off-by: Tamino Larisch <[email protected]>
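
The single-pass idea from the commit above can be sketched as follows. The signature, the default algorithm list, and the chunk size are assumptions for this example; only the function name and the "path or raw bytes" input come from the commit message.

```python
import hashlib
import io
from pathlib import Path
from typing import Dict, Iterable, Union


def calculate_checksums(
    data: Union[Path, bytes],
    algos: Iterable[str] = ("sha1", "sha256"),
    chunk_size: int = 64 * 1024,
) -> Dict[str, str]:
    """Read the input once and feed every chunk to all hashers."""
    hashers = {name: hashlib.new(name) for name in algos}
    # Raw bytes are wrapped in a stream so both inputs share one code path.
    stream = io.BytesIO(data) if isinstance(data, bytes) else open(data, "rb")
    with stream:
        while chunk := stream.read(chunk_size):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


digests = calculate_checksums(b"example payload")
```

Because each chunk is handed to every hasher before the next read, the cost in I/O is one pass over the data regardless of how many algorithms are requested.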
RemoteFile now uses our internal mapping of ChecksumAlgo to str, as is
the case in most places where we handle checksums. Apart from removing
the ambiguity about which hash algorithm 'hash' uses, it also allows the
use of the new checksum verification methods to compare checksums.
This change requires the use of a frozenset in PackageDownloader to
ensure that files are identified by their complete set of checksums.

Signed-off-by: Tamino Larisch <[email protected]>
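
One reason a frozenset is needed here: a dict of checksums is not hashable, so it cannot serve as a dictionary key or set member directly. The sketch below shows the general technique only. The `RemoteFile` fields, the string algorithm keys, the helper names, and the placeholder digest values are all assumptions; how `PackageDownloader` actually uses the frozenset is not shown in this conversation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass
class RemoteFile:
    # Hypothetical shape; the real class maps ChecksumAlgo to str.
    filename: str
    checksums: Dict[str, str]


def checksum_key(file: RemoteFile) -> FrozenSet[Tuple[str, str]]:
    """Build a hashable, order-independent key from all checksums."""
    return frozenset(file.checksums.items())


def deduplicate(files: List[RemoteFile]) -> List[RemoteFile]:
    """Keep the first file seen for each complete checksum set."""
    unique: Dict[FrozenSet[Tuple[str, str]], RemoteFile] = {}
    for file in files:
        unique.setdefault(checksum_key(file), file)
    return list(unique.values())


# Digest values are short placeholders, not real hashes.
files = [
    RemoteFile("a.tar.xz", {"sha256": "aa", "md5": "bb"}),
    RemoteFile("a-copy.tar.xz", {"md5": "bb", "sha256": "aa"}),
    RemoteFile("b.tar.xz", {"sha256": "cc"}),
]
unique_files = deduplicate(files)
```

The key ignores insertion order, so two files listing the same digests in a different order are treated as the same content.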
Move checksum parsing and verification logic into `util/checksum.py`.
This introduces new utilities for extracting checksums from Dsc files
and debian package entries, as well as validating files linked in a Dsc
file.

Signed-off-by: Tamino Larisch <[email protected]>
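
As a rough illustration of extracting checksums from a Dsc file, the sketch below parses the body of a single `Checksums-Sha256` field, where each line holds the checksum, the file size, and the file name separated by spaces. The function name is hypothetical and it works on plain text; the utilities added in this series may be structured differently.

```python
import hashlib
from typing import Dict


def parse_checksums_field(field_value: str) -> Dict[str, str]:
    """Map each file name to its hex digest for one Checksums-* field body."""
    result: Dict[str, str] = {}
    for line in field_value.strip().splitlines():
        digest, _size, name = line.split()
        result[name] = digest
    return result


# Build an example field body from known content instead of inventing digests.
orig_data = b"upstream tarball"
debian_data = b"debian changes"
field_body = (
    f" {hashlib.sha256(orig_data).hexdigest()} {len(orig_data)} pkg_1.0.orig.tar.xz\n"
    f" {hashlib.sha256(debian_data).hexdigest()} {len(debian_data)} pkg_1.0-1.debian.tar.xz\n"
)
parsed = parse_checksums_field(field_body)
```

The resulting mapping can then be compared against the digest computed for each file referenced by the Dsc.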
@fmoessbauer
Member

@Urist-McGit from my perspective the changes are fine. But as the changeset is quite big, I would be happy if you could review as well.

@Urist-McGit
Collaborator

LGTM too, even if it means I have to do quite a big rebase on the plugin work

@Urist-McGit Urist-McGit merged commit 31ac2eb into siemens:main Nov 25, 2025
11 checks passed