Skip to content

Security/safe pickle loader #460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sanowl
Copy link

@sanowl sanowl commented Jul 2, 2025

Overview

This PR eliminates the only medium-severity Bandit findings in the code-base by introducing a restricted-capability pickle loader and replacing import-time pickle.load/loads calls.

Motivation

The repo ships .pickle files that are deserialized at import time.
Unrestricted pickle.load can execute arbitrary code if the file is ever tampered with (CWE-502). While the pickles are trusted in normal use, hardening this path removes a potential supply-chain attack vector.

Key Changes

  1. src/alphafold3/common/safe_pickle.py

    • Implements safe_load using a custom Unpickler that only allows harmless built-in container / primitive types (dict, list, tuple, str, int, etc.).
    • Any attempt to unpickle other globals raises pickle.UnpicklingError.
  2. Replaced unsafe loads

    • constants/chemical_component_sets.py
    • constants/chemical_components.py
    • constants/converters/chemical_component_sets_gen.py
      These modules now read their pickle files via safe_pickle.safe_load.
  3. Security scan result

    $ bandit -r src -ll
    SEVERITY.MEDIUM: 0
    SEVERITY.HIGH:   0
    

    No medium or high findings remain; low-severity items are stylistic.

Testing

  • python -m py_compile on all modified files => passes.
  • Imported the two constants modules with the generated .pickle files present – loads cleanly.
  • Verified safe_pickle.safe_load:
    • Correctly loads a normal dict/tuple pickle.
    • Blocks a pickle containing datetime.datetime with UnpicklingError.
  • Re-ran Bandit static analysis (see snippet above).

Impact

  • No functional change to model inference or data pipeline; only safer deserialization of bundled constant data.
  • Downstream users do not need to modify their code.

Checklist

  • New code is lint-clean and typed.
  • Bandit scan shows no medium/high findings.
  • Added/updated tests or manual verification.
  • Commit message follows conventional-commits style.

sanowl added 2 commits July 2, 2025 19:17
…nsafe pickle.load calls\n\nAdds safe_pickle.safe_load that allows only primitive/container builtins during unpickling, preventing arbitrary code execution if pickled constants were ever tampered with.\n\nModules updated:\n* constants/chemical_component_sets.py\n* constants/chemical_components.py\n* constants/converters/chemical_component_sets_gen.py\n\nBandit now reports zero medium/high findings.
Copy link
Collaborator

@Augustin-Zidek Augustin-Zidek Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should not be included, same for all of the other pyc files below. (I will add the missing .gitignore file for this project.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants