This repository implements an analytical conversion from score-based diffusion to flow matching for the Boltz-2 protein structure prediction model. The conversion enables faster sampling without any retraining.
- No retraining required: works with existing pretrained checkpoints
- Same architecture: uses the existing diffusion module
- Pure analytical transformation from score to velocity
Score-Based Diffusion (Original Boltz-2)
- Uses stochastic differential equations (SDEs)
- Each step injects random noise
- Slower: stochastic sampling typically needs many small steps (200 in the runner example)
Flow Matching (This Implementation)
- Uses ordinary differential equations (ODEs)
- Deterministic integration, no random noise
- Faster: deterministic integration typically needs far fewer steps (20 in the runner example)
The key insight is that both approaches parameterize the same underlying noise:
Score Model: x_t = x_0 + σ·ε → ε = (x_t - x_0)/σ
Flow Model: x_t = (1-t)·x_0 + t·ε → v = ε - x_0
Where:
- x_t: Noisy coordinates at time t
- x_0: Clean coordinates (ground truth)
- ε: Noise vector
- σ: Noise level
- v: Velocity field for flow matching
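The two relations above can be checked numerically on a single toy coordinate. This is a minimal sketch in plain Python, not code from the repository: the values, the `flow_path` helper, and the finite-difference check are illustrative assumptions.

```python
import random

random.seed(0)

# One toy coordinate value; the same arithmetic applies elementwise to arrays.
x0 = random.gauss(0.0, 1.0)     # clean coordinate x_0
eps = random.gauss(0.0, 1.0)    # noise vector ε
sigma = 2.5                     # noise level σ (arbitrary illustrative value)

# Score-style noising: x_t = x_0 + σ·ε
x_t = x0 + sigma * eps

# Recover the noise from the noisy and clean coordinates.
eps_recovered = (x_t - x0) / sigma

# Flow matching velocity: v = ε - x_0
v = eps_recovered - x0


def flow_path(t):
    """Linear interpolation x_t = (1-t)·x_0 + t·ε used by flow matching."""
    return (1.0 - t) * x0 + t * eps


# The path is linear in t, so a central difference recovers its slope.
t, h = 0.3, 1e-3
v_finite_diff = (flow_path(t + h) - flow_path(t - h)) / (2.0 * h)

print(abs(eps_recovered - eps) < 1e-12)   # noise is recovered
print(abs(v - v_finite_diff) < 1e-9)      # v matches the slope of the flow path
```

Both checks should print True: the noise recovered from the score-style sample is the same ε that defines the slope of the flow path.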
- Fewer steps required for integration
- Deterministic integration avoids random noise generation
- Heun integration (RK2) is second-order accurate, so it reaches a given accuracy in fewer steps than simple Euler
- No architectural changes, so no added model overhead
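To illustrate the Heun versus Euler point, here is a small sketch on a toy ODE with a known solution. The `velocity` function is a stand-in for the network (an assumption for illustration), and the `integrate` helper is hypothetical, not part of this repository.

```python
import math


def velocity(x, t):
    """Toy velocity field dx/dt = -x, standing in for the velocity network."""
    return -x


def integrate(x, t_start, t_end, num_steps, method):
    """Integrate dx/dt = velocity(x, t) with fixed-step Euler or Heun (RK2)."""
    dt = (t_end - t_start) / num_steps
    t = t_start
    for _ in range(num_steps):
        v1 = velocity(x, t)
        if method == "euler":
            x = x + dt * v1
        elif method == "heun":
            x_euler = x + dt * v1                 # Euler predictor
            v2 = velocity(x_euler, t + dt)        # second velocity evaluation
            x = x + 0.5 * dt * (v1 + v2)          # average of the two velocities
        else:
            raise ValueError(f"unknown method: {method!r}")
        t += dt
    return x


exact = math.exp(-1.0)  # exact solution of dx/dt = -x, x(0) = 1, at t = 1
euler_error = abs(integrate(1.0, 0.0, 1.0, 20, "euler") - exact)
heun_error = abs(integrate(1.0, 0.0, 1.0, 20, "heun") - exact)
print(f"Euler error with 20 steps: {euler_error:.2e}")
print(f"Heun error with 20 steps:  {heun_error:.2e}")
```

Heun costs two velocity evaluations per step instead of one, so the comparison that matters in practice is error per network evaluation rather than error per step.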
The implementation uses the exact same DiffusionModule as the original Boltz-2:
# Same architecture as diffusionv2.py
self.score_model = DiffusionModule(**score_model_args)
# Only difference: analytical conversion layer
self.converter = ScoreToVelocityConverter(
    conversion_method='noise_based'  # Most accurate method
)

Three analytical conversion methods are implemented:
- noise_based (RECOMMENDED): Most accurate
  epsilon = (x_t - x_0_pred) / sigma
  velocity = epsilon - x_0_pred
- pflow: Probability flow ODE
  velocity = 0.5 * (x_0_pred - x_t)
- simple: Direct geometric conversion
  x_1_est = (x_t - (1-t)*x_0_pred) / t
  velocity = x_1_est - x_0_pred
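The three formulas above can be collected into one dispatch function. This is a sketch only: `score_to_velocity` is a hypothetical name, and the actual `ScoreToVelocityConverter` may be organized differently. It assumes `sigma > 0` and `t > 0`.

```python
def score_to_velocity(x_t, x_0_pred, sigma, t, method="noise_based"):
    """Convert a denoised prediction x_0_pred into a velocity.

    Uses only arithmetic operators, so the inputs may be floats,
    NumPy arrays, or tensors. Hypothetical helper for illustration.
    """
    if method == "noise_based":
        epsilon = (x_t - x_0_pred) / sigma
        return epsilon - x_0_pred
    if method == "pflow":
        return 0.5 * (x_0_pred - x_t)
    if method == "simple":
        x_1_est = (x_t - (1 - t) * x_0_pred) / t
        return x_1_est - x_0_pred
    raise ValueError(f"unknown conversion method: {method!r}")


# Toy scalar check with x_0 = 1.0 and ε = 2.0, so the target velocity is ε - x_0 = 1.0.
# Score-style sample: x_t = x_0 + σ·ε with σ = 0.5 gives 2.0.
print(score_to_velocity(2.0, 1.0, sigma=0.5, t=0.25, method="noise_based"))
# Flow-style sample: x_t = (1-t)·x_0 + t·ε with t = 0.25 gives 1.25.
print(score_to_velocity(1.25, 1.0, sigma=0.5, t=0.25, method="simple"))
```

Note that `noise_based` expects a score-style sample (x_0 + σ·ε) while `simple` expects a flow-style sample ((1-t)·x_0 + t·ε). With a perfect x_0 prediction, both recover ε - x_0.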
Uses Heun's method (RK2) for ODE integration:
# First velocity evaluation
v1 = velocity_network_forward(x, t_curr)
# Euler step
x_euler = x + dt * v1
# Second velocity evaluation
v2 = velocity_network_forward(x_euler, t_next)
# Heun update (average of velocities)
x_new = x + 0.5 * dt * (v1 + v2)

conda env create -f environment.yml
conda activate boltz
# optional editable install for src package
pip install -e .

# Run flow matching predictions
python run_boltz_flow_matching.py
# This will:
# 1. Load the Boltz-2 checkpoint at ~/.boltz/boltz2_conf.ckpt
# 2. Convert to flow matching format
# 3. Run predictions on hackathon data
# 4. Generate results

from run_boltz_flow_matching import BoltzFlowMatchingRunner
runner = BoltzFlowMatchingRunner(
    flow_steps=20,        # ODE integration steps
    score_steps=200,      # Original SDE steps (for comparison)
    diffusion_samples=1,  # Number of samples per protein
    device='cuda'         # Device to use
)
results = runner.run_predictions(max_proteins=5)

from boltz.model.models.boltz2 import Boltz2
# Load converted checkpoint
model = Boltz2.load_from_checkpoint(
    "flow_matching_boltz2.ckpt",
    map_location='cuda'
)
# The model automatically uses FlowMatchingDiffusion
# when use_flow_matching=True in hyperparameters

- Main runner script: run_boltz_flow_matching.py
- Hackathon prediction script: hackathon/predict_hackathon.py, with a helper API in hackathon/hackathon_api.py
- Training entrypoint: scripts/train/train.py, with configs in scripts/train/configs
- MSA generation: scripts/generate_local_msa.py
- Evaluation helpers: under scripts/eval
├── run_boltz_flow_matching.py              # Main runner script
├── src/boltz/model/modules/
│   └── diffusionv3_flow_matching.py        # Flow matching implementation
└── src/boltz/model/models/
    └── boltz2.py                           # Modified to support flow matching
- BoltzFlowMatchingRunner: Main orchestrator
- FlowMatchingDiffusion: Flow matching module
- ScoreToVelocityConverter: Analytical conversion
- Boltz2: Modified model with flow matching support
Flow matching is integrated into Boltz-2 through:
- Conditional Import:
  try:
      from boltz.model.modules.diffusionv3_flow_matching import AtomDiffusion as FlowMatchingDiffusion
  except ImportError:
      FlowMatchingDiffusion = None
- Hyperparameter Control:
  if use_flow_matching and FLOW_MATCHING_AVAILABLE:
      self.structure_module = FlowMatchingDiffusion(...)
  else:
      self.structure_module = AtomDiffusion(...)
- Checkpoint Conversion:
  hparams['use_flow_matching'] = True
  hparams['flow_conversion_method'] = 'noise_based'
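The checkpoint conversion step can be sketched as a pure dictionary update. This is an illustration under stated assumptions, not the repository's conversion code: `mark_checkpoint_for_flow_matching` is a hypothetical helper, and it assumes the hyperparameters live under the `"hyper_parameters"` key, as in PyTorch Lightning checkpoints. A plain dict stands in for a loaded checkpoint so the example runs without PyTorch.

```python
import copy


def mark_checkpoint_for_flow_matching(checkpoint, method="noise_based"):
    """Return a copy of a checkpoint dict with flow matching enabled.

    Assumes hyperparameters are stored under "hyper_parameters"
    (PyTorch Lightning convention). Model weights are left untouched.
    """
    converted = copy.copy(checkpoint)  # shallow copy: weights are shared, not duplicated
    hparams = dict(converted.get("hyper_parameters", {}))
    hparams["use_flow_matching"] = True
    hparams["flow_conversion_method"] = method
    converted["hyper_parameters"] = hparams
    return converted


# Stand-in for a loaded checkpoint; a real one would come from a checkpoint file.
checkpoint = {
    "state_dict": {"w": [0.1, 0.2]},
    "hyper_parameters": {"diffusion_samples": 1},
}
converted = mark_checkpoint_for_flow_matching(checkpoint)
print(converted["hyper_parameters"])
```

Only the hyperparameters change; the weights are reused as-is, which matches the claim that no retraining is needed.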
The score-based approach learns to predict the score function:
∇_x log p_t(x) ≈ s_θ(x, σ)
Where s_θ is the neural network predicting the score.
Flow matching learns a velocity field:
dx/dt = v_θ(x, t)
Where v_θ is the neural network predicting the velocity.
The key insight is that both parameterize the same noise:
Score: x_t = x_0 + σ·ε → ε = (x_t - x_0)/σ
Flow: x_t = (1-t)·x_0 + t·ε → v = ε - x_0
This allows us to convert score predictions to velocity predictions analytically.
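The step from the interpolation to the velocity, and one way to line up the two noising schemes, can be written out explicitly. The time mapping t = σ/(1+σ) below is an assumption made for illustration; this document does not state how σ and t are related in the implementation.

```latex
\begin{aligned}
x_t &= (1-t)\,x_0 + t\,\varepsilon
  &&\Rightarrow\quad v = \frac{dx_t}{dt} = \varepsilon - x_0, \\[4pt]
x_\sigma &= x_0 + \sigma\,\varepsilon
  &&\Rightarrow\quad \frac{x_\sigma}{1+\sigma}
     = \frac{1}{1+\sigma}\,x_0 + \frac{\sigma}{1+\sigma}\,\varepsilon
     = (1-t)\,x_0 + t\,\varepsilon
     \quad\text{with } t = \frac{\sigma}{1+\sigma}.
\end{aligned}
```

Under this mapping, a score-style sample rescaled by 1/(1+σ) lies on the flow matching path, so the same ε (and therefore the same velocity ε - x_0) describes both.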
- Same Information: Both models learn the same underlying data distribution
- Mathematical Equivalence: The conversion is exact under certain conditions
- Architecture Preservation: Same neural network weights work for both
- Integration Efficiency: Deterministic ODE solvers typically need far fewer steps than stochastic SDE samplers
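As a rough illustration of the efficiency point, the step counts from the runner example (`flow_steps=20`, `score_steps=200`) can be turned into network-evaluation counts. The per-step costs are assumptions: two evaluations per Heun step (matching the Heun update shown earlier) and one per SDE step. The actual samplers may differ.

```python
flow_steps = 20      # ODE integration steps (from the runner example)
score_steps = 200    # SDE steps (from the runner example)

# Assumed cost per step, in network forward passes.
evals_per_heun_step = 2
evals_per_sde_step = 1

flow_evals = flow_steps * evals_per_heun_step
score_evals = score_steps * evals_per_sde_step
print(flow_evals, score_evals, score_evals / flow_evals)  # 40 200 5.0
```

Under these assumptions the flow matching sampler uses about 5x fewer network evaluations. This is a count of forward passes, not a measured speedup.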
- Fine-tuning: Optional 20-50 epoch fine-tuning to close any remaining quality gap
- Advanced ODE Solvers: Dormand-Prince, adaptive step sizes
- Steering Integration: Physical guidance for flow matching
- Multi-scale: Different step counts for different protein sizes
This implementation provides a foundation for flow matching in protein structure prediction. Contributions are welcome for:
- Advanced ODE solvers
- Quality improvements
- Additional conversion methods
The key insight is that pretrained score models can be analytically converted to flow matching without retraining.