Boltz Flow Matching: Analytical Conversion from Score-Based Diffusion

This repository implements analytical conversion from score-based diffusion to flow matching for the Boltz-2 protein structure prediction model. This approach enables faster sampling without requiring any retraining.

Key Benefits

No retraining required: works with existing pretrained checkpoints
Same architecture: uses the existing diffusion module
Pure analytical transformation from score to velocity

📖 The Core Idea

Score-Based Diffusion vs Flow Matching

Score-Based Diffusion (Original Boltz-2)

Uses stochastic differential equations (SDEs)
Each step injects fresh random noise
Typically needs many sampling steps because of the injected noise

Flow Matching (This Implementation)

Uses ordinary differential equations (ODEs)
Deterministic integration, with no random noise
Typically needs fewer sampling steps because the trajectory is deterministic

The Analytical Conversion

The key insight is that both formulations are written in terms of the same noise vector ε, so a prediction of x_0 determines ε and, from it, the velocity:

Score Model: x_t = x_0 + σ·ε  →  ε = (x_t - x_0)/σ
Flow Model:  x_t = (1-t)·x_0 + t·ε  →  v = ε - x_0

Where:

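x_t: Noisy coordinates at time t
x_0: Clean coordinates (ground truth during training; the model's prediction x_0_pred at inference)
ε: Noise vector
σ: Noise level
t: Flow time in [0, 1] (t = 0 is clean data, t = 1 is pure noise)
v: Velocity field for flow matching, v = dx_t/dt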

Why It Is Faster

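Fewer integration steps are required (the example configuration in this README uses 20 ODE steps versus 200 SDE steps)
Deterministic integration avoids generating random noise at every step
Heun integration (RK2) is second-order accurate, so it tolerates larger steps than first-order Euler at the cost of two network evaluations per step
No architectural changes, so the model itself adds no extra overhead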

Technical Implementation

Architecture Compatibility

The implementation uses the exact same DiffusionModule as the original Boltz-2:

# Same architecture as diffusionv2.py
self.score_model = DiffusionModule(**score_model_args)

# Only difference: analytical conversion layer
self.converter = ScoreToVelocityConverter(
    conversion_method='noise_based'  # Most accurate method
)

Conversion Methods

Three analytical conversion methods are implemented:

noise_based (RECOMMENDED): Most accurate

epsilon = (x_t - x_0_pred) / sigma
velocity = epsilon - x_0_pred

pflow: Probability flow ODE
```
velocity = 0.5 * (x_0_pred - x_t)
```

simple: Direct geometric conversion

x_1_est = (x_t - (1-t)*x_0_pred) / t
velocity = x_1_est - x_0_pred
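
The three formulas above can be written as small standalone functions. This is a minimal sketch that uses plain Python lists for coordinates; the function names are hypothetical and are not taken from ScoreToVelocityConverter, whose actual interface is not shown in this README.

```python
from typing import List

Vector = List[float]


def noise_based_velocity(x_t: Vector, x0_pred: Vector, sigma: float) -> Vector:
    """noise_based: recover epsilon from x_t = x_0 + sigma * eps, then v = eps - x_0."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return [(xt - x0) / sigma - x0 for xt, x0 in zip(x_t, x0_pred)]


def pflow_velocity(x_t: Vector, x0_pred: Vector) -> Vector:
    """pflow: v = 0.5 * (x_0_pred - x_t), exactly as listed above."""
    return [0.5 * (x0 - xt) for xt, x0 in zip(x_t, x0_pred)]


def simple_velocity(x_t: Vector, x0_pred: Vector, t: float) -> Vector:
    """simple: solve x_t = (1 - t) * x_0 + t * x_1 for x_1, then v = x_1 - x_0."""
    if not 0.0 < t <= 1.0:
        raise ValueError("t must lie in (0, 1]")
    velocity = []
    for xt, x0 in zip(x_t, x0_pred):
        x1_est = (xt - (1.0 - t) * x0) / t
        velocity.append(x1_est - x0)
    return velocity


# Toy data (made up for illustration): one point with three coordinates.
x0 = [1.0, -2.0, 0.5]
eps = [0.3, 0.1, -0.7]

# On the linear path, `simple` recovers eps - x0 when x0_pred is exact.
t = 0.25
x_t_linear = [(1.0 - t) * a + t * e for a, e in zip(x0, eps)]
v_simple = simple_velocity(x_t_linear, x0, t)

# On the score-model path, `noise_based` recovers eps - x0 when x0_pred is exact.
sigma = 2.0
x_t_score = [a + sigma * e for a, e in zip(x0, eps)]
v_noise = noise_based_velocity(x_t_score, x0, sigma)
```

Both toy checks feed in the true x_0; with a learned x_0_pred the velocity is only as accurate as that prediction.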

Integration Method

Uses Heun's method (RK2) for ODE integration:

# First velocity evaluation
v1 = velocity_network_forward(x, t_curr)

# Euler step
x_euler = x + dt * v1

# Second velocity evaluation  
v2 = velocity_network_forward(x_euler, t_next)

# Heun update (average of velocities)
x_new = x + 0.5 * dt * (v1 + v2)
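
The update above can be wrapped in a loop to form a complete integrator. The sketch below is self-contained and uses a toy velocity field in place of the network; `heun_integrate` and `toy_velocity` are hypothetical names, and the time grid is assumed to be uniform, which may differ from the schedule used in this repository.

```python
import math
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def heun_integrate(
    velocity: VelocityFn,
    x_start: Vector,
    t_start: float,
    t_end: float,
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t_start to t_end with Heun's method."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    x = list(x_start)
    dt = (t_end - t_start) / num_steps  # negative when integrating backwards in time
    for i in range(num_steps):
        t_curr = t_start + i * dt
        t_next = t_curr + dt
        v1 = velocity(x, t_curr)                            # first evaluation
        x_euler = [xi + dt * vi for xi, vi in zip(x, v1)]   # Euler predictor
        v2 = velocity(x_euler, t_next)                      # second evaluation
        x = [xi + 0.5 * dt * (a + b) for xi, a, b in zip(x, v1, v2)]
    return x


def toy_velocity(x: Vector, t: float) -> Vector:
    """Stand-in for the network: dx/dt = -x, whose exact solution is x(0) * exp(-t)."""
    return [-xi for xi in x]


x_final = heun_integrate(toy_velocity, [1.0, 2.0], t_start=0.0, t_end=1.0, num_steps=20)
heun_error = abs(x_final[0] - math.exp(-1.0))
```

Each Heun step calls the velocity function twice, so 20 steps cost 40 evaluations.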

Setup and Usage

Environment setup with conda

conda env create -f environment.yml
conda activate boltz

# optional editable install for src package
pip install -e .

Quick start

# Run flow matching predictions
python run_boltz_flow_matching.py

# This will:
# 1. Load the Boltz-2 checkpoint at ~/.boltz/boltz2_conf.ckpt
# 2. Convert it to the flow matching format
# 3. Run predictions on the hackathon data
# 4. Generate results

Custom parameters

from run_boltz_flow_matching import BoltzFlowMatchingRunner

runner = BoltzFlowMatchingRunner(
    flow_steps=20,         # ODE integration steps
    score_steps=200,       # Original SDE steps (for comparison)
    diffusion_samples=1,   # Number of samples per protein
    device='cuda'          # Device to use
)

results = runner.run_predictions(max_proteins=5)

Direct model usage

from boltz.model.models.boltz2 import Boltz2

# Load converted checkpoint
model = Boltz2.load_from_checkpoint(
    "flow_matching_boltz2.ckpt",
    map_location='cuda'
)

# The model automatically uses FlowMatchingDiffusion
# when use_flow_matching=True in hyperparameters

Examples and scripts

Main runner script: run_boltz_flow_matching.py
Hackathon prediction script: hackathon/predict_hackathon.py, with a helper API in hackathon/hackathon_api.py
Training entrypoint: scripts/train/train.py, with configs in scripts/train/configs
MSA generation: scripts/generate_local_msa.py
Evaluation helpers: under scripts/eval

Implementation Details

File structure

├── run_boltz_flow_matching.py          # Main runner script
├── src/boltz/model/modules/
│   └── diffusionv3_flow_matching.py    # Flow matching implementation
└── src/boltz/model/models/
    └── boltz2.py                       # Modified to support flow matching

Key classes

BoltzFlowMatchingRunner: Main orchestrator
FlowMatchingDiffusion: Flow matching module
ScoreToVelocityConverter: Analytical conversion
Boltz2: Modified model with flow matching support

Integration points

Flow matching is integrated into Boltz-2 at three points:

Conditional Import:

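try:
    from boltz.model.modules.diffusionv3_flow_matching import AtomDiffusion as FlowMatchingDiffusion
    # Flag consulted when the structure module is selected
    FLOW_MATCHING_AVAILABLE = True
except ImportError:
    FlowMatchingDiffusion = None
    FLOW_MATCHING_AVAILABLE = False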

Hyperparameter Control:

if use_flow_matching and FLOW_MATCHING_AVAILABLE:
    self.structure_module = FlowMatchingDiffusion(...)
else:
    self.structure_module = AtomDiffusion(...)

Checkpoint Conversion:

hparams['use_flow_matching'] = True
hparams['flow_conversion_method'] = 'noise_based'
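
The hyperparameter update can be sketched as a pure function over a checkpoint dictionary. This assumes the checkpoint keeps its hyperparameters under a "hyper_parameters" key, as PyTorch Lightning checkpoints do; the function name is hypothetical and this is not necessarily how the repository's conversion is written.

```python
import copy
from typing import Any, Dict

ALLOWED_METHODS = {"noise_based", "pflow", "simple"}


def with_flow_matching_hparams(
    checkpoint: Dict[str, Any],
    conversion_method: str = "noise_based",
) -> Dict[str, Any]:
    """Return a copy of the checkpoint with flow matching switched on.

    Weights are shared with the input rather than duplicated, because the
    conversion only touches hyperparameters.
    """
    if conversion_method not in ALLOWED_METHODS:
        raise ValueError(f"unknown conversion method: {conversion_method!r}")
    converted = copy.copy(checkpoint)  # shallow copy keeps the same state_dict object
    hparams = dict(checkpoint.get("hyper_parameters", {}))
    hparams["use_flow_matching"] = True
    hparams["flow_conversion_method"] = conversion_method
    converted["hyper_parameters"] = hparams
    return converted


# Toy checkpoint (made up for illustration), standing in for a loaded file.
toy_checkpoint = {
    "state_dict": {"w": [0.1, 0.2]},
    "hyper_parameters": {"use_flow_matching": False},
}
converted = with_flow_matching_hparams(toy_checkpoint)
```

With a real checkpoint file, the dictionary would typically come from torch.load and be written back with torch.save; that round trip is left out so the sketch runs without PyTorch.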

Mathematical Foundation

Score-based diffusion

The score-based approach learns to predict the score function:

∇_x log p_t(x) ≈ s_θ(x, σ)

Where s_θ is the neural network predicting the score.
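
The conversion formulas in this README are written in terms of a denoised prediction x_0_pred rather than the score itself. For Gaussian noising x_t = x_0 + σ·ε the two are linked by Tweedie's formula, score = (E[x_0 | x_t] - x_t) / σ². The sketch below checks that identity on a one-dimensional Gaussian prior, where both sides are known in closed form; the function names and the toy prior are assumptions made for illustration.

```python
def score_from_denoised(x_t: float, x0_hat: float, sigma: float) -> float:
    """Tweedie's formula for x_t = x_0 + sigma * eps with eps ~ N(0, 1)."""
    return (x0_hat - x_t) / sigma ** 2


def gaussian_posterior_mean(x_t: float, prior_std: float, sigma: float) -> float:
    """E[x_0 | x_t] when x_0 ~ N(0, prior_std^2)."""
    return x_t * prior_std ** 2 / (prior_std ** 2 + sigma ** 2)


def gaussian_marginal_score(x_t: float, prior_std: float, sigma: float) -> float:
    """Exact score of the noisy marginal N(0, prior_std^2 + sigma^2)."""
    return -x_t / (prior_std ** 2 + sigma ** 2)


x_t, prior_std, sigma = 1.7, 2.0, 0.5
x0_hat = gaussian_posterior_mean(x_t, prior_std, sigma)
score_via_denoiser = score_from_denoised(x_t, x0_hat, sigma)
score_exact = gaussian_marginal_score(x_t, prior_std, sigma)
```

The two values agree up to floating-point rounding, which is why a network that predicts x_0 carries the same information as one that predicts the score.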

Flow matching

Flow matching learns a velocity field:

dx/dt = v_θ(x, t)

Where v_θ is the neural network predicting the velocity.

Analytical conversion

The key insight is that both parameterize the same noise:

Score: x_t = x_0 + σ·ε  →  ε = (x_t - x_0)/σ
Flow:  x_t = (1-t)·x_0 + t·ε  →  v = ε - x_0

This allows us to convert score predictions to velocity predictions analytically.
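
As a worked step, the velocity follows from differentiating the linear path, and the score-model path maps onto the linear path after a rescaling. The derivation below uses only the two path definitions above; whether the repository applies this rescaling is not stated in this README, so it should be read as background rather than as a description of the code.

```latex
\begin{aligned}
x_t &= (1-t)\,x_0 + t\,\varepsilon
  &&\Rightarrow\quad v = \frac{\mathrm{d}x_t}{\mathrm{d}t} = \varepsilon - x_0, \\
x_\sigma &= x_0 + \sigma\,\varepsilon
  &&\Rightarrow\quad \frac{x_\sigma}{1+\sigma} = (1-t)\,x_0 + t\,\varepsilon
  \quad\text{with}\quad t = \frac{\sigma}{1+\sigma}.
\end{aligned}
```

The second line holds because 1/(1+σ) = 1 - t and σ/(1+σ) = t for that choice of t.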

Why this works

Same Information: Both models learn the same underlying data distribution
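Mathematical Equivalence: Each conversion formula is an exact identity when x_t lies on the corresponding noising path and x_0_pred equals the true x_0; in practice the velocity is only as accurate as the model's x_0 prediction
Architecture Preservation: Same neural network weights work for both
Integration Efficiency: Deterministic ODE solvers typically reach a given accuracy in fewer steps than stochastic SDE samplers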

Future improvements

Fine-tuning: Optional 20-50 epoch fine-tuning to close any remaining quality gap
Advanced ODE Solvers: Dormand-Prince, adaptive step sizes
Steering Integration: Physical guidance for flow matching
Multi-scale: Different step counts for different protein sizes

Contributing

This implementation provides a foundation for flow matching in protein structure prediction. Contributions are welcome in the following areas:

Advanced ODE solvers
Quality improvements
Additional conversion methods

The key insight is that pretrained score models can be analytically converted to flow matching without retraining.
