
Commit 46fe3f0

Augustin-Zidek authored and copybara-github committed
Add documentation for pre-computing and reusing MSA and templates
PiperOrigin-RevId: 825092403 Change-Id: Idfeb86387ad3e83f5742cfa19e18f6faf3530315
1 parent 9c183c8 commit 46fe3f0

1 file changed: +36 -0 lines changed

docs/performance.md

Lines changed: 36 additions & 0 deletions
@@ -24,6 +24,42 @@ inference. This stage can be quite costly in terms of runtime, CPU, and RAM use.
The output will be JSON files augmented with MSAs and templates that can then be
directly used as input for running inference.

### Pre-computing and reusing MSA and templates

When folding multiple candidate chains with a set of fixed chains (i.e. chains
that are the same for all the runs), you can optimize the process by computing
the MSA and templates for the fixed chains only once. The computations for the
changing candidate chains will still be performed for each run:

1.  Run the AlphaFold 3 data pipeline for the fixed chains using the
    `--run_inference=false` flag. This step generates a JSON file containing the
    MSA and template data for these chains.
2.  When constructing your multimer input JSONs, populate the entries for the
    fixed chains using the data generated in the previous step.
    *   For the fixed chains: copy the `unpairedMsa`, `pairedMsa`, and
        `templates` fields from the pre-computed JSON into the multimer input
        JSON. This prevents these fields from being recomputed.
    *   For the candidate chains: leave these fields unset (or `null`) in the
        multimer input JSON. This signals the pipeline to compute them for
        each run.
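
The copy in step 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not code from AlphaFold 3. The helper name `copy_precomputed_fields` is hypothetical. The sketch also assumes that both JSONs keep their chains in a top-level `sequences` list whose entries look like `{"protein": {"sequence": ...}}`; check this layout against the input format documentation before relying on it. Chains are matched by their amino-acid sequence.

```python
import copy

# The three fields named in the steps above.
PRECOMPUTED_FIELDS = ("unpairedMsa", "pairedMsa", "templates")


def copy_precomputed_fields(precomputed: dict, multimer_input: dict) -> dict:
    """Returns a copy of multimer_input with the fixed-chain fields filled in.

    Assumption (hypothetical layout): chains are stored under
    data["sequences"][i]["protein"] and each carries a "sequence" string.
    """
    # Index the pre-computed protein chains by their sequence.
    by_sequence = {}
    for entry in precomputed.get("sequences", []):
        protein = entry.get("protein")
        if protein is not None:
            by_sequence[protein["sequence"]] = protein

    result = copy.deepcopy(multimer_input)
    for entry in result.get("sequences", []):
        protein = entry.get("protein")
        if protein is None:
            continue
        source = by_sequence.get(protein["sequence"])
        if source is None:
            # Candidate chain: leave the fields unset so that the data
            # pipeline computes them for this run.
            continue
        for field in PRECOMPUTED_FIELDS:
            if field in source:
                protein[field] = copy.deepcopy(source[field])
    return result


# Toy data for illustration only; real MSAs and templates are much larger.
fixed = {
    "sequences": [
        {
            "protein": {
                "id": "A",
                "sequence": "MKT",
                "unpairedMsa": ">query\nMKT\n",
                "pairedMsa": "",
                "templates": [],
            }
        }
    ]
}
multimer = {
    "sequences": [
        {"protein": {"id": "A", "sequence": "MKT"}},  # fixed chain
        {"protein": {"id": "B", "sequence": "GSH"}},  # candidate chain
    ]
}
filled = copy_precomputed_fields(fixed, multimer)
```

After this call, the fixed chain in `filled` carries the three pre-computed fields, and the candidate chain still has none of them. In practice the dictionaries would be read and written with `json.load` and `json.dump`.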

This technique can also be extended to efficiently process all combinations of
*n* first chains and *m* second chains. Instead of running the data pipeline for
each of the *n* × *m* pairs, you run it only *n* + *m* times, once per
individual chain. For example, with *n* = 10 and *m* = 20 this is 30 data
pipeline runs instead of 200. Inference still runs once per pair.

In this scenario:

1.  Run the data pipeline (step 1 above, with `--run_inference=false`) for all
    *n* individual first chains and all *m* individual second chains.
2.  Assemble the dimer input JSONs for each desired pair by combining their
    respective pre-computed monomer JSONs.
3.  Run only the inference step on these assembled JSONs using the
    `--run_data_pipeline=false` flag.
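
The assembly in step 2 can be sketched in Python as follows. This is an illustrative sketch, not code from AlphaFold 3. The helper name `assemble_dimers` is hypothetical. It assumes that each pre-computed monomer JSON has a top-level `name` and exactly one entry in a top-level `sequences` list, of the form `{"protein": {...}}`, and that chain IDs may be reassigned to `A` and `B`. Verify these assumptions against the input format documentation.

```python
import copy
import itertools


def assemble_dimers(first_jsons: list[dict], second_jsons: list[dict]) -> list[dict]:
    """Builds one dimer input per (first, second) pair of monomer JSONs.

    Assumption (hypothetical layout): each monomer JSON has a "name" and a
    single chain stored at data["sequences"][0]["protein"].
    """
    dimers = []
    for first, second in itertools.product(first_jsons, second_jsons):
        # Start from the first monomer so its other top-level settings carry over.
        dimer = copy.deepcopy(first)
        chain_a = dimer["sequences"][0]
        chain_b = copy.deepcopy(second["sequences"][0])
        # Give the two chains distinct IDs within the assembled input.
        chain_a["protein"]["id"] = "A"
        chain_b["protein"]["id"] = "B"
        dimer["sequences"] = [chain_a, chain_b]
        dimer["name"] = f"{first['name']}_{second['name']}"
        dimers.append(dimer)
    return dimers


def toy_monomer(name: str, sequence: str) -> dict:
    """Creates a toy stand-in for a pre-computed monomer JSON."""
    return {
        "name": name,
        "sequences": [
            {
                "protein": {
                    "id": "A",
                    "sequence": sequence,
                    "unpairedMsa": f">query\n{sequence}\n",
                    "pairedMsa": "",
                    "templates": [],
                }
            }
        ],
    }


# n = 2 first chains and m = 3 second chains: 5 monomer JSONs, 6 dimer inputs.
firsts = [toy_monomer(f"f{i}", "MKT") for i in range(2)]
seconds = [toy_monomer(f"s{j}", "GSH") for j in range(3)]
dimers = assemble_dimers(firsts, seconds)
```

Each assembled dictionary can then be written out with `json.dump` and passed to the inference-only run from step 3. The sketch copies the pre-computed fields of both chains unchanged.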

This approach has been discussed in multiple GitHub issues, such as:
https://github.com/google-deepmind/alphafold3/issues/171 (which links to other
similar issues).

### Featurisation and Model Inference Only

Launch `run_alphafold.py` with `--norun_data_pipeline` to skip the data pipeline
