@@ -24,6 +24,42 @@ inference. This stage can be quite costly in terms of runtime, CPU, and RAM use.
The output will be JSON files augmented with MSAs and templates that can then be
directly used as input for running inference.

### Pre-computing and reusing MSA and templates

When folding multiple candidate chains with a set of fixed chains (i.e. chains
that are the same for all the runs), you can optimize the process by computing
the MSA and templates for the fixed chains only once. The MSA and templates for
the changing candidate chains are still computed for each run:

1.  Run the AlphaFold 3 data pipeline for the fixed chains using the
    `--run_inference=false` flag. This step generates a JSON file containing the
    MSA and template data for these chains.
2.  When constructing your multimer input JSONs, populate the entries for the
    fixed chains using the data generated in the previous step.
    *   For the fixed chains: Specifically, copy the `unpairedMsa`, `pairedMsa`,
        and `templates` fields from the pre-computed JSON into the multimer
        input JSON. This prevents these fields from being recomputed.
    *   For the candidate chains: Leave these fields unset (or `null`) in the
        multimer input JSON. This will signal the pipeline to compute them
        dynamically for each run.
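
The copy step can be scripted. The following is a minimal sketch, not part of AlphaFold 3 itself: `reuse_precomputed` is a hypothetical helper, and it assumes that both JSONs have already been loaded into dicts whose chains sit under a top-level `sequences` list as `{"protein": {...}}` entries with a `sequence` field. Matching fixed chains by identical sequence is also an assumption made for illustration; matching by chain ID would work equally well if your IDs are stable.

```python
import copy

# Fields named in the steps above that hold the data pipeline results.
PRECOMPUTED_FIELDS = ("unpairedMsa", "pairedMsa", "templates")


def reuse_precomputed(multimer: dict, precomputed: dict) -> dict:
    """Returns a copy of `multimer` with MSA/template fields filled for fixed chains.

    Hypothetical helper. Assumes the layout {"sequences": [{"protein": {...}}]}.
    """
    # Index the pre-computed protein entries by their amino-acid sequence.
    by_sequence = {
        entry["protein"]["sequence"]: entry["protein"]
        for entry in precomputed.get("sequences", [])
        if "protein" in entry
    }

    result = copy.deepcopy(multimer)  # never mutate the caller's input
    for entry in result.get("sequences", []):
        protein = entry.get("protein")
        if protein is None:
            continue  # non-protein entries are left untouched in this sketch
        source = by_sequence.get(protein["sequence"])
        if source is None:
            continue  # candidate chain: fields stay unset, so they get computed
        for field in PRECOMPUTED_FIELDS:
            if field in source:
                protein[field] = copy.deepcopy(source[field])
    return result
```

Load the two files with `json.load`, pass the resulting dicts to the helper, and write the result back with `json.dump`. Candidate chains come out unchanged, so the data pipeline still processes them.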

This technique also extends to processing all combinations of *n* first chains
and *m* second chains. Instead of running the data pipeline for each of the
*n* × *m* pairs, you need only *n* + *m* data pipeline runs.

In this scenario:

1.  Run the data pipeline (step 1 above, with `--run_inference=false`) for all
    *n* individual first chains and all *m* individual second chains.
2.  Assemble the dimer input JSONs for each desired pair by combining their
    respective pre-computed monomer JSONs.
3.  Run only the inference step on these assembled JSONs using the
    `--run_data_pipeline=false` flag.
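
Step 2 can be sketched as follows. This is an illustration under stated assumptions rather than an official tool: `assemble_dimers` is a hypothetical helper, each pre-computed monomer dict is assumed to hold exactly one chain under `sequences`, and the top-level field names (`name`, `modelSeeds`, `dialect`, `version`) and the default seed are assumptions to check against your own data pipeline output.

```python
import copy
import itertools


def assemble_dimers(first_chains, second_chains, model_seeds=(1,)):
    """Yields one dimer input dict per (first, second) pair of monomer dicts.

    Hypothetical helper. Assumes each monomer dict contains a single chain.
    """
    for first, second in itertools.product(first_chains, second_chains):
        # Deep-copy so the cached monomer data can be reused for other pairs.
        chain_a = copy.deepcopy(first["sequences"][0])
        chain_b = copy.deepcopy(second["sequences"][0])

        # Each entry is assumed to be {"<chain type>": {...}}; give the two
        # chains distinct IDs, since both monomers may have been run as "A".
        for chain, new_id in ((chain_a, "A"), (chain_b, "B")):
            for body in chain.values():
                body["id"] = new_id

        yield {
            "name": f"{first['name']}_{second['name']}",
            "sequences": [chain_a, chain_b],
            "modelSeeds": list(model_seeds),
            # Carry over format metadata from the monomer when present.
            "dialect": first.get("dialect", "alphafold3"),
            "version": first.get("version", 1),
        }
```

With *n* first chains and *m* second chains the generator yields *n* × *m* dimer inputs, each of which can be written to its own file and passed to the inference-only run from step 3.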

This approach has been discussed in multiple GitHub issues, such as:
https://github.com/google-deepmind/alphafold3/issues/171 (which links to other
similar issues).

### Featurisation and Model Inference Only

Launch `run_alphafold.py` with `--norun_data_pipeline` to skip the data pipeline