This repository contains the source code and data for the preprint "Physics-driven structural docking and protein language models accelerate antibody screening and design for broad-spectrum antiviral therapy".
Step 1-1: Predict antibody structures from their sequences, then predict the antibody-SARS-CoV-2 docking complexes using HADDOCK and our own Bayesian Active Learning (BAL) method.
- We computed the ACE2-binding residues (the residues the antibodies should protect), stored as 'ace2_binding_residue'.
- We used ABodyBuilder to predict the 3D structures of the antibodies from their sequences.
- We used proABC-2 to predict the binding residues on the antibody side.
- We used HADDOCK to predict the antibody-SARS-CoV-2 binding complexes.
- We used Bayesian Active Learning (BAL) to refine the docking complexes and assign a 'confidence score' to each docked conformation of the same antibody.
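The README does not show how the ACE2-binding residues or an antibody's footprint are extracted from the structures. A common approach, and the one assumed here, is a distance cutoff: a spike residue counts as a contact if any of its atoms lies within a cutoff (4.5 Å, an assumed value) of any partner atom. The minimal sketch below applies that rule to plain coordinate arrays instead of PDB files. `contact_residues` and `coverage_rate` are hypothetical helpers, and defining coverage as the fraction of ACE2-binding residues contacted by the antibody is our assumption, not necessarily the definition used in this repository.

```python
import numpy as np

def contact_residues(coords_a, res_ids_a, coords_b, cutoff=4.5):
    """Residue IDs of chain A with any atom within `cutoff` (Å) of any atom of B.

    coords_a: (n_a, 3) atom coordinates; res_ids_a: residue ID of each atom in A;
    coords_b: (n_b, 3) atom coordinates of the binding partner.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    # Pairwise atom-atom distances, shape (n_a, n_b).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    in_contact = (dist <= cutoff).any(axis=1)
    return {res for res, hit in zip(res_ids_a, in_contact) if hit}

def coverage_rate(ace2_residues, epitope_residues):
    """Fraction of ACE2-binding residues that the antibody contacts (assumed definition)."""
    ace2 = set(ace2_residues)
    if not ace2:
        raise ValueError("ace2_residues must not be empty")
    return len(ace2 & set(epitope_residues)) / len(ace2)

# Toy example: one atom per residue, made-up coordinates and residue labels.
spike_xyz = [[0, 0, 0], [10, 0, 0], [20, 0, 0]]
spike_res = [10, 20, 30]
ace2_site = contact_residues(spike_xyz, spike_res, [[0, 0, 3], [10, 0, 4]])  # {10, 20}
epitope = contact_residues(spike_xyz, spike_res, [[10, 0, 2]])               # {20}
print(coverage_rate(ace2_site, epitope))  # 0.5
```

In a real pipeline the coordinates would come from the HADDOCK/BAL complexes, and one coverage value would be computed per docked conformation.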
Step 1-2: Embed antibody sequences using AbLM, our novel antibody language model, which is pretrained on protein domain sequences and fine-tuned on paired VH-VL sequences, using antibody-specific masking during training.
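AbLM's masking scheme is not described here. As one plausible reading of "antibody-specific masking", the hypothetical sketch below masks residues in the complementarity-determining regions (CDRs) at a higher rate than framework residues, producing the masked positions a masked-language-model loss would be computed on. The rates `p_cdr` and `p_other`, the `<mask>` token, and the CDR positions are all assumptions, not AbLM's actual settings; for paired training, `tokens` could be a concatenated VH-VL sequence.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical mask token

def antibody_specific_mask(tokens, cdr_positions, p_cdr=0.5, p_other=0.1, seed=None):
    """Mask CDR positions with probability p_cdr and all other positions with p_other.

    Returns the masked token list and the (ascending) indices that were masked.
    """
    rng = random.Random(seed)
    cdr = set(cdr_positions)
    masked = list(tokens)
    masked_idx = []
    for i in range(len(tokens)):
        p = p_cdr if i in cdr else p_other
        if rng.random() < p:
            masked[i] = MASK_TOKEN
            masked_idx.append(i)
    return masked, masked_idx

# Toy VH fragment with hypothetical CDR positions 2-4.
tokens = list("QVQLVQSGA")
masked, idx = antibody_specific_mask(tokens, cdr_positions=[2, 3, 4], seed=0)
print(masked, idx)
```

Biasing the masking toward CDRs focuses the training signal on the hypervariable loops that dominate binding, whereas uniform masking would spend most of it on the conserved framework.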
Step 2: Predict the neutralization score and the robustness to variants for every antibody. This Jupyter notebook also generates all figures in the manuscript.
- For each antibody, we computed the weighted average of its coverage rate of the ACE2-binding residues and used it as the predicted neutralization score, 'wt_neutralization_score'.
- We used Kriging, together with the experimentally measured EC50 fold changes against variants, to predict antibody robustness, stored as 'kriging_prediction_results_delta', 'kriging_prediction_results_omicron_ba1' and 'kriging_prediction_results_omicron_ba5'.
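Neither the weights of the neutralization score nor the inputs to Kriging are specified above. The minimal sketch below assumes (1) the weights are the per-conformation BAL confidence scores from Step 1-1, and (2) Kriging interpolates a measured quantity such as the EC50 fold change over some numeric representation of the antibodies (for example, AbLM embeddings). It implements textbook ordinary Kriging with an exponential variogram whose `sill` and `range_` are fixed by hand; in practice they would be fitted to an empirical variogram. All function names are hypothetical.

```python
import numpy as np

def weighted_neutralization_score(coverages, confidences):
    """Confidence-weighted mean coverage over one antibody's docked conformations.

    Assumption: the weights are BAL confidence scores (not confirmed by the README).
    """
    c = np.asarray(coverages, dtype=float)
    w = np.asarray(confidences, dtype=float)
    return float(np.sum(w * c) / np.sum(w))

def exponential_variogram(h, sill=1.0, range_=1.0):
    """Exponential semivariogram without nugget, so gamma(0) = 0."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / range_))

def ordinary_kriging(X_train, y_train, X_new, sill=1.0, range_=1.0):
    """Ordinary Kriging predictions at X_new from observations (X_train, y_train).

    X_* are (n_points, n_features) arrays; distances are Euclidean.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    n = len(X_train)

    # Kriging system: semivariances between observations, bordered by the
    # unbiasedness constraint (weights sum to 1) via a Lagrange multiplier.
    d_train = np.linalg.norm(X_train[:, None, :] - X_train[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exponential_variogram(d_train, sill, range_)
    A[n, n] = 0.0

    preds = np.empty(len(X_new))
    for k, x in enumerate(X_new):
        b = np.ones(n + 1)
        b[:n] = exponential_variogram(np.linalg.norm(X_train - x, axis=1), sill, range_)
        weights = np.linalg.solve(A, b)  # last entry is the Lagrange multiplier
        preds[k] = weights[:n] @ y_train
    return preds
```

Because the variogram has no nugget, `ordinary_kriging` reproduces the observed values exactly at the training points; if the EC50 measurements are noisy, adding a nugget term would let the surface smooth over them instead.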