
Utilizing Precomputed MSA with Constraints for Complex Structure Prediction #62

Closed
kehan777 opened this issue Feb 4, 2025 · 6 comments

Comments

@kehan777

kehan777 commented Feb 4, 2025

Hello,
I am working on complex structure prediction using Protenix and have precomputed MSA results.
I would like to incorporate additional constraints (e.g., distance restraints) into the prediction pipeline.
However, I am unsure how to properly integrate these constraints with the existing MSA data.
Could you provide more details or documentation on how to use precomputed MSA results together with a constraints JSON in Protenix?
Thank you for your help!

@zhangyuxuann
Collaborator

@kehan777 You can refer to example_constraint.json, and to example.json for how to specify an MSA. Add the precomputed MSA under "proteinChain".

@kehan777
Author

kehan777 commented Feb 6, 2025

Thank you for your answer. I used example_constraint.json as follows:
[
  {
    "sequences": [
      {
        "proteinChain": {
          "sequence": "EVMLVESGGGLVKPGGSLKLSCAASGFTFSNYAMSWVRQTPEKRLEWVAAISGNEGTYTYYPDSVRGRFTISRDNARNNLYLQISSLRSEDTALYYCARYGLVGALDFWGQGASVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPKSCDKTHHHHHH",
          "count": 1,
          "msa": {
            "search_tool": "jackhmmer",
            "pairing_db": "uniprot",
            "pairing_db_fpath": "./database/uniprot/uniprot.fasta",
            "non_pairing_db_fpath": "./database/mgnify/mgy_clusters_2022_05.fa",
            "msa_save_dir": "./searched_msa"
          }
        }
      },
      {
        "proteinChain": {
          "sequence": "DIQMNQSPSTLSASLGDTITITCRASQNIDVWLNWYQQKPGDIPKLLIYEASNLHTGVPSRFSGSGSGTDFTLAISSLQPEDIATYYCLQGQDYPFTFGSGTKLEIKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC",
          "count": 1,
          "msa": {
            "search_tool": "jackhmmer",
            "pairing_db": "uniprot",
            "pairing_db_fpath": "./database/uniprot/uniprot.fasta",
            "non_pairing_db_fpath": "./database/mgnify/mgy_clusters_2022_05.fa",
            "msa_save_dir": "./searched_msa"
          }
        }
      },
      {
        "proteinChain": {
          "sequence": "ESSVSLTVPPVVKLENGSSTNVSLTLRPPLNATLVITFEITFRSKNITILELPDEVVVPPGVTNSSFQVTSQNVGQLTVYLHGNHSNQTGPRIRFLVIRSSAISIINQVIGWIYFVAWSISFYPQVIMNWRRKSVIGLSFDFVALNLTGFVAYSVFNIGLLWVPYIKEQFLLKYPNGVNPVNSNDVFFSLHAVVLTLIIIVQCCLYERGGQRVSWPAIGFLVLAWLFAFVTMIVAAVGVITWLQFLFCFSYIKLAVTLVKYFPQAYMKFYYKSTEGWSIGNVLLDFTGGSFSLLQMFLQSYNNDQWTLIFGDPTKFGLGVFSIVFDVVFFIQHFCLYRKRPGLQAARTGSGSRLRQDWAPSLQPKALPQTTSVSASSLKGDYKDDDDK",
          "count": 1,
          "msa": {
            "search_tool": "jackhmmer",
            "pairing_db": "uniprot",
            "pairing_db_fpath": "./database/uniprot/uniprot.fasta",
            "non_pairing_db_fpath": "./database/mgnify/mgy_clusters_2022_05.fa",
            "msa_save_dir": "./searched_msa"
          }
        }
      }
    ],
    "modelSeeds": [],
    "name": "8dkw_contact",
    "constraint": {
      "contact": [
        {
          "entity1": 2,
          "copy1": 1,
          "position1": 96,
          "entity2": 3,
          "copy2": 1,
          "position2": 28,
          "max_distance": 15
        }
      ]
    }
  }
]
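A quick pre-run check can rule out indexing mistakes in the contact block. The sketch below is a hypothetical helper (check_contacts is not part of Protenix). It assumes, consistent with the JSON above, that entity is the 1-based index into "sequences", copy is 1-based and at most that entry's "count", and position is a 1-based residue index into that entry's "sequence". It only inspects proteinChain entries.

```python
def check_contacts(job):
    """Return a list of problems in job["constraint"]["contact"] (empty list if none).

    Assumes 1-based entity/copy/position indices, as used in example_constraint.json.
    """
    problems = []
    sequences = job["sequences"]
    for i, c in enumerate(job.get("constraint", {}).get("contact", [])):
        for side in ("1", "2"):
            entity = c["entity" + side]
            copy_idx = c["copy" + side]
            pos = c["position" + side]
            if not 1 <= entity <= len(sequences):
                problems.append(f"contact {i}: entity{side}={entity} out of range")
                continue
            chain = sequences[entity - 1].get("proteinChain")
            if chain is None:
                # This sketch only knows how to check protein chains.
                problems.append(f"contact {i}: entity{side}={entity} is not a proteinChain")
                continue
            if not 1 <= copy_idx <= chain["count"]:
                problems.append(
                    f"contact {i}: copy{side}={copy_idx} exceeds count={chain['count']}"
                )
            if not 1 <= pos <= len(chain["sequence"]):
                problems.append(
                    f"contact {i}: position{side}={pos} exceeds length {len(chain['sequence'])}"
                )
    return problems
```

To check a whole input file, load it with json.load and call check_contacts on each job in the top-level list; an empty list means nothing out of range was found.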

Second, I changed the name, position2, and max_distance:
"name": "8dkw_contact2",
"constraint": {
  "contact": [
    {
      "entity1": 2,
      "copy1": 1,
      "position1": 96,
      "entity2": 3,
      "copy2": 1,
      "position2": 270,
      "max_distance": 12
    }
  ]
}

However, changing the position did not yield significantly different results.
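When comparing runs like these, deriving each variant from a single base job guarantees that only the intended contact fields differ. A minimal sketch; make_contact_variant is a hypothetical helper, not a Protenix API:

```python
import copy


def make_contact_variant(base_job, name, contact_index=0, **overrides):
    """Return a deep copy of base_job with a new name and some contact fields overridden.

    Only keys already present in the chosen contact may be overridden, so a typo
    such as position3=... raises KeyError instead of silently adding a field.
    """
    job = copy.deepcopy(base_job)  # never mutate the base job
    job["name"] = name
    contact = job["constraint"]["contact"][contact_index]
    for key, value in overrides.items():
        if key not in contact:
            raise KeyError(f"unknown contact field: {key}")
        contact[key] = value
    return job
```

For example, make_contact_variant(base, "8dkw_contact2", position2=270, max_distance=12) reproduces the second job from the first; both can then be written to one input file with json.dump([base, variant], f, indent=2).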

Note: 8dkw_contact2_seed_101_sample_0.txt is a .cif file saved with a .txt extension.

8dkw_contact_seed_101_sample_0.txt
8dkw_contact_seed_101_summary_confidence_sample_0.json

8dkw_contact2_seed_101_sample_0.txt
8dkw_contact2_seed_101_summary_confidence_sample_0.json

@Anfankus
Contributor

Anfankus commented Feb 6, 2025

Hi @kehan777, thank you for actively trying out our features. I tried to reproduce your results in my local environment and on the Protenix web server, and in my runs the two contacts give significantly different results. Looking at your output, the constraints do not seem to have taken effect. Could you check that you are on the constraint_esm branch and that the contact information is printed in the run log?


@kehan777
Author

Thank you for your answer, it works! I ran:
protenix predict --input examples/example_constraint.json --out_dir ./output_msa_esm --seeds 101 --use_msa_server --use_esm

Another question: can I use an existing (precomputed) MSA in the constraint JSON for prediction? If so, could you provide an example?

@Anfankus
Contributor

Yes, Protenix supports precomputed MSA and constraints at the same time. Simply fill in the msa and constraint fields in the JSON file, as you showed above.
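The sketch below shows one way to turn a search-based job like the one above into a precomputed-MSA job while leaving its constraint block untouched. It assumes the msa block accepts the keys precomputed_msa_dir and pairing_db, as in the example.json shipped with Protenix at the time of this thread; verify the field names against the input-format documentation in your checkout. The helper name use_precomputed_msa and any directory paths you pass to it are hypothetical.

```python
import copy


def use_precomputed_msa(job, msa_dirs, pairing_db):
    """Return a copy of `job` whose proteinChain entries point at precomputed MSA dirs.

    msa_dirs maps the 1-based entity index to a directory path. Entities not in
    msa_dirs, the "constraint" block, and all other fields are copied unchanged.
    pairing_db should name the database your precomputed MSAs were paired against.
    """
    new_job = copy.deepcopy(job)
    for idx, entity in enumerate(new_job["sequences"], start=1):
        chain = entity.get("proteinChain")
        if chain is not None and idx in msa_dirs:
            # Replace the search settings (search_tool, *_db_fpath, msa_save_dir) entirely.
            chain["msa"] = {"precomputed_msa_dir": msa_dirs[idx], "pairing_db": pairing_db}
    return new_job
```

Because the constraint block is deep-copied unchanged, the same contact definitions apply whether the MSA is searched at run time or read from disk.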

If it does not work, it may be because you previously ran pip install protenix, which by default installs the main branch rather than constraint_esm in your local environment. You could clone the Protenix source code and run inference_demo.sh instead of protenix predict ....
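To see which copy of Protenix a Python environment actually imports (a pip-installed package or your clone), a standard-library check can help. This is a minimal sketch; describe_install is a hypothetical helper, and it assumes both the import name and the distribution name are protenix. An editable install (pip install -e .) reports a version but an origin inside the clone. It does not report the git branch; inside a clone, git branch --show-current does.

```python
import importlib.util
from importlib import metadata


def describe_install(module="protenix", dist="protenix"):
    """Report where `module` would be imported from and the installed distribution version."""
    spec = importlib.util.find_spec(module)
    if spec is None:
        return None  # not importable in this environment
    try:
        version = metadata.version(dist)
    except metadata.PackageNotFoundError:
        version = None  # importable (e.g. from a source tree on sys.path) but not pip-installed
    origin = spec.origin or ""
    return {
        "origin": origin,
        "version": version,
        # Heuristic: a regular (non-editable) pip install lives under site-/dist-packages.
        "installed_copy": "site-packages" in origin or "dist-packages" in origin,
    }
```

Running print(describe_install()) in the same environment where the prediction runs shows which copy is being used.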

@kehan777
Author

Thank you!


3 participants