Description
I found a problem where the PROP (proline N-terminal) patch gets marked as applicable to some residues other than proline, which causes problems when they also allow the NTER (generic N-terminal) patch. OpenMM doesn't know which patch to apply to, e.g., arginine when it appears at the N-terminal end of a protein, and stops with an error when parameterizing such a system.
This is correctable with a simple modification to the YAML files defining the conversion to remove the extraneous <AllowPatch> tags, and I can submit a fix. I am opening this issue to make note of and document this problem more generally.
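For anyone unfamiliar with what such a fix amounts to, here is a minimal sketch of removing specific <AllowPatch> tags from an FFXML. It is not the conversion's actual code: the real fixes are specified in YAML files whose schema isn't shown here, so this operates directly on a toy FFXML string, and the helper name `remove_allow_patch` and the patch name NTER_0 are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Toy FFXML fragment; residue and patch names are illustrative only.
FFXML = """
<ForceField>
  <Residues>
    <Residue name="PRO">
      <AllowPatch name="PROP_0"/>
    </Residue>
    <Residue name="ARG">
      <AllowPatch name="NTER_0"/>
      <AllowPatch name="PROP_0"/>
    </Residue>
  </Residues>
</ForceField>
"""


def remove_allow_patch(root, removals):
    """Remove <AllowPatch> tags named in `removals`.

    `removals` is a set of (residue name, patch name) pairs.
    Returns the list of pairs that were actually found and removed.
    """
    removed = []
    # Materialize the residue list first so the tree is not modified
    # while it is being traversed.
    for residue in list(root.iter("Residue")):
        for tag in list(residue.findall("AllowPatch")):
            key = (residue.get("name"), tag.get("name"))
            if key in removals:
                residue.remove(tag)
                removed.append(key)
    return removed


root = ET.fromstring(FFXML)
removed = remove_allow_patch(root, {("ARG", "PROP_0")})

# Which patches each residue still allows after the fix.
remaining = {
    residue.get("name"): [t.get("name") for t in residue.findall("AllowPatch")]
    for residue in root.iter("Residue")
}
print(removed)
print(remaining)
```

Note that the removal is keyed on exact residue and patch names, which is why it is sensitive to any renumbering of the generated patch names.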
CHARMM intends patches to be manually applied by users building their systems, meaning that arbitrary patches can be included in the topology files without causing issues. OpenMM, however, intends patches to be applied automatically to residues that it can't parameterize using unpatched templates. The CHARMM force field conversion uses ParmEd to attempt to determine which patches can be applied to which residues automatically, and marks all patches that appear applicable to a given residue as such. There is some screening to prevent inclusion of distinct residues and patches that appear identical from the perspective of OpenMM. However, I found several cases while working on the conversion, and this case has come up now, in which some residue-patch combinations still appear identical despite this screening.
To make matters worse, ParmEd's handling of CHARMM patches was buggy and required several workarounds.
- ParmEd attempts to screen out "bad" combinations of residues and patches by requiring patched residues to have integral net charges. This criterion turns out to be incorrect, as some parts of the force field (nucleic acids) rely on individual residues having non-integral charges, with the appropriate terminal patches on both ends leading to an integrally-charged polymer. So this check has to be skipped for the conversion, which probably leads to some bad residue-patch combinations being allowed in.
- OpenMM's patch specification is more stringent than CHARMM's, and what might be one patch to CHARMM has to be several similar patches in an OpenMM FFXML. ParmEd handled this incorrectly by trying to find a "best" patch definition to write: there were too many cases where this was wrong and missed valid residue-patch combinations, so I had to modify it to write multiple patch definitions to the FFXML. The result is patches with names like PROP_0, PROP_1, etc. This makes the post-generation fixes of the FFXML files that are based on residue and patch names very brittle, since if CHARMM adds new residues or patches, this numbering could change and break the specifications of <AllowPatch> tags to remove.
- ParmEd doesn't handle multi-residue patches at all, and so things like disulfide bond patches have to be manually pulled out and patched into the output. This is now semi-automated (we don't have to go into a text editor and change the FFXMLs) but is still an annoyance. I think this is still broken for the Drude model.
- ParmEd's handling of impropers spanning residue and patch atoms was completely broken; this has been entirely circumvented in the current conversion and is not central to this issue.
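To illustrate why the integral-charge criterion in the first point fails, here is a small sketch. The function `has_integral_charge` and all of the charge values are invented for the example (they are not taken from CHARMM or from ParmEd's implementation); the only point is that each patched residue can fail the check on its own while the assembled polymer passes.

```python
def has_integral_charge(charges, tol=1e-4):
    """Return True if the summed charge is within `tol` of an integer."""
    total = sum(charges)
    return abs(total - round(total)) < tol


# Hypothetical net charges for residues in a short nucleic acid chain.
# These numbers are made up purely to illustrate the argument.
five_prime_end = -0.68   # first residue with its terminal patch applied
internal = -1.00         # an internal residue
three_prime_end = -1.32  # last residue with its terminal patch applied

# Checked one residue at a time, the patched terminal residues are rejected...
per_residue_ok = [
    has_integral_charge([q])
    for q in (five_prime_end, internal, three_prime_end)
]

# ...even though the whole polymer carries an integral charge.
polymer_ok = has_integral_charge(
    [five_prime_end, internal, internal, three_prime_end]
)

print(per_residue_ok, polymer_ok)
```

A per-residue check like this would throw out valid terminal patches for such residues, which is why it has to be skipped.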
All of this means that each issue like this that comes up is a major headache to fix. For example, in the issue above, the PROP patch was incorrectly getting applied to arginine, isoleucine, and lysine, but also all kinds of miscellaneous things in the CHARMM force field files like hypusine, (4R)-4-methyl-L-proline, and (2S)-2-amino-5-[(N-methylcarbamimidoyl)amino]pentanoic acid (what?). In this case, one has to track down all of the residues where this patch gets applied, figure out which ones are some kind of modified proline to which it should still apply, and remove it from the ones that shouldn't have it.
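The "tracking down" step can at least be partly scripted. Below is a rough sketch that lists residues allowing both a PROP-type and an NTER-type patch in an FFXML. The helper names and the toy residue/patch assignments are hypothetical, and it assumes that the numbered variants differ from the CHARMM patch name only by a trailing `_<number>` suffix, which would misfire for any patch whose real name ends that way. Deciding which hits are modified prolines still has to be done by hand.

```python
import re
import xml.etree.ElementTree as ET

# Toy FFXML fragment; which residue allows which patch is made up here.
FFXML = """
<ForceField>
  <Residues>
    <Residue name="ALA"><AllowPatch name="NTER_0"/></Residue>
    <Residue name="PRO"><AllowPatch name="PROP_0"/></Residue>
    <Residue name="ARG">
      <AllowPatch name="NTER_0"/>
      <AllowPatch name="PROP_1"/>
    </Residue>
  </Residues>
</ForceField>
"""


def base_name(patch_name):
    """Strip a trailing _<number> suffix (assumed naming convention)."""
    return re.sub(r"_\d+$", "", patch_name)


def residues_allowing_all(root, patch_bases):
    """Names of residues whose allowed patches cover every base name given."""
    hits = []
    for residue in root.iter("Residue"):
        allowed = {
            base_name(tag.get("name")) for tag in residue.findall("AllowPatch")
        }
        if patch_bases <= allowed:
            hits.append(residue.get("name"))
    return sorted(hits)


root = ET.fromstring(FFXML)
candidates = residues_allowing_all(root, {"PROP", "NTER"})
print(candidates)
```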
I see a few ways to deal with this more generally, each of which has issues:
- Fix each problem when someone encounters it in a system that uses some novel combination of patches and reports a bug to us. This requires a lot of work for each problem found, and future CHARMM releases could break the set of fixes that has accumulated. (This is effectively what we are doing as is, and as we want to keep supporting CHARMM into the future, the technical debt will keep building.)
- Go through every residue and patch and manually define the appropriate sets of patches that apply to each residue based on comments in the CHARMM topology files. This is tedious and error-prone, since oftentimes residues and patches might not be well documented. CHARMM includes sets of hundreds of obscure molecules mixed in with the relatively few residues and patches that the vast majority of users actually need, and updates could add more residues and patches that have to be cross-checked against all of the existing ones. CHARMM also has the habit of adding standalone residues for things like certain dipeptides that could be built from a series of other residues and patches, which makes everything more annoying.
- Reinstate the old ParmEd behavior with checks for integer total charge and single versions of each patch (this may cut down on the number of bad residue-patch combinations somewhat), then apply manual fixes to add patches back as needed. This is similar to the first option, but just replaces one set of accumulated hacks with another. There's no guarantee that the old ParmEd behavior will generate fewer bad residue-patch combinations either.
- Try to improve the automated conversion to prevent patches from being applied to residues when the result could be ambiguous residue-patch combinations. It's not obvious how to do this; one would have to generate all residue-patch combinations and do some kind of graph comparison to find ambiguous ones. This could also be expensive due to the combinatorial explosion of patches in some cases. Then, if you find them, it's not clear what patches to disallow.
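As a starting point for discussing the last option, here is a rough sketch of one cheap way to flag candidate ambiguities: compute a Weisfeiler-Lehman style signature for each patched template and group templates whose signatures collide. It assumes, as a simplification, that two templates look the same to the matcher when their graphs of elements, bonds, and external bonds are the same; all names and the toy templates are hypothetical. Equal signatures are only a necessary condition for two graphs to be identical, so collisions would still need a real isomorphism check, and this says nothing about which patch to disallow.

```python
from collections import Counter, defaultdict


def wl_signature(elements, bonds, external_bonds, rounds=3):
    """Signature of a template graph that ignores atom names.

    elements: dict mapping atom name -> element symbol
    bonds: iterable of (atom name, atom name) pairs
    external_bonds: iterable of atom names carrying an external bond
    """
    neighbors = {atom: [] for atom in elements}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    external = Counter(external_bonds)
    # Initial color: element plus number of external bonds on the atom.
    colors = {atom: (elements[atom], external[atom]) for atom in elements}
    for _ in range(rounds):
        # Refine each color with the sorted colors of the atom's neighbors.
        # The nested tuples grow quickly, so this is for small examples only.
        colors = {
            atom: (colors[atom], tuple(sorted(colors[n] for n in neighbors[atom])))
            for atom in elements
        }
    return tuple(sorted(colors.values()))


def find_ambiguous(templates):
    """Group template labels whose signatures collide."""
    groups = defaultdict(list)
    for label, (elements, bonds, external_bonds) in templates.items():
        groups[wl_signature(elements, bonds, external_bonds)].append(label)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Toy patched templates: A and B differ only in atom names, C lacks one H.
templates = {
    "RES+PATCH_A": (
        {"N": "N", "H1": "H", "H2": "H", "CA": "C"},
        [("N", "H1"), ("N", "H2"), ("N", "CA")],
        ["CA"],
    ),
    "RES+PATCH_B": (
        {"NT": "N", "HA": "H", "HB": "H", "C1": "C"},
        [("NT", "HA"), ("NT", "HB"), ("C1", "NT")],
        ["C1"],
    ),
    "RES+PATCH_C": (
        {"N": "N", "H1": "H", "CA": "C"},
        [("N", "H1"), ("N", "CA")],
        ["CA"],
    ),
}

ambiguous = find_ambiguous(templates)
print(ambiguous)
```

Grouping by signature avoids comparing every pair of templates directly, but generating all of the patched templates in the first place is where the combinatorial cost would come from.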
I don't know what the best approach is going forward, and additional suggestions are welcome. I wanted to make an issue mainly to document the reasons that the CHARMM update has been painful, illustrate that these problems may keep coming up, and see if we can brainstorm any ideas for making things work better down the road.