PDB loader throws errors - Note: fixed in release branch #3054
Comments
#2968 seems to be related.
The |
I fixed it before; the fix just didn't make it into the beta. @timsnow has confirmed.
Actually, the fix only fixes the first error. Apparently, there is a long history of violent arguments about what the correct format should be (or what should count as a valid PDB file). For our purposes, as long as we are "rolling our own", I suggest we stick to the recommended v3.3 standard. We should, however, do a much better job of using the exception: do not load the file, and send a message to the log saying that the PDB does not match the v3.3 standard definition. This would remove the problem of later hanging when pressing compute, while giving more useful (and less verbose) feedback to the user.
@butlerpd Unfortunately, it's pretty complicated. Whatever the solution, I think it's going to be hard to get it to work automatically, as the extent and manner of what is represented in a PDB can vary quite a lot. The most obvious difference is whether H is present or not. There are other things too, though, like HOH entries and the alpha, beta, gamma carbons, etc.
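To make that variability concrete, here is a minimal Python sketch (not SasView code) that tallies the features mentioned above for a given file. The helper name `summarize_pdb` is hypothetical, and the column positions assume the fixed-width ATOM/HETATM layout of the v3.3 format (atom name in columns 13-16, residue name in 18-20, element symbol in 77-78).

```python
from collections import Counter


def summarize_pdb(lines):
    """Tally features that differ between PDB files (hypothetical helper)."""
    summary = Counter()
    for line in lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, etc.
        res_name = line[17:20].strip()  # columns 18-20, e.g. ALA or HOH
        element = line[76:78].strip()   # columns 77-78, blank or absent in some files
        summary["atoms"] += 1
        if res_name == "HOH":
            summary["water_atoms"] += 1
        if not element:
            # Without an element symbol, only the atom name (CA, CB, H1, ...)
            # in columns 13-16 is left to guess from.
            summary["missing_element"] += 1
        elif element.upper() == "H":
            summary["hydrogens"] += 1
    return summary
```

Running this over a few files before loading them would show at a glance whether hydrogens, waters, or the element column are present, which is the information any automatic handling would need.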
I agree; see #3055 (comment). Your previous fix, @lucas-wilkins, does make the reader read anything that conforms properly to the current published standard. However, as also mentioned in that issue (#3055 (comment)), I think we can easily prevent the hanging when reading "malformed" files. A question for later discussion, I think: do we just say we will only deal with files that strictly adhere to the standard and provide all the atoms of interest, or do we try to use a Python reader (e.g. the Bio.PDB module of the Biopython package, BSD-3 license) to handle the reading more elegantly? Personally, I think documenting ourselves as "strict" would be the way to go until someone who feels that being more forgiving is important to their science decides to get involved and contribute.
Describe the bug
Loading a PDB in the GSC throws a bunch of read errors into the log file. The program seems to expect atoms named C, N, O, which appear in the last column (though that column is not even available in all PDB files), but in at least some cases it seems to be reading the identity column, which holds names like CA, CB, CG, H1, etc. (alpha carbon, etc.), and PDB files include many variations of these. The behavior does not seem to be consistent, however. All PDB files read in seem to have a slew of
Others however also have a bunch of warning lines like:
In those cases there seems to be at least one traceback error after the read (when it now auto-calculates the Rg), and in some cases two, though that may be because both files tested included the hydrogen atoms. The second error is the one that showed up when only one error was generated, while this is the order in which they appeared in the second case. Other types of files may cause different errors or permutations:
Given the many WARNINGS about `index out of range` or setting the SLD of the various elements to zero, running anyway is rather concerning. In particular, if the calculation is ignoring atoms and/or setting their SLD to zero, that would cause incorrect answers.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The file should be read in correctly, OR the loader should throw an exception that warns the user that the PDB cannot be read (and not load the file).
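A minimal sketch of that all-or-nothing behavior in Python, for illustration only. The names `PDBFormatError`, `parse_atom_line`, `load_pdb_strict`, and `try_load` are hypothetical and are not SasView's actual reader API. Treating the element field (columns 77-78) as required is an assumption about how strict the reader should be.

```python
import logging

logger = logging.getLogger(__name__)


class PDBFormatError(ValueError):
    """A record does not match the fixed-column layout (hypothetical)."""


def parse_atom_line(line, line_number):
    """Return (element, (x, y, z)) or raise PDBFormatError."""
    if len(line.rstrip("\n")) < 78:
        raise PDBFormatError(f"line {line_number}: no element field in columns 77-78")
    try:
        # Coordinates occupy columns 31-38, 39-46 and 47-54.
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    except ValueError as exc:
        raise PDBFormatError(f"line {line_number}: unreadable coordinates") from exc
    element = line[76:78].strip()
    if not element.isalpha():
        raise PDBFormatError(f"line {line_number}: invalid element {element!r}")
    return element.capitalize(), xyz


def load_pdb_strict(lines):
    """Parse every ATOM/HETATM record; any bad record aborts the whole load."""
    atoms = []
    for number, line in enumerate(lines, start=1):
        if line[:6].strip() in ("ATOM", "HETATM"):
            atoms.append(parse_atom_line(line, number))
    return atoms


def try_load(lines):
    """Return the atom list, or None (with one log message) if the file is rejected."""
    try:
        return load_pdb_strict(lines)
    except PDBFormatError as exc:
        logger.error("PDB file not loaded: it does not match the v3.3 format (%s)", exc)
        return None
```

Because the first malformed record rejects the whole file, no atoms are silently dropped or given a zero SLD, and the user sees a single log message instead of a stream of warnings.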
SasView version (please complete the following information):
Operating system (please complete the following information):
Additional context
Identified during NIST CNR summer school and thus being labelled 6.0.0. Whether the error existed in 5.0.6 has not yet been verified. However, if it is silently producing wrong answers, I would argue that fixing it now, if possible, would still be appropriate.
FILES to REPRODUCE
NOTE: these should be renamed to *.pdb before use. GitHub does not allow that extension, hence the change to .txt here.