-
Notifications
You must be signed in to change notification settings - Fork 3
magres_old parsing improvements #189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
oerc0122
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, starting to look a lot better! Just some style questions and hopefully simplified RE structure.
|
Thanks for the comments and suggestions @oerc0122 , I think I've addressed all of them now. Do you have thoughts on
|
I think that duplicating data is probably unnecessary. Ultimately, I'm not sure we want the |
oerc0122
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on your thoughts and what you're using, we can leave the merging of magres and magres_old until the refactor to an iterative form. It will, however, be an API breaking change.
178a402 to
92dd918
Compare
|
@jkshenton We can get it merged now and sort out some of the details later or you can work on it, or I can take over and do some of the cleanup you talked about. I think this would be a good thing to get in. |
|
@oerc0122 apologies for this dropping off the radar a bit! If you'd be happy to take on the merging of the magres_old and magres data, I would be very grateful! If not, I can have a go, but probably only in 2 weeks' time. Happy to leave the parsing/computing of eigenvectors/values and other derived quantities to other programs/a future PR. |
94f9c91 to
2a368d6
Compare
|
@jkshenton Do you mind checking if this meets your needs or if I've inadvertently removed useful information? I've never done much with NMR so just want to make sure! |
|
@oerc0122 I've just got round to it and made some additional changes: I think the I also added stuff to the test.magres so it covers more cases. The json, ruamel, and pyyaml files I generated have a bit different format in places to the previous versions - not sure why that is Edit: Ah, just noticed that there were changes that I hadn't yet pulled in before pushing - let me try to sort that out |
oerc0122
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty sensible to me. Of course since I've technically contributed might want @ajjackson 's review before merging.
9555831 to
9cba5c5
Compare
|
Turned out to just be a merge error, thankfully. Regenerating the file removed the duplication. |
|
@ajjackson or @oerc0122 any final thoughts on this one? It would be good to get this finalised. Was there anything left to do on my side? |
|
I think the next step is to do a rebase / force-push. There are merge conflicts so cannot do it from github. |
- output from magres_old blocks now mirrors standard magres block - magres_old can now parse - ms - efg - isc_* - hf (hyperfine tensors) - updated test magres file to include new quantities in magres_old - changed the structure of the tensors in magres block to ThreeByThreeMatrix to remove any ambiguity in ordering (row vs col) - Updated test json and yaml files to the new structure
- Implemented a recursive dictionary merge function `deep_merge_dict` to facilitate merging nested dictionaries. - Enhanced `test.magres` files with additional data entries. - ions:lattice now returns a 3x3 instead of a 1x9 - Removed aliasing - keeping ms, efg, isc and hf as users will expect. I'm not sure about this, but I think this would be the least surprising option for users - Removed efg_local and efg_nonlocal following discussions with domain experts - Added hf to main magres block (should be added in future versions of CASTEP magres output)
9cba5c5 to
a161a7d
Compare
… the test magres.pyyaml
|
Thanks for doing the rebase, @oerc0122. I've regenerated the magres.pyyaml that was causing the tests to fail. I also changed the deep_dict_merge function to use deepcopy to be safer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this if @ajjackson is.
| Parameters | ||
| ---------- | ||
| magres_file | ||
| magres_file : TextIO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the codebase is a bit inconsistent at the moment re: these redundant type annotations in the docstring.
Not a problem for this PR but maybe worth cleaning up separately. Personally I'd rather they were all gone but I think there was some reason we have to use them in the Returns section?
| MAGRES_ALIASES = { | ||
| "ms": "magnetic_shielding", | ||
| "efg": "electric_field_gradient", | ||
| "isc": "indirect_spin_spin_coupling", | ||
| "isc_fc": "indirect_spin_spin_coupling_fc", | ||
| "isc_orbital_p": "indirect_spin_spin_coupling_orbital_p", | ||
| "isc_orbital_d": "indirect_spin_spin_coupling_orbital_d", | ||
| "isc_spin": "indirect_spin_spin_coupling_spin", | ||
| "hf": "hyperfine", | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like all references to this constant are commented out. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! I didn't see that the add_aliases calls were commented out now.
How is the rest of the code done for renaming/aliasing things, @oerc0122 ? I can see arguments for a) renaming everything (easy to see what the quantities are)*, b) sticking with the original (what existing users/downstream codes might expect, and clear one-to-one mapping with the file) and even c) keeping both the original and alias version (though it's my least favourite... ).
An alternative (d) is to dump the alias dict itself directly in the output, or something similar with instead a short description of the fields?
* this is what we get if we uncomment the two add_aliases calls (and import the function).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In MD, I provide both the abbreviated (T) and the full named (temperature) since the tag T is what people will be used to from the files. However, if people are more used to the full names, add_aliases has an optional arg to replace rather than add if you prefer. I would say if the arrays are relatively small the cost of the duplicated is minimal (~0 in scripts and minimal in dumped files, smarter YAML even uses references).
| accum[key][spec, int(ind)] = _list_to_threebythree(val) | ||
| elif words[0].startswith("isc"): # ISC props explicitly have spaces! | ||
| key, speca, inda, specb, indb, *val = words | ||
| accum[key][(speca, int(inda)), (specb, int(indb))] = _list_to_threebythree(val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't match the return type dict[str, str | ThreeByThreeMatrix]. Actually, neither does accum[key][spec, int(ind)] = _list_to_threebythree(val).
The data format makes sense, so I think the function annotation (and redundant docstring) need to be corrected.
| block: Block, | ||
| accum: dict, | ||
| ) -> tuple[ThreeByThreeMatrix, dict[AtomIndex, AtomsInfo]]: | ||
| def _process_magres_old_block(block: Block) -> dict[str, str | ThreeByThreeMatrix]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again the type annotation seems out of sync with all the nested dicts it can get back
| # The magres_old blocks have data in these atom blocks | ||
| for match in REs.MAGRES_OLD_RE["atom"].finditer(str(block)): | ||
| index = atreg_to_index(match) | ||
| sub_blk = match.groups()[3] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding a capture label to this regex. It's hard to quickly figure out which part [3] will capture because the indexing is thrown off by the injected ATOM_RE
# Atom lines
571 "atom": re.compile(
572 r"=+\s+"
573 rf"( Perturbing Atom|Atom): {ATOM_RE}[\r\n]+"
574 r"=+[\r\n]+"
575 r"([^=]+)\s+",
576 re.MULTILINE | re.DOTALL,
577 ),
|
|
||
| # For any isc tensor tags, add the units if not present | ||
| for tag in ("isc", "isc_fc", "isc_spin", "isc_orbital_p", "isc_orbital_d"): | ||
| if tag in data["magres"] and tag not in data["magres"]["units"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually expect these to be present sometimes and not others?
Does anything bad happen if we remove the and branch and just have if tag in data["magres"] here?
I've added in something to parse extra information in magres_old blocks. A lot of it will be redundant information (i.e. also in the regular magres block), but the hyperfine tensors, for example, aren't written out in the magres blocks, only magres_old.
Done:
Optional things left to do:
[ ] Add in summaries of tensorsEdit: Ignoring this for now/downstream packages
For now I removed the parsing of the eigenvectors/values and the isotropy etc. for each tensor - these could be added back in, but then maybe a structure like: