magres_old parsing improvements #189

jkshenton · 2025-02-05T17:41:20Z

I've added in something to parse extra information in magres_old blocks. A lot of it will be redundant information (i.e. also in the regular magres block), but the hyperfine tensors, for example, aren't written out in the magres blocks, only magres_old.

Done:

- output from magres_old blocks now mirrors standard magres block
- magres_old can now parse
    - ms
    - efg
    - isc_*
    - hf (hyperfine tensors)
- updated test magres file to include new quantities in magres_old
- changed the structure of the tensors in magres block to ThreeByThreeMatrix to remove any ambiguity in ordering (row vs col)
- updated test json and yaml files to the new structure

Optional things left to do:

~~[ ] Add in summaries of tensors~~
Edit: Ignoring this for now/downstream packages

For now I removed the parsing of the eigenvectors/values and the isotropy etc. for each tensor - these could be added back in, but then maybe a structure like:

- magres_old
    - magnetic_shielding | electric_field_gradient | hyperfine | j_coupling
        - atomindex | atomindex1,atomindex2
            - tensor
            - eigenvalues
            - eigenvectors
            - isotropy
            - anisotropy
            - asymmetry
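As a rough illustration of the layout proposed above, here is a minimal sketch of how one entry could look as a nested dict. The `(species, index)` key, the helper names and the numbers are assumptions made for this example, not the parser's actual output. Only the isotropic value (one third of the trace) is computed; eigenvalues, anisotropy and asymmetry are left out because they depend on the ordering convention chosen.

```python
# Hypothetical helpers, for illustration only (not part of castep_outputs).
def isotropy(tensor):
    """Isotropic value of a 3x3 tensor: one third of its trace."""
    return sum(tensor[i][i] for i in range(3)) / 3


def summarise(tensor):
    """Bundle a tensor with a derived quantity, mirroring the proposed nesting."""
    return {"tensor": tensor, "isotropy": isotropy(tensor)}


# Made-up shielding tensor for atom H 1.
ms_h1 = ((30.0, 0.0, 0.0), (0.0, 27.0, 0.0), (0.0, 0.0, 24.0))

magres_old = {
    "magnetic_shielding": {
        ("H", 1): summarise(ms_h1),
    },
}
```

Downstream code could then read `magres_old["magnetic_shielding"][("H", 1)]["isotropy"]` without recomputing it.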

Combine magres and magres_old data to remove redundancy

oerc0122

Thanks for this, starting to look a lot better! Just some style questions and hopefully simplified RE structure.

castep_outputs/parsers/magres_file_parser.py

castep_outputs/utilities/castep_res.py

jkshenton · 2025-02-06T17:20:30Z

Thanks for the comments and suggestions @oerc0122 , I think I've addressed all of them now.

Do you have thoughts on

- merging redundant data? Much of the magres_old block has the same info as the magres block, so it could be used to sanity-check, for example. I mainly want to parse the magres_old block for the hyperfine tensors (which only appear in magres_old, not in magres - for now...). One option could be to add the missing data into a single magres output (rather than having magres and magres_old), though that might violate the design of this library (?)
- adding the eigenvectors/values, isotropy etc.? These are redundant in that you can compute them from the tensors that are parsed, but they might help people debug conventions etc...?

oerc0122 · 2025-02-06T17:42:21Z

> Thanks for the comments and suggestions @oerc0122 , I think I've addressed all of them now.
>
> Do you have thoughts on
>
> - merging redundant data? Much of the magres_old block has the same info as the magres block, so it could be used to sanity-check, for example. I mainly want to parse the magres_old block for the hyperfine tensors (which only appear in magres_old, not in magres - for now...). One option could be to add the missing data into a single magres output (rather than having magres and magres_old), though that might violate the design of this library (?)
> - adding the eigenvectors/values, isotropy etc.? These are redundant in that you can compute them from the tensors that are parsed, but they might help people debug conventions etc...?

I think that duplicating data is probably unnecessary.

Ultimately, I'm not sure we want the magres/magres_old separation. It was there because that was the file structure, but I think we just want the magres dictionary as a whole, as that's all the useful information in the file. To that end, I think adding the eigenvectors/values etc. would be a good thing. If people don't want them, they can easily ignore them, but if the information's not there, they can't use it.

oerc0122

Depending on your thoughts and what you're using, we can leave the merging of magres and magres_old until the refactor to an iterative form. It will, however, be an API breaking change.

oerc0122 · 2025-05-02T09:31:45Z

@jkshenton
I was wondering whether you had any more thoughts on the PR (since you have some unresolved ones), or whether you're happy as is.

We can get it merged now and sort out some of the details later or you can work on it, or I can take over and do some of the cleanup you talked about.

I think this would be a good thing to get in.

jkshenton · 2025-05-08T09:30:29Z

@oerc0122 apologies for this dropping off the radar a bit!

If you'd be happy to take on the merging of the magres_old and magres data, I would be very grateful! If not, I can have a go, but probably only in 2 weeks' time.

Happy to leave the parsing/computing of eigenvectors/values and other derived quantities to other programs/a future PR.

oerc0122 · 2025-05-09T08:17:29Z

@jkshenton Do you mind checking if this meets your needs or if I've inadvertently removed useful information? I've never done much with NMR so just want to make sure!

jkshenton · 2025-05-19T12:11:18Z

@oerc0122 I've just got round to it and made some additional changes:

I think the | merging missed the nested dictionary items, so I've added a nested merge - is that correct or is there a better way?
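To make the difference concrete, here is a minimal sketch of a recursive merge next to the plain `|` merge. The function name `deep_merge_sketch` and the sample values (including the `"au"` unit string) are placeholders for this example; it is not the PR's actual implementation.

```python
from copy import deepcopy


def deep_merge_sketch(base, extra):
    """Return a new dict with `extra` merged recursively into `base`."""
    merged = deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold a dict: merge them instead of replacing.
            merged[key] = deep_merge_sketch(merged[key], value)
        else:
            # Otherwise `extra` wins; copy so the inputs stay independent.
            merged[key] = deepcopy(value)
    return merged


# Placeholder data standing in for the two parsed blocks.
magres = {"units": {"ms": "ppm"}, "ms": {("H", 1): "tensor-a"}}
magres_old = {"units": {"hf": "au"}, "hf": {("H", 1): "tensor-b"}}

shallow = magres | magres_old  # dict union, Python 3.9+
nested = deep_merge_sketch(magres, magres_old)
```

With `|`, the whole `"units"` dict from the right-hand side replaces the left-hand one, so the `ms` unit is lost; the recursive version keeps both entries.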

I also added stuff to the test.magres so it covers more cases.

The json, ruamel, and pyyaml files I generated have a slightly different format in places from the previous versions - I'm not sure why that is.

Edit: Ah, just noticed that there were changes that I hadn't yet pulled in before pushing - let me try to sort that out

oerc0122

Looks pretty sensible to me. Of course, since I've technically contributed, we might want @ajjackson's review before merging.

oerc0122 · 2025-05-21T11:44:12Z

Turned out to just be a merge error, thankfully. Regenerating the file removed the duplication.

jkshenton · 2025-08-01T15:31:36Z

@ajjackson or @oerc0122 any final thoughts on this one? It would be good to get this finalised. Was there anything left to do on my side?

ajjackson · 2025-08-04T10:07:07Z

I think the next step is to do a rebase / force-push. There are merge conflicts, so it can't be done from GitHub.

- output from magres_old blocks now mirrors standard magres block
- magres_old can now parse
    - ms
    - efg
    - isc_*
    - hf (hyperfine tensors)
- updated test magres file to include new quantities in magres_old
- changed the structure of the tensors in magres block to ThreeByThreeMatrix to remove any ambiguity in ordering (row vs col)
- Updated test json and yaml files to the new structure

- Implemented a recursive dictionary merge function `deep_merge_dict` to facilitate merging nested dictionaries.
- Enhanced `test.magres` files with additional data entries.
- ions:lattice now returns a 3x3 instead of a 1x9
- Removed aliasing - keeping ms, efg, isc and hf as users will expect. I'm not sure about this, but I think this would be the least surprising option for users
- Removed efg_local and efg_nonlocal following discussions with domain experts
- Added hf to main magres block (should be added in future versions of CASTEP magres output)

… the test magres.pyyaml

jkshenton · 2025-08-04T16:41:17Z

Thanks for doing the rebase, @oerc0122. I've regenerated the magres.pyyaml that was causing the tests to fail. I also changed the `deep_merge_dict` function to use deepcopy to be safer.

oerc0122

I'm happy with this if @ajjackson is.

ajjackson · 2025-08-05T08:27:59Z

castep_outputs/parsers/magres_file_parser.py

    Parameters
    ----------
-    magres_file
+    magres_file : TextIO


It looks like the codebase is a bit inconsistent at the moment re: these redundant type annotations in the docstring.

Not a problem for this PR but maybe worth cleaning up separately. Personally I'd rather they were all gone but I think there was some reason we have to use them in the Returns section?

ajjackson · 2025-08-05T08:45:52Z

castep_outputs/parsers/magres_file_parser.py

+MAGRES_ALIASES = {
+    "ms": "magnetic_shielding",
+    "efg": "electric_field_gradient",
+    "isc": "indirect_spin_spin_coupling",
+    "isc_fc": "indirect_spin_spin_coupling_fc",
+    "isc_orbital_p": "indirect_spin_spin_coupling_orbital_p",
+    "isc_orbital_d": "indirect_spin_spin_coupling_orbital_d",
+    "isc_spin": "indirect_spin_spin_coupling_spin",
+    "hf": "hyperfine",
+}


It looks like all references to this constant are commented out. Is that correct?

Nice catch! I didn't see that the add_aliases calls were commented out now.

How is the rest of the code done for renaming/aliasing things, @oerc0122 ? I can see arguments for a) renaming everything (easy to see what the quantities are)*, b) sticking with the original (what existing users/downstream codes might expect, and clear one-to-one mapping with the file) and even c) keeping both the original and alias version (though it's my least favourite... ).

An alternative (d) is to dump the alias dict itself directly in the output, or something similar with instead a short description of the fields?

* this is what we get if we uncomment the two add_aliases calls (and import the function).

In MD, I provide both the abbreviated (T) and the fully named (temperature) keys, since the tag T is what people will be used to from the files. However, if people are more used to the full names, add_aliases has an optional arg to replace rather than add, if you prefer. I would say that if the arrays are relatively small, the cost of the duplication is minimal (~0 in scripts and minimal in dumped files; smarter YAML even uses references).
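The "add" versus "replace" options discussed in this thread can be sketched as follows. `with_aliases` is a hypothetical stand-in written for this example; it does not reproduce the library's `add_aliases` or its signature, and the alias table is a shortened copy of `MAGRES_ALIASES`.

```python
# Shortened alias table, copied from the constant under review.
ALIASES = {"ms": "magnetic_shielding", "hf": "hyperfine"}


def with_aliases(data, aliases, replace=False):
    """Return a copy of `data` with long names added (or swapped in)."""
    out = dict(data)
    for short, long_name in aliases.items():
        if short in out:
            # The alias points at the same object, so nothing is copied.
            out[long_name] = out[short]
            if replace:
                del out[short]
    return out


parsed = {"ms": [1.0, 2.0], "efg": [3.0]}

both = with_aliases(parsed, ALIASES)                    # option (c)
renamed = with_aliases(parsed, ALIASES, replace=True)   # option (a)
```

Because both keys refer to the same list in option (c), the in-memory cost is close to zero; the duplication only shows up when the dict is dumped to a format without references.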

ajjackson · 2025-08-05T10:36:33Z

castep_outputs/parsers/magres_file_parser.py

+            accum[key][spec, int(ind)] = _list_to_threebythree(val)
        elif words[0].startswith("isc"):  # ISC props explicitly have spaces!
            key, speca, inda, specb, indb, *val = words
+            accum[key][(speca, int(inda)), (specb, int(indb))] = _list_to_threebythree(val)


This doesn't match the return type `dict[str, str | ThreeByThreeMatrix]`. Actually, neither does `accum[key][spec, int(ind)] = _list_to_threebythree(val)`.

The data format makes sense, so I think the function annotation (and redundant docstring) need to be corrected.
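One way the annotation could be brought in line with the nested data is sketched below. All type aliases and function names here are stand-ins defined for this example (the library's own `ThreeByThreeMatrix` and `AtomIndex` are not imported). The row-major grouping of the nine values and the non-isc unpacking are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

Row = Tuple[float, float, float]
ThreeByThree = Tuple[Row, Row, Row]   # stand-in for ThreeByThreeMatrix
AtomKey = Tuple[str, int]             # (species, index)
PairKey = Tuple[AtomKey, AtomKey]     # used by the isc_* couplings
TensorTable = Dict[Union[AtomKey, PairKey], ThreeByThree]
MagresData = Dict[str, Union[str, TensorTable]]


def list_to_threebythree(values: List[str]) -> ThreeByThree:
    """Group nine values into three rows (row-major order assumed)."""
    nums = [float(v) for v in values]
    return (tuple(nums[0:3]), tuple(nums[3:6]), tuple(nums[6:9]))


def parse_tensor_line(words: List[str], accum: Dict[str, TensorTable]) -> None:
    """Store one tensor line under a per-atom or per-pair key."""
    if words[0].startswith("isc"):
        key, speca, inda, specb, indb, *val = words
        accum[key][(speca, int(inda)), (specb, int(indb))] = list_to_threebythree(val)
    else:
        key, spec, ind, *val = words
        accum[key][spec, int(ind)] = list_to_threebythree(val)


accum: Dict[str, TensorTable] = defaultdict(dict)
parse_tensor_line("ms H 1 1 0 0 0 1 0 0 0 1".split(), accum)
parse_tensor_line("isc H 1 C 2 1 0 0 0 1 0 0 0 1".split(), accum)
```

Here `MagresData` describes a mapping whose values are either plain strings or tables of tensors keyed by atom or atom pair, which is closer to what the two assignments actually build.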

ajjackson · 2025-08-05T10:38:11Z

castep_outputs/parsers/magres_file_parser.py

-    block: Block,
-    accum: dict,
-) -> tuple[ThreeByThreeMatrix, dict[AtomIndex, AtomsInfo]]:
+def _process_magres_old_block(block: Block) -> dict[str, str | ThreeByThreeMatrix]:


Again the type annotation seems out of sync with all the nested dicts it can get back

ajjackson · 2025-08-05T10:48:23Z

castep_outputs/parsers/magres_file_parser.py

+    # The magres_old blocks have data in these atom blocks
+    for match in REs.MAGRES_OLD_RE["atom"].finditer(str(block)):
+        index = atreg_to_index(match)
+        sub_blk = match.groups()[3]


Might be worth adding a capture label to this regex. It's hard to quickly figure out which part [3] will capture because the indexing is thrown off by the injected ATOM_RE

    # Atom lines
    "atom": re.compile(
        r"=+\s+"
        rf"( Perturbing Atom|Atom): {ATOM_RE}[\r\n]+"
        r"=+[\r\n]+"
        r"([^=]+)\s+",
        re.MULTILINE | re.DOTALL,
    ),
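A minimal sketch of the suggestion, using a named group for the block body. The `ATOM_RE` below is a simplified stand-in assumed to contribute two capture groups (consistent with the body being `groups()[3]`), the group name `body` is invented for this example, and the sample text is made up.

```python
import re

# Simplified stand-in for the library's ATOM_RE (assumed: two capture groups).
ATOM_RE = r"([A-Za-z]+)\s+(\d+)"

ATOM_BLOCK_RE = re.compile(
    r"=+\s+"
    rf"(Perturbing Atom|Atom): {ATOM_RE}[\r\n]+"
    r"=+[\r\n]+"
    r"(?P<body>[^=]+)\s+",  # named, so callers need not count groups
    re.MULTILINE | re.DOTALL,
)

sample = (
    "====================\n"
    "Atom: H 1\n"
    "====================\n"
    "H 1 Coordinates 0.0 0.0 0.0 A\n"
    "\n"
)

match = ATOM_BLOCK_RE.search(sample)
sub_blk = match.group("body")
```

`match.group("body")` returns the same text as `match.groups()[3]`, but it keeps working if `ATOM_RE` later gains or loses a capture group.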

ajjackson · 2025-08-05T11:00:53Z

castep_outputs/parsers/magres_file_parser.py

+
+    # For any isc tensor tags, add the units if not present
+    for tag in ("isc", "isc_fc", "isc_spin", "isc_orbital_p", "isc_orbital_d"):
+        if tag in data["magres"] and tag not in data["magres"]["units"]:


Do we actually expect these to be present sometimes and not others?

Does anything bad happen if we remove the `and` branch and just have `if tag in data["magres"]` here?
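The practical difference between the two conditions can be sketched like this. The body of the `if` is not shown in the diff, so this assumes it assigns a default unit; the function name and the unit strings are placeholders, not values taken from the parser.

```python
ISC_TAGS = ("isc", "isc_fc", "isc_spin", "isc_orbital_p", "isc_orbital_d")
DEFAULT_ISC_UNIT = "placeholder-unit"  # hypothetical value for illustration


def fill_isc_units(magres, overwrite=False):
    """Add a unit for each isc tag present; optionally overwrite existing ones."""
    units = magres.setdefault("units", {})
    for tag in ISC_TAGS:
        if tag not in magres:
            continue
        if overwrite:
            # Equivalent to dropping the `and` branch: always assign.
            units[tag] = DEFAULT_ISC_UNIT
        else:
            # Equivalent to keeping it: only fill in missing units.
            units.setdefault(tag, DEFAULT_ISC_UNIT)
    return magres


guarded = fill_isc_units({"isc": {}, "isc_fc": {}, "units": {"isc": "from-file"}})
unguarded = fill_isc_units(
    {"isc": {}, "isc_fc": {}, "units": {"isc": "from-file"}}, overwrite=True
)
```

If the file never supplies units for these tags, the two variants behave identically; they only differ when a unit read from the file would be replaced by the default.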

jkshenton requested a review from oerc0122 February 5, 2025 17:41

oerc0122 requested changes Feb 6, 2025

oerc0122 previously approved these changes Feb 8, 2025

oerc0122 dismissed their stale review via aa020de February 12, 2025 09:36

oerc0122 force-pushed the magres_old_block_parsing branch 3 times, most recently from 178a402 to 92dd918 Compare February 12, 2025 10:00

oerc0122 force-pushed the magres_old_block_parsing branch 4 times, most recently from 94f9c91 to 2a368d6 Compare May 8, 2025 13:02

oerc0122 mentioned this pull request May 16, 2025

Enable extended ruff checks #223

Merged

oerc0122 previously approved these changes May 19, 2025

oerc0122 dismissed their stale review via 9cba5c5 May 21, 2025 11:42

oerc0122 force-pushed the magres_old_block_parsing branch from 9555831 to 9cba5c5 Compare May 21, 2025 11:42

oerc0122 requested a review from ajjackson May 21, 2025 11:44

jkshenton added 5 commits August 4, 2025 13:22

Linting fixes

709d569

Simplify magres old RegEx

156d054

magres old parser style fixes

8306f0c

ruff linting fixes for magres parser

306cc60

oerc0122 and others added 2 commits August 4, 2025 13:22

Merge magres_old with magres

6a5bd4a

oerc0122 force-pushed the magres_old_block_parsing branch from 9cba5c5 to a161a7d Compare August 4, 2025 12:37

Make the deep_merge_dict use deepcopy for the values. Also regenerate…

b606b54

… the test magres.pyyaml

oerc0122 approved these changes Aug 5, 2025

ajjackson reviewed Aug 5, 2025
