
@jkshenton
Collaborator

@jkshenton jkshenton commented Feb 5, 2025

I've added parsing of the extra information in magres_old blocks. A lot of it is redundant (i.e. also in the regular magres block), but some quantities, such as the hyperfine tensors, aren't written out in the magres block, only in magres_old.

Done:

  • Output from magres_old blocks now mirrors the standard magres block
  • magres_old can now parse
    • ms
    • efg
    • isc_*
    • hf (hyperfine tensors)
  • Updated the test magres file to include the new quantities in magres_old
  • Changed the structure of the tensors in the magres block to ThreeByThreeMatrix to remove any ambiguity in ordering (row vs col)
  • Updated test json and yaml files to the new structure
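To illustrate the row-vs-column ambiguity that a 3x3 structure removes, here is a small NumPy sketch. The nine values are made up, and nothing here says which order a given magres file actually uses; it only shows that the two readings of a flat list are transposes of each other.

```python
import numpy as np

# Nine tensor components flattened onto one line (illustrative values only).
flat = [1.0, 2.0, 3.0,
        4.0, 5.0, 6.0,
        7.0, 8.0, 9.0]

row_major = np.reshape(flat, (3, 3), order="C")  # first three values -> first row
col_major = np.reshape(flat, (3, 3), order="F")  # first three values -> first column

# The two readings are transposes of each other, so they only agree for a
# symmetric tensor; e.g. a magnetic shielding tensor need not be symmetric.
assert np.array_equal(row_major, col_major.T)
print(row_major[0], col_major[0])
```

Storing an explicit nested 3x3 removes the need for readers to guess which reshape was intended.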

Optional things left to do:

  • [ ] Add in summaries of tensors
    Edit: ignoring this for now; leaving it to downstream packages

For now I've removed the parsing of the eigenvectors/values, isotropy, etc. for each tensor. These could be added back, perhaps with a structure like:

- magres_old
    - magnetic_shielding | electric_field_gradient | hyperfine | j_coupling
        - atomindex | atomindex1,atomindex2
            - tensor
            - eigenvalues
            - eigenvectors
            - isotropy
            - anisotropy
            - asymmetry
  • Combine magres and magres_old data to remove redundancy
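For reference, the per-tensor fields in the proposed layout can be derived from the tensor alone. The sketch below is a hypothetical helper (`summarise_tensor` is not part of the library) that assumes the Haeberlen convention and uses only the symmetric part of the tensor; other conventions for anisotropy and asymmetry exist, which is exactly why exposing them could help people debug.

```python
import numpy as np


def summarise_tensor(tensor) -> dict:
    """Derived quantities for one 3x3 tensor (hypothetical helper).

    Assumes the Haeberlen convention and uses only the symmetric part,
    since an antisymmetric part does not affect these quantities.
    """
    t = np.asarray(tensor, dtype=float)
    sym = 0.5 * (t + t.T)
    evals, evecs = np.linalg.eigh(sym)         # eigenvalues in ascending order
    iso = evals.mean()                         # isotropy = trace / 3

    # Haeberlen ordering: |zz - iso| >= |xx - iso| >= |yy - iso|
    by_dev = np.argsort(np.abs(evals - iso))   # indices of yy, xx, zz (smallest deviation first)
    order = [by_dev[1], by_dev[0], by_dev[2]]  # reorder to xx, yy, zz
    xx, yy, zz = evals[order]

    zeta = zz - iso                            # reduced anisotropy
    eta = 0.0 if np.isclose(zeta, 0.0) else (yy - xx) / zeta

    return {
        "tensor": t,
        "eigenvalues": evals[order],           # xx, yy, zz
        "eigenvectors": evecs[:, order],       # columns match the eigenvalues
        "isotropy": iso,
        "anisotropy": zz - 0.5 * (xx + yy),    # equals 1.5 * zeta
        "asymmetry": eta,
    }
```

For example, `summarise_tensor([[1, 0, 0], [0, 2, 0], [0, 0, 6]])` gives an isotropy of 3, an anisotropy of 4.5 and an asymmetry of 1/3.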

@jkshenton jkshenton requested a review from oerc0122 February 5, 2025 17:41
Owner

@oerc0122 oerc0122 left a comment


Thanks for this, it's starting to look a lot better! Just some style questions and, hopefully, a simplified regex structure.

@jkshenton
Collaborator Author

Thanks for the comments and suggestions, @oerc0122. I think I've addressed all of them now.

Do you have thoughts on

  1. Merging redundant data: much of the magres_old block has the same information as the magres block, so it could be used as a sanity check, for example. I mainly wanted to parse the magres_old block for the hyperfine tensors (which only appear in magres_old, not in magres, for now...). One option could be to add the missing data into a single magres output (rather than having both magres and magres_old), though that might violate the design of this library (?)
  2. We could add in the eigenvectors/values, isotropy, etc. These are redundant in that you can compute them from the parsed tensors, but they might help people debug conventions etc.?

@oerc0122
Owner

oerc0122 commented Feb 6, 2025


I think that duplicating data is probably unnecessary.

Ultimately, I'm not sure we want the magres/magres_old separation. It was there because that was the file structure, but I think we just want the magres dictionary as a whole, since that's all the useful information in the file. To that end, I think adding the eigenvectors/values, etc. would be a good thing. If people don't want them, they can easily ignore them, but if the information isn't there, they can't use it.

oerc0122 previously approved these changes Feb 8, 2025
Owner

@oerc0122 oerc0122 left a comment


Depending on your thoughts and what you're using, we can leave the merging of magres and magres_old until the refactor to an iterative form. It will, however, be an API-breaking change.

@oerc0122 oerc0122 force-pushed the magres_old_block_parsing branch 3 times, most recently from 178a402 to 92dd918 February 12, 2025 10:00
@oerc0122
Owner

oerc0122 commented May 2, 2025

@jkshenton
I was wondering whether you had any more thoughts on the PR (since there are some unresolved comments), or whether you're happy with it as is.

We can get it merged now and sort out some of the details later, you can keep working on it, or I can take over and do some of the cleanup you talked about.

I think this would be a good thing to get in.

@jkshenton
Collaborator Author

@oerc0122 apologies for this dropping off the radar a bit!

If you'd be happy to take on the merging of the magres_old and magres data, I would be very grateful! If not, I can have a go, but probably only in 2 weeks' time.

Happy to leave the parsing/computing of eigenvectors/values and other derived quantities to other programs/a future PR.

@oerc0122 oerc0122 force-pushed the magres_old_block_parsing branch 4 times, most recently from 94f9c91 to 2a368d6 May 8, 2025 13:02
@oerc0122
Owner

oerc0122 commented May 9, 2025

@jkshenton Do you mind checking if this meets your needs or if I've inadvertently removed useful information? I've never done much with NMR so just want to make sure!

@jkshenton
Collaborator Author

jkshenton commented May 19, 2025

@oerc0122 I've just got round to it and made some additional changes:

I think the | merge missed the nested dictionary items, so I've added a nested merge. Is that correct, or is there a better way?
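The difference is that `|` only merges the top level: when both sides have a nested dict under the same key (a shared "units" table, say), the right-hand one replaces the left-hand one wholesale. A minimal recursive merge, sketched below as a hypothetical `deep_merge` (not the project's own function), recurses into dicts present on both sides and otherwise lets the right-hand value win, as `|` does. The data are illustrative.

```python
def deep_merge(base: dict, other: dict) -> dict:
    """Return a new dict with `other` merged into `base`, recursing into nested dicts.

    On a conflict between non-dict values, the value from `other` wins (as with `|`).
    Non-dict values are not copied, so the result still shares them with the inputs.
    """
    merged = dict(base)
    for key, value in other.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


magres = {"ms": {("H", 1): "ms tensor"}, "units": {"ms": "ppm"}}
magres_old = {"efg": {("H", 1): "efg tensor"}, "units": {"efg": "au"}}

shallow = magres | magres_old       # "units" from magres_old replaces magres's entirely
deep = deep_merge(magres, magres_old)

print(shallow["units"])  # {'efg': 'au'}
print(deep["units"])     # {'ms': 'ppm', 'efg': 'au'}
```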

I also added entries to test.magres so it covers more cases.

The json, ruamel and pyyaml files I generated are formatted a bit differently in places from the previous versions; I'm not sure why.

Edit: Ah, just noticed that there were changes that I hadn't yet pulled in before pushing - let me try to sort that out

oerc0122 previously approved these changes May 19, 2025
Owner

@oerc0122 oerc0122 left a comment


Looks pretty sensible to me. Of course, since I've technically contributed, we might want @ajjackson's review before merging.

@oerc0122 oerc0122 force-pushed the magres_old_block_parsing branch from 9555831 to 9cba5c5 May 21, 2025 11:42
@oerc0122
Owner

Turned out to just be a merge error, thankfully. Regenerating the file removed the duplication.

@oerc0122 oerc0122 requested a review from ajjackson May 21, 2025 11:44
@jkshenton
Collaborator Author

@ajjackson or @oerc0122 any final thoughts on this one? It would be good to get this finalised. Was there anything left to do on my side?

@ajjackson
Collaborator

I think the next step is to do a rebase / force-push. There are merge conflicts, so it can't be done from GitHub.

- output from magres_old blocks now mirrors standard magres block
- magres_old can now parse
  - ms
  - efg
  - isc_*
  - hf (hyperfine tensors)
- updated test magres file to include new quantities in magres_old
- changed the structure of the tensors in magres block to ThreeByThreeMatrix to remove any ambiguity in ordering (row vs col)
- Updated test json and yaml files to the new structure
oerc0122 and others added 2 commits August 4, 2025 13:22
- Implemented a recursive dictionary merge function `deep_merge_dict` to facilitate merging nested dictionaries.
- Enhanced `test.magres` files with additional data entries.
- ions:lattice now returns a 3x3 instead of a 1x9
- Removed aliasing - keeping ms, efg, isc and hf as users will expect. I'm not sure about this, but I think this would be the least surprising option for users
- Removed efg_local and efg_nonlocal following discussions with domain experts
- Added hf to main magres block (should be added in future versions of CASTEP magres output)
@oerc0122 oerc0122 force-pushed the magres_old_block_parsing branch from 9cba5c5 to a161a7d August 4, 2025 12:37
@jkshenton
Collaborator Author

Thanks for doing the rebase, @oerc0122. I've regenerated the magres.pyyaml that was causing the tests to fail. I also changed the deep_dict_merge function to use deepcopy to be safer.
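The hazard that deepcopy guards against is aliasing: a merged dict built without copying still points at the same nested objects as its inputs, so editing the result silently edits the parsed data it came from. A minimal illustration with made-up data (a plain `|` merge stands in for any merge that reuses leaf objects):

```python
import copy

magres = {"ms": {("H", 1): [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}}
extra = {"efg": {}}

merged = magres | extra                   # new top-level dict, same nested objects
merged["ms"][("H", 1)][0][0] = 99.0
print(magres["ms"][("H", 1)][0][0])       # 99.0: the edit leaked into the input

magres["ms"][("H", 1)][0][0] = 1.0        # restore the original value
safe = copy.deepcopy(magres) | copy.deepcopy(extra)
safe["ms"][("H", 1)][0][0] = 99.0
print(magres["ms"][("H", 1)][0][0])       # 1.0: the input is untouched
```

The cost is an extra copy of every array, which should be negligible for tensors of this size.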

Owner

@oerc0122 oerc0122 left a comment


I'm happy with this if @ajjackson is.

Parameters
----------
magres_file
magres_file : TextIO
Collaborator


It looks like the codebase is a bit inconsistent at the moment re: these redundant type annotations in the docstring.

Not a problem for this PR, but maybe worth cleaning up separately. Personally I'd rather they were all gone, but I think there was some reason we have to use them in the Returns section?
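One plausible reason for the Returns exception: the numpydoc format guide says the name of a return value is optional but its type is always required, whereas Parameters entries can omit the type if the docs build takes types from the annotations (an assumption about this project's Sphinx setup). A sketch of that mixed style on a hypothetical `example_parser` stub:

```python
import io
from typing import TextIO


def example_parser(magres_file: TextIO) -> dict:
    """Parse a magres file (hypothetical stub illustrating docstring style only).

    Parameters
    ----------
    magres_file
        Open handle to the file; the type lives only in the annotation.

    Returns
    -------
    dict
        Parsed data; numpydoc expects a type on every Returns entry.
    """
    return {"first_line": magres_file.readline().strip()}


print(example_parser(io.StringIO("first\nsecond\n")))
```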

Comment on lines +21 to +30
MAGRES_ALIASES = {
    "ms": "magnetic_shielding",
    "efg": "electric_field_gradient",
    "isc": "indirect_spin_spin_coupling",
    "isc_fc": "indirect_spin_spin_coupling_fc",
    "isc_orbital_p": "indirect_spin_spin_coupling_orbital_p",
    "isc_orbital_d": "indirect_spin_spin_coupling_orbital_d",
    "isc_spin": "indirect_spin_spin_coupling_spin",
    "hf": "hyperfine",
}
Collaborator


It looks like all references to this constant are commented out. Is that correct?

Collaborator Author


Nice catch! I didn't see that the add_aliases calls were commented out now.

How does the rest of the code handle renaming/aliasing, @oerc0122? I can see arguments for a) renaming everything (easy to see what the quantities are)*, b) sticking with the original names (what existing users/downstream codes might expect, with a clear one-to-one mapping to the file) and even c) keeping both the original and aliased versions (though it's my least favourite...).

An alternative (d) is to dump the alias dict itself directly in the output, or something similar but with a short description of each field instead?

* this is what we get if we uncomment the two add_aliases calls (and import the function).

Owner


In MD, I provide both the abbreviated (T) and the full name (temperature), since the tag T is what people will be used to from the files. However, if people are more used to the full names, add_aliases has an optional argument to replace rather than add, if you prefer. I would say that if the arrays are relatively small, the cost of the duplication is minimal (~0 in scripts and small in dumped files; smarter YAML even uses references).
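The "~0 in scripts" point holds because an alias can be a second key bound to the same object rather than a copy. A minimal sketch of the add-or-replace behaviour described above; `alias_keys` and its `replace` flag are hypothetical and not the project's add_aliases signature.

```python
def alias_keys(data: dict, aliases: dict, replace: bool = False) -> dict:
    """Expose each short key under its full name too (hypothetical helper).

    The alias refers to the same object, so no tensor data is duplicated.
    With replace=True the short key is dropped and only the full name remains.
    """
    for short, full in aliases.items():
        if short in data:
            data[full] = data[short]
            if replace:
                del data[short]
    return data


ALIASES = {"ms": "magnetic_shielding", "hf": "hyperfine"}  # subset of MAGRES_ALIASES
parsed = {"ms": {("H", 1): [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}}

both = alias_keys(dict(parsed), ALIASES)                    # option (c): both names
full_only = alias_keys(dict(parsed), ALIASES, replace=True) # option (a): full names only
```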

    accum[key][spec, int(ind)] = _list_to_threebythree(val)
elif words[0].startswith("isc"):  # ISC props explicitly have spaces!
    key, speca, inda, specb, indb, *val = words
    accum[key][(speca, int(inda)), (specb, int(indb))] = _list_to_threebythree(val)
Collaborator


This doesn't match the return type dict[str, str | ThreeByThreeMatrix]. Actually, neither does accum[key][spec, int(ind)] = _list_to_threebythree(val).

The data format makes sense, so I think the function annotation (and redundant docstring) need to be corrected.
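One way an annotation could describe the nested structure the snippet actually builds: an outer dict keyed by tag, and inner dicts keyed by either one (species, index) pair or a pair of them for ISC. All alias names below are hypothetical stand-ins and the row-major reading of the nine values is an assumption; the project's own ThreeByThreeMatrix/AtomIndex may be defined differently.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Dict, Tuple, Union

# Hypothetical type aliases mirroring the shape of the parsed data.
Row = Tuple[float, float, float]
ThreeByThreeMatrix = Tuple[Row, Row, Row]
AtomIndex = Tuple[str, int]                  # (species, index), e.g. ("H", 1)
AtomPair = Tuple[AtomIndex, AtomIndex]       # ISC tensors couple two atoms
MagresData = Dict[str, Dict[Union[AtomIndex, AtomPair], ThreeByThreeMatrix]]


def list_to_3x3(vals: list[str]) -> ThreeByThreeMatrix:
    """Hypothetical stand-in for _list_to_threebythree (row-major assumed)."""
    n = [float(v) for v in vals]
    return ((n[0], n[1], n[2]), (n[3], n[4], n[5]), (n[6], n[7], n[8]))


def process_lines(lines: list[str]) -> MagresData:
    """Hypothetical parser showing which key shape each tag produces."""
    accum: defaultdict[str, dict] = defaultdict(dict)
    for line in lines:
        words = line.split()
        if not words:
            continue
        if words[0].startswith("isc"):       # ISC lines name two atoms
            key, speca, inda, specb, indb, *val = words
            accum[key][(speca, int(inda)), (specb, int(indb))] = list_to_3x3(val)
        else:                                # ms/efg/hf lines name one atom
            key, spec, ind, *val = words
            accum[key][spec, int(ind)] = list_to_3x3(val)
    return dict(accum)
```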

block: Block,
accum: dict,
) -> tuple[ThreeByThreeMatrix, dict[AtomIndex, AtomsInfo]]:
def _process_magres_old_block(block: Block) -> dict[str, str | ThreeByThreeMatrix]:
Collaborator


Again, the type annotation seems out of sync with all the nested dicts it can return.

# The magres_old blocks have data in these atom blocks
for match in REs.MAGRES_OLD_RE["atom"].finditer(str(block)):
    index = atreg_to_index(match)
    sub_blk = match.groups()[3]
Collaborator


Might be worth adding a named capture group to this regex. It's hard to quickly figure out what group [3] captures, because the group numbering is thrown off by the groups in the injected ATOM_RE.

    # Atom lines
    "atom": re.compile(
        r"=+\s+"
        rf"( Perturbing Atom|Atom): {ATOM_RE}[\r\n]+"
        r"=+[\r\n]+"
        r"([^=]+)\s+",
        re.MULTILINE | re.DOTALL,
    ),
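With a named group, the call site can read `match.group("body")` regardless of how many groups the injected pattern adds, and the alternation can become non-capturing. The sketch below uses a hypothetical stand-in for ATOM_RE (the real one may differ) and made-up block text in the shape the pattern expects.

```python
import re

# Hypothetical stand-in for the project's ATOM_RE: species then index, e.g. "H    1".
ATOM_RE = r"(?P<species>[A-Za-z]+)\s+(?P<index>\d+)"

ATOM_BLOCK_RE = re.compile(
    r"=+\s+"
    rf"(?: Perturbing Atom|Atom): {ATOM_RE}[\r\n]+"   # non-capturing alternation
    r"=+[\r\n]+"
    r"(?P<body>[^=]+)\s+",                           # named: no positional counting
    re.MULTILINE | re.DOTALL,
)

text = (
    "=====\n Atom: H    1\n=====\n  TOTAL Shielding Tensor\n  1.0 2.0\n"
    "=====\n Perturbing Atom: C    2\n=====\n  data\n"
)

blocks = [
    (m.group("species"), int(m.group("index")), m.group("body").strip())
    for m in ATOM_BLOCK_RE.finditer(text)
]
```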


# For any isc tensor tags, add the units if not present
for tag in ("isc", "isc_fc", "isc_spin", "isc_orbital_p", "isc_orbital_d"):
    if tag in data["magres"] and tag not in data["magres"]["units"]:
Collaborator


Do we actually expect these to be present sometimes and not others?

Does anything bad happen if we remove the "and tag not in ..." condition and just have if tag in data["magres"] here?
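Assuming the branch body assigns a default unit, dropping the second condition would overwrite any unit the file itself declared for that tag. `dict.setdefault` expresses "add only if absent" directly and covers both cases. `fill_isc_units` is hypothetical, and the unit string is an assumed value to be checked against whatever the project uses.

```python
ISC_TAGS = ("isc", "isc_fc", "isc_spin", "isc_orbital_p", "isc_orbital_d")
DEFAULT_ISC_UNIT = "10^19.T^2.J^-1"  # assumed unit string; verify against the project


def fill_isc_units(magres: dict) -> None:
    """Give every ISC tag present a unit, keeping any unit already in the file."""
    units = magres.setdefault("units", {})
    for tag in ISC_TAGS:
        if tag in magres:
            units.setdefault(tag, DEFAULT_ISC_UNIT)  # no-op if the file declared one


magres = {"isc": {}, "isc_fc": {}, "units": {"isc": "from-file"}}
fill_isc_units(magres)
```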
