Skip to content

Conversation

@ndaelman-hu
Copy link
Collaborator

See issue #242

- TODO: connect to other functionalities
-- TODO: resolve missing population of `site_symmetry`
- Add standard naming to Wyckoff positions
- Remove site_symmetry quantity from WyckoffPosition class
- Remove unpopulated degrees_of_freedom, equivalent_coordinates, and coordinate_expressions fields
- Fix coordinate handling to only set fractional coordinates for crystallographic systems
- Clean up _populate_wyckoff_positions method
@ndaelman-hu ndaelman-hu self-assigned this Aug 20, 2025
@coveralls
Copy link

coveralls commented Aug 21, 2025

Pull Request Test Coverage Report for Build 17124103302

Details

  • 21 of 29 (72.41%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.2%) to 65.857%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/nomad_simulations/schema_packages/model_system.py 21 29 72.41%
Files with Coverage Reduction New Missed Lines %
src/nomad_simulations/schema_packages/model_system.py 1 79.96%
Totals Coverage Status
Change from base Build 17093658475: -0.2%
Covered Lines: 2440
Relevant Lines: 3705

💛 - Coveralls

@ndaelman-hu
Copy link
Collaborator Author

Main modifications:

  • elaborate on Wyckoff description, explain what it represents (defined wrt conventional cell) and its limits (geometry specific).
  • extend Wyckoff to include site multiplicity.
  • relax unique site definition to include computational settings. Remove the matching MatID descriptor.
  • move Wyckoff to Modelsystem to mimic positions (which annotates). Alternative location would be Symmetry, which makes semantically more sense. The issue is how to distinguish between multiple Symmetry subsections.
  • start up ModelSystem docs to keep track of design choices.

Open questions:

  • should we also provide the site symmetry operators with Wyckoff. These symmetry operators then need a definition.
  • ratify cell and symmetry. Symmetry seems set up to further define Cell, but the connection is indirect (reference at Symmetry.cell_ref). It also isn't clear which ModelSystem.cell defines the current positions, or how to rapidly retrieve the original cell (atm requires scanning).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses issue #242 by replacing the AtomicCell class with a more generic Cell class and moving Wyckoff position information from cell-level to system-level storage in ModelSystem. The change simplifies the model system representation while consolidating crystallographic data at the appropriate hierarchical level.

Key changes:

  • Replace AtomicCell with Cell class, removing Wyckoff-specific quantities
  • Move Wyckoff position data to ModelSystem.wyckoff_sites with enhanced documentation
  • Update all references and tests to use the new Cell class

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

File Description
src/nomad_simulations/schema_packages/model_system.py Core refactoring: removes AtomicCell class, adds wyckoff_sites to ModelSystem, updates method names and references
tests/*.py Updates test files to use Cell instead of AtomicCell
tests/conftest.py Updates fixture generation functions to return Cell objects
docs/model_system/model_system.md Adds comprehensive documentation for the new ModelSystem structure

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

sec_symmetry = self.m_create(Symmetry)
sec_symmetry.normalize(archive, logger)

self.wyckoff_sites = self._compute_wyckoff_sites(sec_symmetry)
Copy link

Copilot AI Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _compute_wyckoff_sites method is called with a sec_symmetry parameter, but the method definition does not accept any parameters. This will cause a TypeError at runtime.

Copilot uses AI. Check for mistakes.
return True

@log
def _compute_wyckoff_sites(self) -> list[str]:
Copy link

Copilot AI Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method signature indicates no parameters, but it's called with a sec_symmetry parameter on line 1357. Either add the parameter to the method signature or remove it from the call site.

Suggested change
def _compute_wyckoff_sites(self) -> list[str]:
def _compute_wyckoff_sites(self, sec_symmetry=None) -> list[str]:

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this is correct

type=str,
description="""
Any verbose naming refering to the ModelSystem. Can be left empty if it is a simple
Any verbose naming referring to the `ModelSystem`. Can be left empty if it is a simple
Copy link

Copilot AI Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The word 'referring' should be 'refering' to match the existing style, or consider using 'related to' for clarity.

Suggested change
Any verbose naming referring to the `ModelSystem`. Can be left empty if it is a simple
Any verbose naming related to the `ModelSystem`. Can be left empty if it is a simple

Copilot uses AI. Check for mistakes.
@ndaelman-hu
Copy link
Collaborator Author

One option for Cell and Symmetry is have 1 contain the other. This only requires 1 repeating subsection then, where by definition index 0 refers to the ModelSystem as described.

@ndaelman-hu
Copy link
Collaborator Author

@ladinesa Feel free to comment on the use of the logs decorator and whether I can be more aggressive in removing logging messages.

@ladinesa
Copy link
Collaborator

@ladinesa Feel free to comment on the use of the logs decorator and whether I can be more aggressive in removing logging messages.

maybe do it in a separate pr.

@ndaelman-hu
Copy link
Collaborator Author

Main modifications:

* elaborate on Wyckoff description, explain what it represents (defined wrt conventional cell) and its limits (geometry specific).

* extend Wyckoff to include site multiplicity.

* relax unique site definition to include computational settings. Remove the matching MatID descriptor.

* move Wyckoff to `Modelsystem` to mimic positions (which annotates). Alternative location would be `Symmetry`, which makes semantically more sense. The issue is how to distinguish between multiple `Symmetry` subsections.

* start up `ModelSystem` docs to keep track of design choices.

Open questions:

* should we also provide the site symmetry operators with Wyckoff. These symmetry operators then need a definition.

* ratify `cell` and `symmetry`. `Symmetry` seems set up to further define `Cell`, but the connection is indirect (reference at `Symmetry.cell_ref`). It also isn't clear which `ModelSystem.cell` defines the current positions, or how to rapidly retrieve the original cell (atm requires scanning).

From the sprint meeting, we decided to have a single cell define the model_system representation. Likewise, there will only be one CyrstalGlobalSymmetry, defined wrt to the standard conventional cell.
@lauri-codes is it fine that for the conventional cell representation we keep leaning on the topology normalizer and the viewer?

Copy link
Collaborator

@JFRudzinski JFRudzinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ndaelman-hu Unfortunately I don't have time for an in-depth review, but at a glance it looks good to me, and we discussed quite a bit about it, so I feel like we are on the same page.

The docs I completely did not read, but I will do a review anyway at a later time. So, from my side this is fine to merge, I hope someone else can give you a more detailed review.

One question: should we make Cell() non-repeating, or was there still some debate about this?

@ndaelman-hu
Copy link
Collaborator Author

@ndaelman-hu Unfortunately I don't have time for an in-depth review, but at a glance it looks good to me, and we discussed quite a bit about it, so I feel like we are on the same page.

The docs I completely did not read, but I will do a review anyway at a later time. So, from my side this is fine to merge, I hope someone else can give you a more detailed review.

Thx, I'll give the other reviewers some more time.

One question: should we make Cell() non-repeating, or was there still some debate about this?

Yes, it is addressed in issue #240. Will soon have a PR up there too.

Comment on lines +67 to +75
2. **Primitive Cell** (`type='primitive'`)
- Smallest possible unit cell that tiles the crystal
- Contains minimum number of lattice points
- Used for certain calculations (e.g., band structures)

3. **Conventional Cell** (`type='conventional'`)
- Standardized cell following International Tables conventions
- Used for symmetry analysis and crystallographic databases
- **Typically used to define symmetry operators:** ["x,y,z", "-x,y,-z", "x+1/2,y+1/2,z", "-x+1/2,y+1/2,-z"] (conventional) vs ["x,y,z", "-x,-y,z", "-x,y,-z", "x,-y,-z"] (primitive).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be maybe noted here that primitive and conventional systems may not directly correspond to the original cell: they will have different densities and "idealized" lattice parameters to match space group that they were matched with certain precision.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add this as a clarification to the description.

they will have different densities

Not sure what you mean with densities here: mass, electronic? The former stays the same, the latter changes exactly like the geometry. However, k-samplings should change too, and this is not acknowledged (or addressed) anywhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ow, upon re-read, i think I understand it now. You're referring to deviations in the cell volume due to the symmetrization accuracy constraint.

Comment on lines +187 to +203
### Wyckoff Site Annotations

`ModelSystem` provides Wyckoff site information through the `wyckoff_sites` quantity:

```python
model_system.wyckoff_sites = ['a4', 'b2', 'c1'] # Format: letter + multiplicity
```

**Key features:**
- Format: <letter><multiplicity> (e.g., 'a1', 'b2')
- Crystallographic standard: Based on International Tables for Crystallography
- Geometric classification: Determined by atomic positions only (chemical species ignored)
- Per-atom assignment: Each entry corresponds to the position at the same index

Important: Wyckoff assignments are computed relative to the conventional cell representation, ensuring database compatibility and literature consistency.

For complete structural uniqueness, combine Wyckoff information with chemical composition data.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that wyckoff sites would have their own model that holds the letter, the multiplicity + any wyckoff parameters.

Also worth noting that also the primitive and conventional cells might need their own Wyckoff sites if we e.g. want to visualize them. But maybe this is not necessary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're suggesting splitting the format here? Is this for convention reasons, or maybe querying?

With "Wyckoff parameters", do you mean of the analysis itself, or the symmetry operations at each site?
The representation of these operations is indeed cell-dependent. The locations should simply match the geometric transformations.

Copy link

@lauri-codes lauri-codes Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With wyckoff parameters, I mean that some Wyckoff positions are parametrized with up to three parameters (x, y, z). Now whether it is important to know these parameters or not is another question. MatID can give out this information if you explicitly ask, see and example here: https://nomad-coe.github.io/matid/docs/learn/symmetry-analysis#wyckoff-sets.

If you think that just having the letter+multiplicity is enough, then having a list of strings should be fine. But this structure cannot be easily extended later, unlike a list of objects (list of objects does have other problems, like the fact that in ES you require nested queries to combine information from several fields).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main application of Wyckoff would be in visualization, yes.
I can only imagine somebody looking for an element (or with a slab, an adsorbate) being present at a specific site. This would be a very sophisticated query, though. I think users would have to be taught about it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the queries, I'll follow your suggestion up: I think we would want to support queries like

  • find me materials where the representative position ['1/4', '0', '1/2'] that has multiplicity 6.
  • find me materials (of symmetry ...) where Fe occupies a b8 site. This might be more complex under our current setup.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find me materials (of symmetry ...) where Fe occupies a b8 site. This might be more complex under our current setup.

To my understanding, this kind of query would have to be nested in ES. That is to say, we need an object like [{element: Xx, wyckoff_letter: x, wyckoff_number: x, ...}, ...], similar to the proposal above. A flat array mirroring positions as I had originally, is likely nicer for visualization: no need to scan and flatten the object. Considering compatibility with the current schema:

  • high: the flat object and global symmetry descriptors.
  • low: Wyckoff objects less so. They'd need particle data for the nested query, but they are only appropriate for crystalline system. An intermediate solution would be to have the particle_indices twice, but we've been trying to avoid double data, as it tends to confuse users.

If we want allow for nested queries in data (and that's a big IF), it might be nicer to have a specialized section, e.g. data/query, where we bundle any relevant data to power nested queries. This would clearly denote that this section solves a purely technical problem, as well as suggest a potential (more sophisticated) queries. The core schema can meanwhile grow in a more intuitive way for scientists and domain experts.

This is were my discussion with @JFRudzinski arrived at. @lauri-codes We would really appreciate your opinion here! The suggestion here could act as a template for future cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the summary @ndaelman-hu

I will just clarify that data/query or results/query are equally viable, or perhaps even a different structure, this is a matter of design to be dictated by you @lauri-codes

@ndaelman-hu
Copy link
Collaborator Author

@lauri-codes There was some confusion about whether the Wyckoff analysis only considers the positions, or also accounts for the different atomic species at these positions. Can you confirm that it is in fact the latter?

While including elements into the analysis is very sensible, it raises the question whether we should also consider species with different computational settings? There are cases where these settings still represent the same element, or those that were the settings model a new physical reality, e.g. core-hole excitations.

We could also run multiple Wyckoff analyses with each possible setting (positions only, chemical elements, computational settings).

@lauri-codes
Copy link

lauri-codes commented Aug 29, 2025

Yes, Wyckoff analysis (and symmetry analysis in general) does take the species identity into account. You can yourself explicitly remove the species information of course before the analysis, but this completely changes the symmetries (e.g. rock salt becomes simple cubic).

I guess it is difficult to draw the line when a certain particle has been altered in it's behaviour so much that it has become a different "species". I would go with some pragmatic approach here that does not completely break the searchability of data based on symmetry.

@ndaelman-hu
Copy link
Collaborator Author

  1. Either we make a shortlist of physically relevant distinctions. Should just see whether we can pass non-existing element labels to MatID.
  2. We produce all 3 versions: geometry, elements, computation. This is the most comprehensive approach, but also 3x as expensive.
  3. We keep the original approach of just elements.

If feasible, I would go with option 1.

@ndaelman-hu ndaelman-hu removed the request for review from EBB2675 November 7, 2025 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants