Add emmet-archival namespace package, part 1 #1190

Merged: tsmathis merged 104 commits into materialsproject:main from esoteric-ephemera:archive on Aug 18, 2025

@esoteric-ephemera (Collaborator) commented Feb 14, 2025

Summary

Adds an archival namespace package to emmet to establish MP's data archival formats going forward. The package should be flexible enough to accommodate new data formats as they become relevant (e.g., data from phonopy or LOBSTER).

  • Base Archiver class that defines a schema for how data is archived to and retrieved from HDF5 and zarr files
  • VASP RawArchive class:
    • Stores only the relevant VASP files as string / bytes archives in a hierarchical format
    • Automatically strips copyright-protected info from VASP files
    • Constructs a TaskDoc from HDF5/zarr data
  • VASP VolumetricArchive + separate volumetric data classes: formats for converting CHGCAR-structured files, band structures, and DOS to HDF5/zarr/parquet
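
As a rough illustration of the Archiver / RawArchive idea above, here is a minimal standard-library sketch. It is not the emmet-archival API: `ArchiverSketch`, `RawArchiveSketch`, `RELEVANT_FILES`, and `strip_potcar` are hypothetical names, a nested dict stands in for an HDF5/zarr group, and reducing a POTCAR to its `TITEL` lines is only one assumed way of stripping copyright-protected content.

```python
from abc import ABC, abstractmethod
from typing import Any


class ArchiverSketch(ABC):
    """Hypothetical base class: subclasses define how data maps to a hierarchy."""

    @abstractmethod
    def to_group(self) -> dict[str, Any]:
        """Serialize to a nested mapping (stand-in for an HDF5/zarr group)."""

    @classmethod
    @abstractmethod
    def from_group(cls, group: dict[str, Any]) -> "ArchiverSketch":
        """Rebuild the object from the nested mapping."""


# Assumed allow-list of "relevant" files; the PR text does not enumerate them.
RELEVANT_FILES = {"INCAR", "KPOINTS", "POSCAR", "POTCAR", "OUTCAR", "vasprun.xml"}


def strip_potcar(text: str) -> str:
    """Keep only the TITEL lines and drop the pseudopotential data itself."""
    return "\n".join(
        line.strip() for line in text.splitlines() if "TITEL" in line
    )


class RawArchiveSketch(ArchiverSketch):
    """Hypothetical raw archive: file name -> bytes, filtered and sanitized."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self.files = files

    @classmethod
    def from_files(cls, files: dict[str, bytes]) -> "RawArchiveSketch":
        kept: dict[str, bytes] = {}
        for name, data in files.items():
            if name not in RELEVANT_FILES:
                continue  # skip files that are not archived
            if name == "POTCAR":
                data = strip_potcar(data.decode()).encode()
            kept[name] = data
        return cls(kept)

    def to_group(self) -> dict[str, Any]:
        return {"raw": dict(self.files)}

    @classmethod
    def from_group(cls, group: dict[str, Any]) -> "RawArchiveSketch":
        return cls(dict(group["raw"]))
```

In a real implementation the nested dict would presumably be replaced by an h5py or zarr group, with each file stored as a string or bytes dataset.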

Changes to emmet-core

  • Remove old validation logic and replace with pymatgen-io-validation
  • Add band_theory module for schemas of generic densities of states and band structures (electronic, phonon, etc.)
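
To make the "generic schema" idea concrete, here is a small sketch of a shared base class for densities of states that electronic and phonon variants extend. It uses standard-library dataclasses purely for illustration; the class and field names (`BaseDosSketch`, `efermi`, `unit`) are hypothetical and are not taken from the actual band_theory module.

```python
from dataclasses import dataclass


@dataclass
class BaseDosSketch:
    """Hypothetical generic DOS: density values sampled on an energy-like grid."""

    energies: list[float]
    densities: list[float]

    def __post_init__(self) -> None:
        # Shared validation lives in the base class so every subclass gets it.
        if len(self.energies) != len(self.densities):
            raise ValueError("energies and densities must have equal length")


@dataclass
class ElectronicDosSketch(BaseDosSketch):
    """Electronic DOS adds a Fermi level (assumed field name)."""

    efermi: float = 0.0


@dataclass
class PhononDosSketch(BaseDosSketch):
    """Phonon DOS records the frequency unit (assumed field name)."""

    unit: str = "THz"
```

The point of the design is that fields and validation common to all DOS-like objects are declared once, and each physics-specific schema only adds what is unique to it.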

To do:

  • Tests tests tests
  • Test formats on a subset of VASP data on S3 to estimate storage size, partial retrieval, etc.

tschaume and others added 25 commits March 31, 2025 16:09
* update dependencies for emmet-api (ubuntu-latest/py3.10)

* update dependencies for emmet-api (ubuntu-latest/py3.11)

* update dependencies for emmet-builders (ubuntu-latest/py3.10)

* update dependencies for emmet-builders (ubuntu-latest/py3.11)

* update dependencies for emmet-core (ubuntu-latest/py3.10)

* update dependencies for emmet-core (ubuntu-latest/py3.11)

---------

Co-authored-by: github-actions <github-actions@github.com>
overlooked in pydantic v2 transition
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5.1.2 to 5.3.1.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v5.1.2...v5.3.1)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
@esoteric-ephemera esoteric-ephemera marked this pull request as ready for review August 11, 2025 23:50
@esoteric-ephemera esoteric-ephemera changed the title from "[WIP] Add emmet-archival namespace package" to "Add emmet-archival namespace package, part 1" Aug 11, 2025
@esoteric-ephemera (Collaborator, Author) commented

@tschaume to keep other PRs moving (like Karlo's new CLI) and to keep changes to new namespaces easy to review, I'm going to checkpoint the archival package here. Further updates will go into a separate PR.

Most of the changes to emmet-core are additive. The only thing you might want to check @tsm is that the phonon band structure and DOS schemas are still OK with OpenData. These now inherit from a new base class that schematizes generic band structure and DOS objects.

@codecov-commenter commented Aug 12, 2025

Codecov Report

❌ Patch coverage is 78.28627% with 223 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.92%. Comparing base (6b427f9) to head (66317b2).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| emmet-core/emmet/core/vasp/validation_legacy.py | 58.11% | 80 Missing ⚠️ |
| emmet-archival/emmet/archival/atoms.py | 68.83% | 48 Missing ⚠️ |
| emmet-archival/emmet/archival/vasp/volumetric.py | 52.56% | 37 Missing ⚠️ |
| emmet-archival/emmet/archival/volumetric.py | 69.62% | 24 Missing ⚠️ |
| emmet-archival/emmet/archival/base.py | 85.71% | 11 Missing ⚠️ |
| emmet-core/emmet/core/band_theory.py | 94.80% | 8 Missing ⚠️ |
| emmet-archival/emmet/archival/vasp/raw.py | 94.85% | 7 Missing ⚠️ |
| emmet-core/emmet/core/vasp/utils.py | 93.87% | 3 Missing ⚠️ |
| emmet-core/emmet/core/tasks.py | 71.42% | 2 Missing ⚠️ |
| emmet-core/emmet/core/utils.py | 92.00% | 2 Missing ⚠️ |

... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1190      +/-   ##
==========================================
- Coverage   89.45%   88.92%   -0.54%     
==========================================
  Files         151      161      +10     
  Lines       15756    16545     +789     
==========================================
+ Hits        14095    14712     +617     
- Misses       1661     1833     +172     
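
The percentages in this report follow from the hit and line counts in the diff above. The short check below recomputes them; the results agree with the reported figures to within rounding in the last digit. The patch size of 1027 lines is inferred from the reported 223 missing lines and 78.28627% patch coverage, and is not stated in the report itself.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of executable lines that were hit."""
    return 100.0 * hits / lines


# Project coverage on the base and head commits, from the diff table.
base = coverage_pct(14095, 15756)
head = coverage_pct(14712, 16545)
delta = head - base

# The added lines split into newly hit and newly missed lines.
assert (14712 - 14095) + (1833 - 1661) == 16545 - 15756

# Inferred patch size: 223 missing lines at 78.28627% patch coverage.
patch_lines = round(223 / (1 - 0.7828627))
patch = coverage_pct(patch_lines - 223, patch_lines)

print(f"base={base:.4f}% head={head:.4f}% delta={delta:.4f}")
print(f"patch_lines={patch_lines} patch={patch:.5f}%")
```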

@tschaume (Member) left a comment

Wow, great work! I only have nit-picky comments to go through.

@tsmathis tsmathis merged commit 199cd05 into materialsproject:main Aug 18, 2025
10 checks passed
@esoteric-ephemera esoteric-ephemera deleted the archive branch November 3, 2025 23:18

5 participants