New trajectory class with parquet roundtrip functionality #1206

esoteric-ephemera · 2025-03-12T00:01:24Z

In support of removing ionic step / calcs reversed info from MP's mongo task collection, this adds a Trajectory class which interfaces with pyarrow/parquet, pymatgen's Trajectory, and ASE's Trajectory. Verified that the parquet roundtrip is lossless: model-dumped hashes of emmet Trajectory objects are identical before and after parquet conversion.
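The hash comparison described above can be sketched as follows. This is a minimal illustration using only the standard library, not the PR's test code: the dict stands in for a model dump of a trajectory, a JSON encode/decode stands in for the parquet write/read, and `content_hash` is a hypothetical helper name.

```python
import hashlib
import json


def content_hash(payload: dict) -> str:
    """Hash a JSON-serializable dict deterministically (keys are sorted)."""
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Hypothetical stand-in for the model dump of a trajectory object.
original = {
    "elements": [14, 14],
    "energy": [-10.1, -10.4, -10.5],
    "num_ionic_steps": 3,
}

# Stand-in for "write to parquet, then read back"; here a JSON encode/decode.
restored = json.loads(json.dumps(original))

hash_before = content_hash(original)
hash_after = content_hash(restored)

# The roundtrip is considered lossless when both hashes agree.
roundtrip_ok = hash_before == hash_after
```

Comparing hashes of the dumped content, rather than the objects themselves, keeps the check independent of how equality is defined on the class.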

Since the website still needs energy convergence info, using parquet lets us retrieve only the energy data from a trajectory.

This is a middle-ground solution until the emmet-archival PR is ready.

codecov-commenter · 2025-03-12T00:05:36Z

Codecov Report

Attention: Patch coverage is 81.81818% with 28 lines in your changes missing coverage. Please review.

Project coverage is 90.06%. Comparing base (ab1e34a) to head (609fd24).
Report is 18 commits behind head on main.

Files with missing lines	Patch %	Lines
emmet-core/emmet/core/trajectory.py	81.81%	28 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1206      +/-   ##
==========================================
- Coverage   90.15%   90.06%   -0.09%     
==========================================
  Files         147      148       +1     
  Lines       14509    14663     +154     
==========================================
+ Hits        13080    13206     +126     
- Misses       1429     1457      +28

esoteric-ephemera · 2025-03-12T00:37:04Z

@tschaume @yang-ruoxi: this should be ready to test with the trajectories endpoint; the roundtrip is working fine. The only addition I might want to make is support for site properties (e.g., selective dynamics, magmoms, and velocities tags), but I don't think we have any in the ionic steps.

And @tsmathis, when you have time: any comments about the parquet serialization are appreciated. This is a very specific implementation of parquet serialization for an emmet object.

tsmathis · 2025-03-14T00:25:37Z

re: arrow + parquet writing, is the long-term intention to keep writing individual trajectory objects to individual parquet files?

esoteric-ephemera · 2025-03-14T00:38:27Z

No this is a short-term solution: To build a performant index for the task collection, removing the ionic steps / calcs reversed helps a lot (reduces the task collection size by half). We still want to serve that info up, and need the total energy by ionic step for the convergence graph in the website task view

Serving up individual parquet files permits partial retrieval of energy data by task_id, and also lets users retrieve full trajectory info

tsmathis · 2025-03-14T00:39:42Z

emmet-core/emmet/core/trajectory.py

+        pa_table = pa.table(pa_config)
+        if file_name:
+            with zopen(str(file_name), "wb") as f:
+                pa_pq.write_table(pa_table, f)


Correct me if I'm wrong, but the compression formats that are parquet compatible (‘SNAPPY’, ‘GZIP’, ‘BROTLI’, ‘LZ4’, ‘ZSTD’) don't mesh with zopen's formats.

I see in the test files there is a .gz extension for the test parquet file, I'm guessing this is the only format that would work with zopen?

I would opt for dropping monty here, sticking with pyarrow.parquet's read/write behavior, and supporting all the parquet-compatible compression types. The default should be write_table's default compression format, i.e., snappy (see pyarrow.parquet.write_table).

Yeah monty doesn't support these - switched to default pyarrow compression, thanks for the suggestion!

tsmathis · 2025-03-14T00:56:36Z

> No this is a short-term solution: To build a performant index for the task collection, removing the ionic steps / calcs reversed helps a lot (reduces the task collection size by half). We still want to serve that info up, and need the total energy by ionic step for the convergence graph in the website task view
>
> Serving up individual parquet files permits partial retrieval of energy data by task_id, and also lets users retrieve full trajectory info

Okay, fine as is for now then if this will get us where we need to be.

tschaume · 2025-03-19T23:28:14Z

Thank you @esoteric-ephemera! Once we get around to trying to put the new Task API into production, I'll have to double-check whether it's more performant to put the data needed for the convergence graph back into the collection or retrieve them through parquet files (preferably the same @tsmathis is using for the builders).

esoteric-ephemera · 2025-03-28T16:32:58Z

@tschaume, when the requirements_*.txt files are auto-generated, it looks like the package itself isn't being excluded. This causes problems with CI if version pins change; we probably want to ensure that only the current version is tested in CI.

tschaume · 2025-03-28T17:44:20Z

@esoteric-ephemera This comes down to this line in the workflow file. I can't remember why I decided to only install an editable version of emmet-core for testing the other emmet namespace packages. We could change that line to matrix.package == "emmet-core" or, maybe better, remove the line entirely so that the latest version is always tested.

tschaume · 2025-03-28T17:46:27Z

PS: actually I remember :) The next step in the action should install the current emmet-core over any version defined in the requirements file.

esoteric-ephemera · 2025-04-01T16:01:08Z

OK makes sense, but for releasing upper version pins on dependencies, this can cause CI issues

tschaume · 2025-04-01T19:11:09Z

If the upper version pin is on dependencies other than emmet-*, you'd update the setup.py in your branch, run the upgrade dependencies action from your branch which will create an upgrade-dependencies branch (and PR) that contains the new set of dependencies, and then merge that into your branch. Hope that makes sense.

tsmathis reviewed Mar 14, 2025

esoteric-ephemera added 10 commits March 26, 2025 11:34

Add trajectory class to adapt between parquet, pymatgen, and ase traj…

5704a1b

…ectory standards

remove original POTCAR from tests

d0d58bf

add tests for trajectory including round trip

efecbce

precommit + docstr

5b17e68

mypy fixes

50548b3

add pyarrow to extras

9585150

revise to use default pyarrow compression

8603a26

Add identifier / extraction by ID to trajectory

7829952

ensure identifier is only set if not null

8a56542

mypy fixes

d522ea4

esoteric-ephemera force-pushed the parqtraj branch from 84b956c to d522ea4 Compare March 26, 2025 18:34

esoteric-ephemera added 2 commits March 27, 2025 17:35

see if releasing pymatgen pin breaks ci for electronic structure / do…

58c5b0d

…s bandgaps, add approx to tests

missing unpin in setup

c371eb3

remove emmet-core from requirements

8ed147a

release pmg upper pin

609fd24

tsmathis merged commit 8cbc716 into materialsproject:main Apr 1, 2025
8 checks passed

This was referenced Apr 1, 2025

Minor update to make FermiDos more robust materialsproject/pymatgen#4240

Merged

[Feature Request]: Better handling of parsed trajectory in VASP calculations #872

Open

esoteric-ephemera deleted the parqtraj branch April 11, 2025 18:50
