Conversation

@xiaochendu (Contributor)

Additional and improved methods for fine-tuning and evaluating/plotting NFF

- change paths to models in `chgnet` dir
- remove `chgnet` models in `NeuralForceField`
…in MACE model training

- able to specify whether to fix pooling in args

Copilot AI left a comment

Pull Request Overview

This pull request “Merge vssr_pourbaix” introduces additional fine-tuning options and improvements for evaluation/plotting in the NFF framework. Key changes include extended command-line arguments for training and evaluation, revised freezing/unfreezing logic in the transfer learning utilities, and updated dependency/configuration settings.

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| scripts/train_nff.py | Added new CLI arguments for fine-tuning, including custom layers and unfreezing options; updated model loading logic using `to_tensor`. |
| scripts/evaluate_nff.py | Introduced new plotting options (`plot_type`, `batch_size`, `per_atom_energy`) and adjusted test loader settings. |
| pyproject.toml | Updated the ASE dependency version from 3.22.1 to 3.23.0. |
| nff/utils/cuda.py | Wrapped device selection in a try/except to gracefully handle potential NVIDIA SMI errors. |
| nff/train/transfer.py | Integrated debug print statements in the transfer learning functions and modified unfreezing functions in `MaceLayerFreezer` and `ChgnetLayerFreezer`. |
| nff/nn/models/chgnet.py | Updated file paths for pretrained model checkpoints and tweaked module imports. |
| nff/io/chgnet.py | Added helper functions for converting CHGNet structure targets and expanded support for structure data. |
| nff/io/ase_calcs.py | Added a TODO comment for updating atoms only when necessary. |
| nff/io/ase.py | Enhanced `AtomsBatch` by deep-copying arrays and constraints upon initialization and copying. |
| nff/data/stats.py | Reformatted standard deviation and reference mean calculations for outlier removal. |
| nff/data/dataset.py | Added a condition for splitting when there is no validation set. |
| nff/analysis/parity_plot.py | Adjusted figure size, changed the saving format from PNG to PDF, and rasterized plotting elements. |
| nff/analysis/mpl_settings.py | Updated several Matplotlib settings, including DPI, font sizes, and line widths. |
| nff/analysis/loss_plot.py | Reduced figure size to (5, 2.5) in the loss plotting routine. |
| models/foundation_models/chgnet/0.3.0/README.md | Removed legacy README to streamline documentation for the 0.3.0 model. |
| models/foundation_models/chgnet/0.2.0/README.md | Removed legacy README to streamline documentation for the 0.2.0 model. |

Comment on lines 72 to 74:

```python
        Function to transfer learn a model. Defined in the subclasses.
        """
        pass
```

Copilot AI (Apr 23, 2025)

The `model_tl` function is now implemented as `pass`; please confirm that this omission is intentional and that proper transfer learning logic is provided elsewhere or will be implemented later.

Suggested change:

```diff
-        Function to transfer learn a model. Defined in the subclasses.
-        """
-        pass
+        Function to transfer learn a model. This method must be implemented
+        by subclasses to define specific transfer learning logic.
+
+        Args:
+            model (torch.nn.Module): model to be transfer learned
+            freeze_gap_embedding (bool): whether to freeze gap embedding layers
+            freeze_pooling (bool): whether to freeze pooling layers
+            freeze_skip (bool): whether to freeze skip connections
+            custom_layers (List[str]): list of layers to unfreeze specified by the user
+            **kwargs: additional arguments for transfer learning
+        """
+        raise NotImplementedError(
+            "The 'model_tl' method must be implemented by subclasses of LayerFreezer."
+        )
```

pafervi previously approved these changes Apr 23, 2025

@ajhoffman1229 (Contributor) left a comment

Most of the code looks good, but there are a few places where there appear to be duplicate lines.

Comment on lines 25 to 28:

```python
ax_fig[0].set_xlabel("Epoch")
ax_fig[0].set_ylabel("Loss")
ax_fig[0].set_xlabel("Epoch")
ax_fig[0].set_ylabel("Loss")
```

Are these lines repeated for a reason?

Comment on lines 35 to 38:

```python
ax_fig[1].set_xlabel("Epoch")
ax_fig[1].set_ylabel("Loss")
ax_fig[1].set_xlabel("Epoch")
ax_fig[1].set_ylabel("Loss")
```

These lines also seem redundant.

Comment on lines 77 to 80:

```python
"""Converts hex to rgb colors.

Args:
    value (str): string of 6 characters representing a hex colour.
```

Your US education is clashing with the British English that (I presume?) is taught in Singapore 😂
(Just to be clear, no fix is needed here)

@xiaochendu (Author)

Whoops, I think what happened was that I copied and pasted the arg description from somewhere else, while the top line was ChatGPT-generated. XD
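
For context, a conventional implementation matching that docstring might look like the following (a sketch, not necessarily the PR's actual helper):

```python
def hex_to_rgb(value: str) -> tuple:
    """Convert a 6-character hex colour string (e.g. "ff8800") to an (r, g, b) tuple."""
    value = value.lstrip("#")
    # Parse each pair of hex digits into one 0-255 channel value.
    return tuple(int(value[i : i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("ff8800"))  # (255, 136, 0)
```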

Comment on lines 169 to 174:

```python
kernel = gaussian_kde(
    np.vstack([x.sample(n=len(x), random_state=2), y.sample(n=len(y), random_state=2)])
)
kernel = gaussian_kde(
    np.vstack([x.sample(n=len(x), random_state=2), y.sample(n=len(y), random_state=2)])
)
```

These lines appear to be repeated.

Comment on lines 56 to 65:

```python
mean = reference_mean if reference_mean else np.mean(stats_array)
std = reference_std if reference_std else np.std(stats_array)
if reference_mean is None:
    mean = np.mean(stats_array)
else:
    mean = reference_mean
if reference_std is None:
    std = np.std(stats_array)
else:
    std = reference_std
```

Does this modification do anything different? I feel like the code that this update replaces should function the same as this new code but is more succinct. None values should be falsy.

@xiaochendu (Author)

I must have made an error when merging. I wanted to take the incoming (`master`) rather than the current (`vssr_pourbaix`).
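
Worth noting: the two forms agree only when the reference values are never falsy. A minimal sketch of the corner case where they diverge (assuming a reference of 0.0 is meaningful):

```python
import numpy as np

stats_array = np.array([1.0, 2.0, 3.0])
reference_mean = 0.0  # an explicitly supplied reference that happens to be falsy

# Truthiness check: 0.0 is falsy, so the supplied reference is silently ignored.
mean_truthy = reference_mean if reference_mean else np.mean(stats_array)

# Identity check: only None triggers the fallback to the computed mean.
mean_is_none = np.mean(stats_array) if reference_mean is None else reference_mean

print(mean_truthy, mean_is_none)  # 2.0 0.0
```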

Comment on lines 135 to 136:

```python
# TODO: update atoms only when necessary
atoms.update_nbr_list(update_atoms=True)
```

Could you provide some additional clarity about the TODO here? Is this issue persistent, or does it affect performance significantly enough to merit its own GitHub issue? If so, we might want to open one.

@xiaochendu (Author), Apr 23, 2025

This is used to update the number of atoms between MCMC steps. It might not be necessary for the general user running MD. Let me remove it for the main branch.

Comment on lines 219 to 263:

```python
for i, block in enumerate(model.readouts):
    if unfreeze_skip or i == num_readouts - 1:
        if unfreeze_skip:
            self.unfreeze_parameters(block)
        elif i == num_readouts - 1:
            self.unfreeze_parameters(block)
```

Is there a reason this was split into two if/elif statements that do the same thing?
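
For reference, since both branches call the same method, the nested checks collapse back to the outer condition alone (a sketch reusing the names from the snippet above):

```python
for i, block in enumerate(model.readouts):
    # Unfreeze every readout block when unfreeze_skip is set; otherwise only the last one.
    if unfreeze_skip or i == num_readouts - 1:
        self.unfreeze_parameters(block)
```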

From `nff/utils/cuda.py`:

```python
    return f"cuda:{cuda_devices_sorted_by_free_mem()[-1]}"
except nvidia_smi.NVMLError:
    return "cuda:0"
return f"cuda:{cuda_devices_sorted_by_free_mem()[-1]}"
```

With the above try/except statement, should this return line be removed?

@xiaochendu (Author)

You're right!
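
A sketch of how the helper might read once the unreachable trailing return is dropped (`best_cuda_device` is an illustrative name; `cuda_devices_sorted_by_free_mem` is assumed importable from `nff/utils/cuda.py`, where the snippet lives):

```python
import nvidia_smi

from nff.utils.cuda import cuda_devices_sorted_by_free_mem


def best_cuda_device() -> str:
    """Pick the CUDA device with the most free memory, falling back to cuda:0."""
    try:
        return f"cuda:{cuda_devices_sorted_by_free_mem()[-1]}"
    except nvidia_smi.NVMLError:
        # NVML queries can fail (e.g. driver/library mismatch); use the first device.
        return "cuda:0"
```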

@xiaochendu (Author)

Good catch.

ajhoffman1229 previously approved these changes Apr 28, 2025

@ajhoffman1229 (Contributor) left a comment

Looks great!

HojeChun previously approved these changes Apr 28, 2025

@HojeChun (Contributor) left a comment

Looks great to me; I just had one suggestion.

```python
def convert_data_batch(
    data_batch: Dict,
    cutoff: float = 5.0,
    shuffle: bool = True,
```

Can we set `shuffle` to `False`? I assume shuffling has already been done when you make the dataloader, and this function is a wrapper.

@xiaochendu (Author)

It's a dummy variable, but I made the change as you suggested to make it less confusing (I guess)!
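
A sketch of the revised signature after the review (any remaining parameters and the body are elided here):

```python
from typing import Dict


def convert_data_batch(
    data_batch: Dict,
    cutoff: float = 5.0,
    shuffle: bool = False,  # default flipped per the review; the flag is a dummy here
):
    ...
```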

@xiaochendu dismissed stale reviews from HojeChun and ajhoffman1229 via 95937c9 on May 1, 2025 01:41
@HojeChun merged commit 08d142f into master on May 1, 2025
1 check passed
