feat(pt): add plugin for data modifier #4661
Conversation
📝 Walkthrough
Adds a data-modifier subsystem (BaseModifier + factory), integrates modifiers into DeepmdData, datasets, and loaders with optional caching and preload, propagates modifiers through training, inference, and freezing (via a `.pth` extra file), updates ModelWrapper to apply modifiers at inference, and adds unit tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor CLI as Entrypoint
participant Factory as get_data_modifier
participant Loader as DpLoaderSet / Dataset
participant Data as DeepmdData
participant Trainer as ModelWrapper
participant Freezer as Script Export (.pth)
participant Eval as DeepEval
CLI->>Factory: request modifier from model params
Factory-->>CLI: BaseModifier instance (jitable?)
CLI->>Loader: construct loaders with modifier
Loader->>Data: attach modifier to dataset
Loader->>Data: preload_and_modify_all_data_torch(num_worker)
Data->>Data: apply modifier per-frame (threaded) and cache results
CLI->>Trainer: instantiate ModelWrapper with modifier
Trainer->>Trainer: compute model_pred
Trainer->>Trainer: compute modifier_pred and combine outputs
CLI->>Freezer: freeze -> serialize model + extra_files["data_modifier.pth"]
Eval->>Freezer: load .pth (with extra_files)
Freezer-->>Eval: provide `data_modifier.pth` bytes
Eval->>Factory: torch.jit.load(bytes) -> modifier instance
Eval->>Trainer: instantiate ModelWrapper with loaded modifier
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers
Pre-merge checks and finishing touches
- ❌ Failed checks (1 warning)
- ✅ Passed checks (2 passed)
- ✨ Finishing touches
Actionable comments posted: 0
🧹 Nitpick comments (2)
deepmd/pt/modifier/base_modifier.py (2)
41-44: Consider simplifying the box assignment.
A ternary operator can reduce verbosity here:

```diff
- if data["box"] is None:
-     box = None
- else:
-     box = data["box"][:get_nframes, :]
+ box = None if data["box"] is None else data["box"][:get_nframes, :]
```

🧰 Tools
🪛 Ruff (0.8.2)

41-44: Use ternary operator `box = None if data["box"] is None else data["box"][:get_nframes, :]` instead of `if`-`else` block (SIM108)
47-47: Remove or use the `nframes` variable.
Currently, `nframes = coord.shape[0]` is not used, which may confuse future maintainers.

🧰 Tools
🪛 Ruff (0.8.2)

47-47: Local variable `nframes` is assigned to but never used. Remove the assignment to the unused variable `nframes` (F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- deepmd/pt/modifier/__init__.py (1 hunk)
- deepmd/pt/modifier/base_modifier.py (1 hunk)
- deepmd/pt/train/training.py (7 hunks)
- deepmd/pt/utils/stat.py (1 hunk)
🧰 Additional context used
🧬 Code Definitions (2)
deepmd/pt/modifier/__init__.py (1)
- deepmd/pt/modifier/base_modifier.py (1): BaseModifier (9-56)

deepmd/pt/train/training.py (3)
- deepmd/pt/modifier/base_modifier.py (2): BaseModifier (9-56), modify_data (14-56)
- deepmd/pd/train/training.py (1): get_additional_data_requirement (1163-1187)
- deepmd/pd/utils/stat.py (1): make_stat_input (40-85)
🪛 Ruff (0.8.2)
deepmd/pt/modifier/base_modifier.py

41-44: Use ternary operator `box = None if data["box"] is None else data["box"][:get_nframes, :]` instead of `if`-`else` block (SIM108)

47-47: Local variable `nframes` is assigned to but never used. Remove the assignment to the unused variable `nframes` (F841)
⏰ Context from checks skipped due to timeout of 90000ms (29 GitHub checks).
🔇 Additional comments (11)
deepmd/pt/modifier/__init__.py (1)
1-8: Good job exposing the BaseModifier API.
This new `__init__.py` cleanly re-exports the `BaseModifier` class and ensures users can import it directly from `deepmd.pt.modifier`.

deepmd/pt/utils/stat.py (2)

50-51: Conditional logging is well-handled.
Only logging when `nbatches > 0` helps keep logs cleaner in scenarios where no batches are processed.

56-59: Logic for handling `nbatches == -1` is clear and correct.
This new condition ensures the entire dataset is used when `nbatches` is -1. No issues found.

deepmd/pt/train/training.py (8)
39-41: Import of BaseModifier is appropriate.
This import makes the newly introduced functionality available where needed.
140-149: Modifier parameter handling is well-structured.
The assertion preventing usage in multi-task scenarios is clear and avoids incompatible configurations.
231-231: Defaulting `modifier` to None is appropriate.
Makes the modifier usage optional without complicating the training interface.
239-250: Verify data modification logic.
Applying `modifier.modify_data` to every system might lead to repeated transformations if `single_model_stat` is called multiple times. Confirm this matches your intended workflow.
345-345: Single-model signature usage is consistent.
Passing `modifier=self.modifier` ensures the same modifier instance is applied throughout the training flow.
384-384: Multi-task signature usage is consistent.
Again, passing `modifier=self.modifier` allows uniform data processing across tasks if needed.
1075-1081: Storing the `data_modifier` state is a good idea.
Consider providing a loading mechanism in the future so that `data_modifier` can be restored automatically.
1389-1400: Factory function for data modifiers looks good.
Encapsulates logic for dynamically obtaining modifier classes, making the code more extensible.
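For readers unfamiliar with the pattern, a minimal sketch of the kind of type registry such a factory builds on is shown below. This is an illustration only — the real mechanism lives in `deepmd/utils/plugin.py`, and the method names here are assumptions:

```python
# Illustrative registry sketch; not the actual deepmd implementation.
class BaseModifier:
    _registry: dict = {}  # maps a "type" string to a registered subclass

    @classmethod
    def register(cls, name: str):
        """Class decorator that records a subclass under the given type name."""
        def decorator(subclass):
            cls._registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def get_class_by_type(cls, name: str):
        """Look up a previously registered subclass by its type name."""
        try:
            return cls._registry[name]
        except KeyError:
            raise KeyError(f"unknown modifier type: {name}") from None
```

A factory can then resolve `params["type"]` to a class and instantiate it with the remaining parameters.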
Codecov Report
❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #4661      +/-   ##
==========================================
- Coverage   82.15%   81.94%   -0.21%
==========================================
  Files         709      712       +3
  Lines       72468    72824     +356
  Branches     3616     3615       -1
==========================================
+ Hits        59535    59679     +144
- Misses      11769    11982     +213
+ Partials     1164     1163       -1
```

☔ View full report in Codecov by Sentry.
wanghan-iapcm left a comment
please add UT for the implementation.
Pull Request Overview
This PR introduces a new plugin for modifying data in the PyTorch backend and integrates it into the training workflow. Key changes include:
- Creation of a new BaseModifier class and registration in the modifier package.
- Integration of the data modifier into the training process, including saving its state.
- Minor adjustments to the statistics data preparation in the utils module.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| deepmd/pt/modifier/__init__.py | Exposes BaseModifier for external use. |
| deepmd/pt/modifier/base_modifier.py | Adds a new BaseModifier class for data modification. |
| deepmd/pt/train/training.py | Integrates the data modifier into training data preparation and model saving. |
| deepmd/pt/utils/stat.py | Tweaks logging and batch calculation in the statistics utility. |
Comments suppressed due to low confidence (2)
deepmd/pt/modifier/base_modifier.py:9
- Ensure that make_base_modifier() returns a valid class to use for multiple inheritance with torch.nn.Module. If it does not, consider revising the inheritance structure or renaming for clarity.
class BaseModifier(torch.nn.Module, make_base_modifier()):
deepmd/pt/modifier/base_modifier.py:40
- The variable get_nframes is explicitly set to None, which will slice the full array; if a limit on the number of frames was intended, assign get_nframes an appropriate value.
coord = data["coord"][:get_nframes, :]
Force-pushed from 8d02057 to 859292f (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (2)
deepmd/pt/utils/dataset.py (1)
74-75: Remove or implement the commented-out `clear_modified_frame_cache` method.
Similar to the dataloader, this commented-out method suggests incomplete implementation. Either implement the functionality if needed, or remove the comment to reduce clutter.
This is the same issue as in `deepmd/pt/utils/dataloader.py` lines 245-247. Consider a unified approach to cache management across both files.

deepmd/pt/modifier/base_modifier.py (1)

50-50: Remove or use the commented-out `nframes` variable.
The variable `nframes` is calculated but immediately commented out and never used. Either remove the comment and use the variable, or delete the line entirely to reduce clutter. Based on learnings, if this is intentionally kept for future use, please clarify with a TODO comment.
🧹 Nitpick comments (2)
deepmd/pt/modifier/base_modifier.py (1)
17-17: Clarify the purpose of the unused `data_sys` parameter.
The `data_sys` parameter is flagged as unused by static analysis. If it's intended for future use or required by subclasses, consider adding a docstring note or a TODO comment. Otherwise, consider removing it to reduce the parameter surface.

deepmd/pt/utils/dataloader.py (1)

245-247: Remove or implement the commented-out `clear_modified_frame_cache` method.
The commented-out method suggests incomplete implementation. Either implement the cache-clearing functionality if it's needed, or remove the comment to keep the codebase clean.
If cache clearing is required in the future, consider opening an issue to track the feature.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- deepmd/pt/entrypoints/main.py (4 hunks)
- deepmd/pt/modifier/__init__.py (1 hunk)
- deepmd/pt/modifier/base_modifier.py (1 hunk)
- deepmd/pt/train/training.py (2 hunks)
- deepmd/pt/utils/dataloader.py (4 hunks)
- deepmd/pt/utils/dataset.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt/modifier/__init__.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-16T21:50:10.680Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.
Applied to files:
deepmd/pt/modifier/base_modifier.py
🧬 Code graph analysis (4)
deepmd/pt/train/training.py (3)
- deepmd/pt/utils/dataloader.py (1): preload_and_modify_all_data (241-243)
- deepmd/pt/utils/dataset.py (1): preload_and_modify_all_data (71-72)
- deepmd/utils/data.py (1): preload_and_modify_all_data (498-513)

deepmd/pt/entrypoints/main.py (3)
- deepmd/pt/modifier/base_modifier.py (1): BaseModifier (12-60)
- deepmd/utils/plugin.py (1): get_class_by_type (144-154)
- deepmd/pt/utils/dataloader.py (1): DpLoaderSet (65-247)

deepmd/pt/utils/dataset.py (2)
- deepmd/pt/modifier/base_modifier.py (1): BaseModifier (12-60)
- deepmd/pt/utils/dataloader.py (1): preload_and_modify_all_data (241-243)

deepmd/pt/utils/dataloader.py (3)
- deepmd/pt/modifier/base_modifier.py (1): BaseModifier (12-60)
- deepmd/pt/utils/dataset.py (1): preload_and_modify_all_data (71-72)
- deepmd/utils/data.py (1): preload_and_modify_all_data (498-513)
🪛 Ruff (0.14.8)
deepmd/pt/entrypoints/main.py
116-116: Avoid specifying long messages outside the exception class
(TRY003)
deepmd/pt/modifier/base_modifier.py
17-17: Unused method argument: data_sys
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms; the timeout can be raised to a maximum of 15 minutes (900000ms) in the CodeRabbit configuration (28 GitHub checks).
🔇 Additional comments (9)
deepmd/pt/entrypoints/main.py (3)
111-121: LGTM: Data modifier instantiation is well-structured.
The `get_data_modifier` function correctly validates the presence of the `type` field, instantiates the modifier using the plugin system, and moves it to the appropriate device.

129-133: LGTM: Modifier wiring integrates cleanly.
The modifier is correctly extracted from model parameters and instantiated only when present, maintaining backward compatibility for configurations without modifiers.

164-180: LGTM: Modifier is correctly passed to both training and validation data loaders.
The modifier is properly threaded through to both `DpLoaderSet` instances, ensuring consistent data preprocessing across training and validation.

deepmd/pt/train/training.py (2)
353-355: LGTM: Preload and modify step correctly positioned.
The `preload_and_modify_all_data()` calls are correctly placed after data requirements are added by `single_model_stat` and before data loaders are constructed. This ensures the modifier operates on the complete, configured data before training begins.

394-396: LGTM: Multi-model path mirrors single-model preload logic.
The multi-model code path correctly applies `preload_and_modify_all_data()` to each model's training and validation data, maintaining consistency with the single-model approach.

deepmd/pt/utils/dataloader.py (2)
82-102: LGTM: Modifier parameter correctly threaded through DpLoaderSet.
The `modifier` parameter is properly added to the constructor signature, stored, and forwarded to each `DeepmdDataSetForLoader` instance. The optional typing (`BaseModifier | None`) maintains backward compatibility.

241-243: LGTM: Preload method delegates correctly.
The `preload_and_modify_all_data()` method properly iterates over all systems and delegates to each system's preload method, enabling batch preprocessing before training.

deepmd/pt/utils/dataset.py (2)
22-40: LGTM: Modifier integration is clean and complete.
The `modifier` parameter is properly added to the constructor, documented, stored as an instance attribute, and forwarded to the underlying `DeepmdData` instance. This enables the data modification pipeline while maintaining a clear separation of concerns.

71-72: LGTM: Preload method correctly delegates to the data system.
The `preload_and_modify_all_data()` method properly delegates to `self._data_system.preload_and_modify_all_data()`, maintaining the abstraction layers and enabling efficient batch preprocessing.
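To make the layering in these two comments concrete, here is a simplified sketch of the delegation chain based only on the review text (constructor details and attribute names abbreviated):

```python
# Simplified sketch of the preload delegation chain (illustrative only).
class DpLoaderSet:
    def __init__(self, systems):
        self.systems = systems  # list of DeepmdDataSetForLoader instances

    def preload_and_modify_all_data(self) -> None:
        # Loader level: fan out to every system.
        for system in self.systems:
            system.preload_and_modify_all_data()


class DeepmdDataSetForLoader:
    def __init__(self, data_system):
        self._data_system = data_system  # the underlying DeepmdData

    def preload_and_modify_all_data(self) -> None:
        # Dataset level: delegate to the data system, which applies the
        # modifier per frame and caches the results.
        self._data_system.preload_and_modify_all_data()
```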
Actionable comments posted: 3
🧹 Nitpick comments (2)
deepmd/pt/modifier/base_modifier.py (2)
43-43: Clarify or remove the `get_nframes` pattern.
Setting `get_nframes = None` and then using it in slicing operations (e.g., `[:get_nframes, :]`) is equivalent to `[:None, :]`, which means "all frames". This pattern appears to be a placeholder for future functionality but currently adds no value and may confuse readers.
If this is intentional for future extension, consider adding a comment explaining the planned usage. Otherwise, simplify to direct slicing:

```diff
- get_nframes = None
- coord = data["coord"][:get_nframes, :]
+ coord = data["coord"]
```
54-59: Consider adding shape validation before reshape.
The code assumes that the outputs from `forward()` are compatible with the target shapes in `data`. If a subclass returns tensors with incompatible shapes, the `reshape()` operations will fail with potentially unclear error messages.
Consider adding assertions to provide clearer error messages:

```python
if "find_energy" in data and data["find_energy"] == 1.0:
    expected_shape = data["energy"].shape
    assert tot_e.numel() == np.prod(expected_shape), \
        f"Energy shape mismatch: forward returned {tot_e.shape}, expected {expected_shape}"
    data["energy"] -= tot_e.reshape(expected_shape)
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- deepmd/pt/modifier/base_modifier.py (1 hunk)
- deepmd/pt/utils/dataloader.py (4 hunks)
- deepmd/pt/utils/dataset.py (2 hunks)
- source/tests/pt/test_data_modifier.py (1 hunk)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-16T21:50:10.680Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.
Applied to files:
deepmd/pt/modifier/base_modifier.py
🧬 Code graph analysis (3)
deepmd/pt/utils/dataset.py (2)
- deepmd/pt/modifier/base_modifier.py (1): BaseModifier (12-59)
- deepmd/utils/data.py (2): DeepmdData (34-1069), preload_and_modify_all_data (498-513)

deepmd/pt/modifier/base_modifier.py (1)
- deepmd/dpmodel/modifier/base_modifier.py (1): make_base_modifier (17-77)

deepmd/pt/utils/dataloader.py (3)
- deepmd/pt/modifier/base_modifier.py (1): BaseModifier (12-59)
- deepmd/pt/utils/dataset.py (1): preload_and_modify_all_data (71-72)
- deepmd/utils/data.py (1): preload_and_modify_all_data (498-513)
🪛 Ruff (0.14.8)
source/tests/pt/test_data_modifier.py
81-81: Unused method argument: data_sys
(ARG002)
141-141: Unused method argument: data_sys
(ARG002)
deepmd/pt/modifier/base_modifier.py
17-17: Unused method argument: data_sys
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms; the timeout can be raised to a maximum of 15 minutes (900000ms) in the CodeRabbit configuration (28 GitHub checks).
🔇 Additional comments (9)
deepmd/pt/modifier/base_modifier.py (2)
1-15: LGTM!
The imports and class structure are clean. The BaseModifier correctly inherits from torch.nn.Module and the dpmodel base class.

36-41: Good defensive check.
The early return when no find_* flags are present avoids unnecessary computation.
source/tests/pt/test_data_modifier.py (3)
28-35: LGTM!
The plugin registrations for test modifiers are correctly structured.

159-173: Good test setup.
The test configuration properly loads a base config and customizes training parameters for testing.

175-223: Excellent test coverage.
The tests effectively validate:
- That data modification is applied during initialization (zero_tester)
- That modification happens exactly once and persists through training (random_tester)
This covers the key behaviors of the modifier system.
deepmd/pt/utils/dataloader.py (2)
28-30: Clean integration of modifier support.
The modifier parameter is properly typed, documented in the docstring (implicitly through the system parameter), and correctly propagated to the dataset constructor.
Also applies to: 82-102

241-243: LGTM!
The `preload_and_modify_all_data` method correctly delegates to each underlying system, providing a convenient way to preload and modify all data in the loader set.

deepmd/pt/utils/dataset.py (2)
12-14: Excellent integration of modifier support.
The modifier parameter is properly:
- Typed with `BaseModifier | None`
- Documented in the docstring
- Stored as an instance variable
- Propagated to the DeepmdData constructor
The implementation is clean and consistent with the overall modifier pattern.
Also applies to: 22-44
71-72: LGTM!
The `preload_and_modify_all_data` method correctly delegates to the underlying data system, maintaining a clean separation of concerns.
Actionable comments posted: 2
🧹 Nitpick comments (1)
deepmd/pt/modifier/base_modifier.py (1)
93-107: Clarify the obscure `get_nframes = None` usage.
Line 93 sets `get_nframes = None` and then uses it in slicing operations like `data["coord"][:get_nframes, :]`. When `get_nframes` is `None`, the slice `[:None, :]` is equivalent to `[:, :]` (selecting all frames).
This pattern is intentional but not immediately obvious to readers.
Consider adding a comment to clarify:

```diff
- get_nframes = None
+ get_nframes = None  # None in slice [:None] means select all frames
  coord = data["coord"][:get_nframes, :]
```

Or use a more explicit pattern:

```diff
- get_nframes = None
- coord = data["coord"][:get_nframes, :]
- atype = data["atype"][:get_nframes, :]
+ # Process all frames in the batch
+ coord = data["coord"]
+ atype = data["atype"]
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- deepmd/pt/modifier/base_modifier.py (1 hunk)
- deepmd/utils/data.py (3 hunks)
- source/tests/pt/test_data_modifier.py (1 hunk)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-16T21:50:10.680Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.
Applied to files:
deepmd/pt/modifier/base_modifier.py
🧬 Code graph analysis (1)
deepmd/utils/data.py (3)
- deepmd/pt/modifier/base_modifier.py (1): modify_data (66-117)
- deepmd/pt/utils/dataset.py (1): preload_and_modify_all_data (71-72)
- deepmd/pt/utils/dataloader.py (1): preload_and_modify_all_data (241-243)
🪛 Ruff (0.14.8)
source/tests/pt/test_data_modifier.py
52-52: Unused method argument: atype
(ARG002)
53-53: Unused method argument: box
(ARG002)
54-54: Unused method argument: fparam
(ARG002)
55-55: Unused method argument: aparam
(ARG002)
59-59: Unused method argument: data_sys
(ARG002)
89-89: Unused method argument: atype
(ARG002)
90-90: Unused method argument: box
(ARG002)
91-91: Unused method argument: fparam
(ARG002)
92-92: Unused method argument: aparam
(ARG002)
96-96: Unused method argument: data_sys
(ARG002)
deepmd/pt/modifier/base_modifier.py
66-66: Unused method argument: data_sys
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms; the timeout can be raised to a maximum of 15 minutes (900000ms) in the CodeRabbit configuration (28 GitHub checks).
🔇 Additional comments (8)
deepmd/utils/data.py (3)
142-147: LGTM! Safe initialization of modifier caching.
The caching mechanism is initialized correctly with appropriate guards. The `_modified_frame_cache` is only created when a modifier exists, and all cache accesses in `get_single_frame` properly check `self.modifier is not None` before accessing the cache.

387-394: LGTM! Efficient cache lookup with proper guards.
The early return from cache is well-guarded with all necessary conditions and provides a good optimization for repeated frame access.

498-514: LGTM! Well-implemented preload method with progress logging.
The preload method correctly handles early returns, efficiently skips already-cached frames, and provides useful progress feedback. The implementation aligns well with the broader data loading pipeline shown in the related files.

deepmd/pt/modifier/base_modifier.py (1)
deepmd/pt/modifier/base_modifier.py (1)
1-65: LGTM! Well-structured base class with proper abstractions.The class correctly inherits from
torch.nn.Moduleand uses@abstractmethodto enforce subclass implementation offorward(). The serialization methods follow standard patterns.source/tests/pt/test_data_modifier.py (4)
29-36: LGTM! Plugin registrations correctly implemented.Both modifier plugins are properly registered with the
modifier_args_pluginregistry and return empty argument lists, which is appropriate for these test modifiers.
39-73: LGTM! Test modifier correctly implements custom logic.The
ModifierRandomTesterclass correctly:
- Sets
modifier_type = "random_tester"matching its registration name (line 47)- Overrides
modify_datawith custom randomization logic (lines 59-73)- The
forwardmethod returns minimal data sincemodify_datais completely overridden and never calls the base implementationThe unused parameters flagged by static analysis (
atype,box,fparam,aparaminforward, anddata_sysinmodify_data) are required by the base class interface but not used in this simple test implementation. This is acceptable for test code.
76-110: LGTM! Test modifier correctly implements zeroing behavior.The
ModifierZeroTesterclass correctly setsmodifier_type = "zero_tester"matching its registration and implements the expected zeroing logic inmodify_data. LikeModifierRandomTester, it completely overrides the basemodify_dataimplementation, so the minimalforwardmethod is acceptable.
113-187: LGTM! Comprehensive test coverage for modifier integration.The test cases effectively verify:
test_init_modify_data- confirms thatzero_testercorrectly zeros out training and validation data (lines 130-147)test_full_modify_data- validates thatrandom_testerproduces consistent results before and aftertrainer.run(), which tests the caching mechanism (lines 149-178)The tests cover the critical integration points between modifiers, data loading, and the training pipeline. The cleanup in
tearDownis thorough.
- Add data modifier support in model inference pipeline
- Enable saving and loading data modifiers with frozen models
- Add ModifierScalingTester for scaling model predictions as data modification
- Update test cases to verify data modifier functionality in inference
- Enhance modifier argument registration with documentation

This allows data modifiers to be applied during model inference and preserves them when saving frozen models for consistent behavior across training and inference stages.
…to save the data modification before training or to perform modification on-the-fly.
…dification during training.
Force-pushed from f402bab to d4919c3 (Compare)
Actionable comments posted: 0
🧹 Nitpick comments (2)
deepmd/pt/train/wrapper.py (1)
191-194: Consider validating modifier prediction keys.
The code assumes all keys in `modifier_pred` exist in `model_pred`. If a modifier returns unexpected keys, this will raise a `KeyError` at line 194. While this may be acceptable for controlled scenarios, consider adding a defensive check if robustness is preferred.
🔎 Optional defensive check

```diff
 if self.modifier is not None:
     modifier_pred = self.modifier(**input_dict)
     for k, v in modifier_pred.items():
-        model_pred[k] = model_pred[k] + v
+        if k in model_pred:
+            model_pred[k] = model_pred[k] + v
```

source/tests/pt/test_data_modifier.py (1)
96-97: Note: `__new__` methods appear redundant but may be required.
All three test modifier classes define `__new__` methods that simply call `super().__new__(cls)`. While this seems redundant, it may be required by the plugin registration system. If not needed, these methods can be removed to simplify the code.
Also applies to: 142-143, 184-185
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- deepmd/pt/train/wrapper.py
- source/tests/pt/test_data_modifier.py
🧰 Additional context used
🧬 Code graph analysis (1)
source/tests/pt/test_data_modifier.py (5)
- deepmd/pt/infer/deep_eval.py (2): DeepEval (87-863), eval (321-395)
- deepmd/pt/entrypoints/main.py (2): main (543-602), freeze (382-408)
- deepmd/pt/modifier/base_modifier.py (3): BaseModifier (30-187), forward (78-87), modify_data (90-187)
- deepmd/utils/data.py (2): DeepmdData (35-1079), add (150-203)
- deepmd/pt/train/wrapper.py (1): forward (155-205)
🪛 Ruff (0.14.10)
source/tests/pt/test_data_modifier.py
96-96: Unused static method argument: args
(ARG004)
96-96: Unused static method argument: kwargs
(ARG004)
113-113: Unused method argument: coord
(ARG002)
114-114: Unused method argument: atype
(ARG002)
115-115: Unused method argument: box
(ARG002)
116-116: Unused method argument: fparam
(ARG002)
117-117: Unused method argument: aparam
(ARG002)
118-118: Unused method argument: do_atomic_virial
(ARG002)
123-123: Unused method argument: data_sys
(ARG002)
142-142: Unused static method argument: args
(ARG004)
142-142: Unused static method argument: kwargs
(ARG004)
155-155: Unused method argument: coord
(ARG002)
156-156: Unused method argument: atype
(ARG002)
157-157: Unused method argument: box
(ARG002)
158-158: Unused method argument: fparam
(ARG002)
159-159: Unused method argument: aparam
(ARG002)
160-160: Unused method argument: do_atomic_virial
(ARG002)
165-165: Unused method argument: data_sys
(ARG002)
184-184: Unused static method argument: args
(ARG004)
184-184: Unused static method argument: kwargs
(ARG004)
🔇 Additional comments (2)
source/tests/pt/test_data_modifier.py (2)
314-384: LGTM - Complex but correct test logic.
This test verifies that a frozen model with an embedded modifier correctly applies the modifier during inference. The test:
- Trains a base model and freezes it (used as the modifier source)
- Trains a new model with a `scaling_tester` modifier referencing the base model
- Verifies that inference applies the modifier correctly: `output = base_output + scaled_modifier_output`
The test logic is sound and properly validates the inference path in `wrapper.py`.
111-121: Note: Unused parameters are expected for interface compliance.
The static analysis warnings about unused parameters in `forward()` methods are false positives. These parameters are required to match the `BaseModifier` abstract interface. The `random_tester` and `zero_tester` modifiers only modify training data via `modify_data()` and don't perform inference-time modifications, hence their `forward()` methods return empty dictionaries.
Also applies to: 153-163
Actionable comments posted: 1
🤖 Fix all issues with AI Agents
In @deepmd/pt/infer/deep_eval.py:
- Around line 175-189: The .pt loading path creates a local variable modifier
but never sets the instance attribute, causing self.modifier to be undefined
later; change the .pt branch so it assigns the loaded modifier to self.modifier
(not just a local modifier), ensure self.modifier is initialized to None before
the conditional if needed, and pass self.modifier into ModelWrapper
(ModelWrapper(model, modifier=self.modifier)) so both the instance attribute and
wrapper receive the same value when using torch.jit.load with extra_files.
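For concreteness, a minimal sketch of the `_extra_files` round trip this fix relies on is shown below. The `torch.jit.save`/`torch.jit.load` calls are standard PyTorch API and the `data_modifier.pth` key comes from this PR; the file name and the placeholder modules are assumptions:

```python
import io

import torch

# Freeze side: embed the scripted modifier inside the frozen model.
modifier = torch.jit.script(torch.nn.Identity())  # placeholder jitable modifier
buf = io.BytesIO()
torch.jit.save(modifier, buf)  # serialize in memory, no temp file needed
frozen = torch.jit.script(torch.nn.Identity())  # placeholder for the model
torch.jit.save(
    frozen, "frozen_model.pth",
    _extra_files={"data_modifier.pth": buf.getvalue()},
)

# Load side: initialize the attribute to None, populate it from extra_files,
# then hand the same object to the wrapper.
extra_files = {"data_modifier.pth": b""}  # values are filled in by torch.jit.load
model = torch.jit.load("frozen_model.pth", _extra_files=extra_files)
loaded_modifier = None  # plays the role of self.modifier on DeepEval
if extra_files["data_modifier.pth"]:
    loaded_modifier = torch.jit.load(io.BytesIO(extra_files["data_modifier.pth"]))
# wrapper = ModelWrapper(model, modifier=loaded_modifier)  # same object in both places
```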
🧹 Nitpick comments (1)
deepmd/pt/utils/dataset.py (1)
74-75: Add docstring for the new public method.
The `preload_and_modify_all_data_torch` method is a new public API but lacks documentation explaining its purpose, when to call it, and its behavior.
🔎 Proposed docstring

```diff
 def preload_and_modify_all_data_torch(self) -> None:
+    """Preload and apply modifier to all frames in the dataset.
+
+    This method should be called before training to apply any data
+    modifications and optionally cache the results for improved performance.
+    Uses worker threads to avoid CUDA re-initialization issues.
+    """
     self._data_system.preload_and_modify_all_data_torch(max(1, NUM_WORKERS))
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- deepmd/pd/utils/dataset.py
- deepmd/pt/entrypoints/main.py
- deepmd/pt/infer/deep_eval.py
- deepmd/pt/infer/inference.py
- deepmd/pt/modifier/__init__.py
- deepmd/pt/modifier/base_modifier.py
- deepmd/pt/train/training.py
- deepmd/pt/train/wrapper.py
- deepmd/pt/utils/dataloader.py
- deepmd/pt/utils/dataset.py
- deepmd/utils/data.py
- source/tests/pt/test_data_modifier.py
🚧 Files skipped from review as they are similar to previous changes (2)
- deepmd/pd/utils/dataset.py
- deepmd/pt/infer/inference.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-16T21:50:10.680Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4226
File: deepmd/dpmodel/model/make_model.py:370-373
Timestamp: 2024-10-16T21:50:10.680Z
Learning: In `deepmd/dpmodel/model/make_model.py`, the variable `nall` assigned but not used is intentional and should not be flagged in future reviews.
Applied to files:
deepmd/pt/modifier/base_modifier.py
🧬 Code graph analysis (8)
deepmd/pt/utils/dataset.py (4)
- deepmd/pt/modifier/base_modifier.py (1): BaseModifier (30-187)
- deepmd/utils/data.py (3): DataRequirementItem (1082-1162), DeepmdData (35-1079), preload_and_modify_all_data_torch (517-532)
- deepmd/pd/utils/dataset.py (1): DeepmdDataSetForLoader (17-56)
- deepmd/pt/utils/dataloader.py (1): preload_and_modify_all_data_torch (241-243)

deepmd/pt/train/training.py (2)
- deepmd/pd/train/training.py (2): get_additional_data_requirement (1222-1246), single_model_stat (209-239)
- deepmd/pt/utils/dataset.py (2): add_data_requirement (58-72), preload_and_modify_all_data_torch (74-75)

deepmd/pt/modifier/base_modifier.py (2)
- deepmd/dpmodel/modifier/base_modifier.py (1): make_base_modifier (17-77)
- deepmd/utils/data.py (1): DeepmdData (35-1079)

deepmd/pt/utils/dataloader.py (3)
- deepmd/pt/modifier/base_modifier.py (1): BaseModifier (30-187)
- deepmd/pt/utils/dataset.py (1): preload_and_modify_all_data_torch (74-75)
- deepmd/utils/data.py (1): preload_and_modify_all_data_torch (517-532)

deepmd/pt/entrypoints/main.py (2)
- deepmd/pt/modifier/__init__.py (1): get_data_modifier (17-23)
- deepmd/pt/infer/inference.py (1): Tester (28-76)

deepmd/pt/infer/deep_eval.py (1)
- deepmd/pt/train/wrapper.py (1): ModelWrapper (16-217)

source/tests/pt/test_data_modifier.py (4)
- deepmd/pt/entrypoints/main.py (3): main (543-602), freeze (382-408), get_trainer (100-216)
- deepmd/pt/modifier/base_modifier.py (3): BaseModifier (30-187), forward (78-87), modify_data (90-187)
- deepmd/utils/data.py (1): DeepmdData (35-1079)
- deepmd/pt/train/training.py (1): get_data (1247-1292)

deepmd/pt/modifier/__init__.py (2)
- deepmd/pt/modifier/base_modifier.py (1): BaseModifier (30-187)
- deepmd/utils/plugin.py (1): get_class_by_type (144-154)
🪛 Ruff (0.14.10)
deepmd/pt/modifier/base_modifier.py
90-90: Unused method argument: data_sys
(ARG002)
163-166: Avoid specifying long messages outside the exception class
(TRY003)
172-175: Avoid specifying long messages outside the exception class
(TRY003)
181-184: Avoid specifying long messages outside the exception class
(TRY003)
source/tests/pt/test_data_modifier.py
96-96: Unused static method argument: args
(ARG004)
96-96: Unused static method argument: kwargs
(ARG004)
113-113: Unused method argument: coord
(ARG002)
114-114: Unused method argument: atype
(ARG002)
115-115: Unused method argument: box
(ARG002)
116-116: Unused method argument: fparam
(ARG002)
117-117: Unused method argument: aparam
(ARG002)
118-118: Unused method argument: do_atomic_virial
(ARG002)
123-123: Unused method argument: data_sys
(ARG002)
142-142: Unused static method argument: args
(ARG004)
142-142: Unused static method argument: kwargs
(ARG004)
155-155: Unused method argument: coord
(ARG002)
156-156: Unused method argument: atype
(ARG002)
157-157: Unused method argument: box
(ARG002)
158-158: Unused method argument: fparam
(ARG002)
159-159: Unused method argument: aparam
(ARG002)
160-160: Unused method argument: do_atomic_virial
(ARG002)
165-165: Unused method argument: data_sys
(ARG002)
184-184: Unused static method argument: args
(ARG004)
184-184: Unused static method argument: kwargs
(ARG004)
deepmd/pt/modifier/__init__.py
22-22: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms; the timeout can be raised to a maximum of 15 minutes (900000ms) in the CodeRabbit configuration (27 GitHub checks).
🔇 Additional comments (15)
deepmd/utils/data.py (3)
143-148: LGTM: Modifier cache initialization.
The logic correctly initializes the cache only when a modifier is present and respects the modifier's `use_cache` preference. This provides flexibility for users to control memory usage.

256-287: LGTM: Consistent method signature updates.
Both `get_item_torch` and `get_item_paddle` now accept the `num_worker` parameter and delegate to `get_single_frame`. The default value maintains backward compatibility.

517-533: LGTM: Preload method is well-structured.
The method efficiently handles the preloading workflow with appropriate guards (early return when caching is disabled) and helpful progress logging for large datasets.
deepmd/pt/modifier/__init__.py (1)
17-23: LGTM: Factory function is correctly implemented.
The function properly:
- Creates a defensive copy to prevent parameter mutation
- Validates the required "type" field
- Delegates to the plugin registry for instantiation
The static analysis hint (TRY003) about the error message is a minor style preference and doesn't affect functionality.
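A rough sketch of a factory with those three properties follows, under the assumption that the registry exposes a `get_class_by_type`-style lookup (the real signature may differ):

```python
def get_data_modifier(modifier_params: dict):
    # Defensive copy so the caller's config dict is not mutated.
    params = dict(modifier_params)
    modifier_type = params.pop("type", None)
    if modifier_type is None:
        raise KeyError("data modifier config must contain a 'type' field")
    # Resolve the registered subclass and instantiate it with the
    # remaining parameters (registry call name assumed).
    modifier_cls = BaseModifier.get_class_by_type(modifier_type)
    return modifier_cls(**params)
```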
deepmd/pt/utils/dataloader.py (2)
28-30: LGTM: Clean modifier integration.
The modifier parameter is properly threaded through the data loader initialization with appropriate type hints and backward-compatible defaults.
Also applies to: 89-89, 101-101

241-243: LGTM: Preload delegation is straightforward.
The method correctly delegates preloading to each system in the dataset. The naming is consistent with the underlying API.
deepmd/pt/entrypoints/main.py (3)
118-123: LGTM: Modifier creation is correctly guarded.
The code properly checks for modifier configuration, creates the modifier via the factory function, and ensures it's on the correct device.

158-158: LGTM: Modifier propagation to data loaders.
The modifier is consistently passed to both training and validation data loaders, ensuring uniform data modification during training.
Also applies to: 168-168

387-407: LGTM: Improved modifier serialization using in-memory buffer.
The freeze function now correctly uses `io.BytesIO` for modifier serialization instead of temporary files, addressing the previous review concern about resource cleanup. This is a cleaner and safer approach.

deepmd/pt/infer/deep_eval.py (1)
2-2: LGTM: Modifier deserialization logic (aside from initialization issue).
The modifier loading implementation correctly:
- Uses `io.BytesIO` for in-memory deserialization
- Checks for modifier presence in extra_files
- Passes the modifier to `ModelWrapper`
Once the `self.modifier` initialization issue is addressed, this implementation will be complete.
Note: This approval is contingent on fixing the `self.modifier` initialization issue flagged in the separate comment.
Also applies to: 175-188
deepmd/pt/train/wrapper.py (1)
191-194: LGTM! Modifier application logic is correct.
The modifier predictions are correctly added to the model predictions during inference. This aligns with the training-time behavior where modifier outputs are subtracted from labels, ensuring consistent treatment of modifier corrections across training and inference.
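The subtract-at-training / add-at-inference symmetry is the crux of this comment; a small illustrative sketch (not the wrapper's actual code) makes it explicit:

```python
# Training side: remove the modifier contribution from the labels so the
# network only has to learn the residual.
def modify_labels(data: dict, modifier_pred: dict) -> None:
    for k, v in modifier_pred.items():
        if k in data:
            data[k] = data[k] - v


# Inference side: add the modifier contribution back onto the predictions,
# recovering the original target quantity.
def combine_predictions(model_pred: dict, modifier_pred: dict) -> dict:
    for k, v in modifier_pred.items():
        model_pred[k] = model_pred[k] + v
    return model_pred
```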
deepmd/pt/train/training.py (2)
335-344: Correct ordering of data preparation steps.
The data requirements are properly constructed, attached to datasets, and then all data is preloaded and modified before statistics computation. This ensures that model statistics (mean/std for normalization) are computed on the modified data, which is the correct behavior.

379-390: Correct multi-task data preparation flow.
Each model's data requirements are properly established and data is preloaded/modified before statistics computation, maintaining consistency with the single-task path.
deepmd/pt/modifier/base_modifier.py (1)
161-187: Good error handling for missing modifier outputs.
The method properly validates that the modifier provides all required outputs (energy, force, virial) when their corresponding flags are set, with clear error messages identifying the missing keys.
source/tests/pt/test_data_modifier.py (1)
314-384: Well-structured inference test.
The test properly validates the modifier's behavior during inference by training a model with a modifier and verifying that the final predictions correctly include the scaled modifier contributions. The test setup is comprehensive and the assertions are appropriate.
Force-pushed from d4919c3 to 352c149 (Compare)
Actionable comments posted: 2
🤖 Fix all issues with AI Agents
In @deepmd/utils/data.py:
- Around line 502-512: The submitted future from ThreadPoolExecutor for
modifier.modify_data is never awaited so exceptions are swallowed; after
submitting (future = executor.submit(self.modifier.modify_data, frame_data,
self)), call future.result() to re-raise any exceptions and ensure the modifier
completed before continuing, then proceed with the existing caching logic that
writes to self._modified_frame_cache when self.use_modifier_cache is True.
- Around line 400-406: The cached frame is currently returned directly from the
modified frame cache (check involving self.use_modifier_cache, self.modifier and
self._modified_frame_cache[index]), allowing callers to mutate the cached numpy
arrays; change the return so you return a deep copy of the cached frame (e.g.,
use a deep copy utility to duplicate nested dicts/arrays) before returning to
preserve cache immutability and prevent in-place modifications from corrupting
the cache.
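Taken together, the two fixes might look like the following sketch (attribute and method names taken from the review text; the enclosing class is omitted):

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def _modify_and_cache_frame(self, index: int, frame_data: dict) -> None:
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(self.modifier.modify_data, frame_data, self)
        future.result()  # re-raise any exception from the worker thread
    if self.use_modifier_cache:
        self._modified_frame_cache[index] = frame_data


def _get_cached_frame(self, index: int) -> dict:
    # Return a deep copy so callers cannot mutate the cached arrays in place.
    return copy.deepcopy(self._modified_frame_cache[index])
```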
🧹 Nitpick comments (1)
source/tests/pt/test_data_modifier.py (1)
96-97: Remove unnecessary `__new__` method.
The `__new__` method simply calls `super().__new__(cls)`, which is Python's default behavior. This method can be removed entirely.
🔎 Proposed fix

```diff
 @BaseModifier.register("random_tester")
 class ModifierRandomTester(BaseModifier):
-    def __new__(cls, *args, **kwargs):
-        return super().__new__(cls)
-
     def __init__(
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- deepmd/pt/train/wrapper.py
- deepmd/utils/data.py
- source/tests/pt/test_data_modifier.py
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt/train/wrapper.py
🧰 Additional context used
🧬 Code graph analysis (1)
deepmd/utils/data.py (3)
- deepmd/pt/modifier/base_modifier.py (1): modify_data (90-187)
- deepmd/pt/utils/dataset.py (1): preload_and_modify_all_data_torch (74-75)
- deepmd/pt/utils/dataloader.py (1): preload_and_modify_all_data_torch (241-243)
🪛 Ruff (0.14.10)
source/tests/pt/test_data_modifier.py
96-96: Unused static method argument: args
(ARG004)
96-96: Unused static method argument: kwargs
(ARG004)
113-113: Unused method argument: coord
(ARG002)
114-114: Unused method argument: atype
(ARG002)
115-115: Unused method argument: box
(ARG002)
116-116: Unused method argument: fparam
(ARG002)
117-117: Unused method argument: aparam
(ARG002)
118-118: Unused method argument: do_atomic_virial
(ARG002)
123-123: Unused method argument: data_sys
(ARG002)
142-142: Unused static method argument: args
(ARG004)
142-142: Unused static method argument: kwargs
(ARG004)
155-155: Unused method argument: coord
(ARG002)
156-156: Unused method argument: atype
(ARG002)
157-157: Unused method argument: box
(ARG002)
158-158: Unused method argument: fparam
(ARG002)
159-159: Unused method argument: aparam
(ARG002)
160-160: Unused method argument: do_atomic_virial
(ARG002)
165-165: Unused method argument: data_sys
(ARG002)
184-184: Unused static method argument: args
(ARG004)
184-184: Unused static method argument: kwargs
(ARG004)
⏰ Context from checks skipped due to timeout of 90000ms; the timeout can be raised to a maximum of 15 minutes (900000ms) in the CodeRabbit configuration (28 GitHub checks).
🔇 Additional comments (5)
deepmd/utils/data.py (2)
256-270: LGTM! Clean refactoring to centralize frame loading.
Both `get_item_torch` and `get_item_paddle` now properly delegate to `get_single_frame` with the new `num_worker` parameter, which centralizes the frame loading and modification logic.
Also applies to: 272-287

515-530: LGTM! Preload method correctly caches all frames.
The preload method correctly iterates through all frames, applies modifications, and caches results. Progress logging every 100 frames is reasonable for typical dataset sizes.
source/tests/pt/test_data_modifier.py (3)
64-91: LGTM! Plugin registrations are properly documented.
The three modifier plugin registration functions (`modifier_random_tester`, `modifier_zero_tester`, `modifier_scaling_tester`) are well-structured with clear documentation and appropriate argument specifications.
253-384: Excellent test coverage for modifier functionality.
The three test methods provide comprehensive coverage:
- `test_init_modify_data` validates that the zero modifier correctly zeros out training and validation data.
- `test_full_modify_data` ensures modification is applied consistently and only once.
- `test_inference` performs end-to-end validation with model training, freezing, and scaled predictions.
The parameterization across batch sizes and cache settings strengthens the test suite.
386-401: LGTM! Robust cleanup with proper error handling.
The tearDown method correctly uses try-except blocks to ensure all cleanup attempts are made, even if individual file removals fail. This prevents test artifacts from accumulating.
Overview
This PR adds data modifier plugin functionality to the PyTorch implementation of DeepMD. The feature allows on-the-fly data modification during training and inference, enabling advanced data manipulation capabilities.
Key Changes
1. Added Data Modifier to Training Pipeline
- `deepmd/pt/entrypoints/main.py`: adds a modifier factory (`get_data_modifier`) and wires the modifier into the `get_trainer()` function

2. Added Data Modifier to Inference
- `deepmd/pt/infer/deep_eval.py`: loads the modifier into the `DeepEval` class

3. Implemented Data Modifier Framework
- `deepmd/pt/modifier/__init__.py` (entirely new): `BaseModifier` with a registration system
- `ModifierRandomTester`: applies random scaling to energy/force/virial data for testing
- `ModifierZeroTester`: zeroes out energy/force/virial data for testing
- `ModifierScalingTester`: applies scaled model predictions as data modifications

4. Added Data Modifier Tests
- `deepmd/pt/test/test_modifier.py` (entirely new)

Summary by CodeRabbit
New Features
Behavior
Tests