
pd: support dpa1 #4414

Merged
merged 74 commits into deepmodeling:devel on Dec 18, 2024

Conversation

HydrogenSulfate
Contributor

@HydrogenSulfate HydrogenSulfate commented Nov 25, 2024

Summary of this PR:

  1. Upload DPA-1 related code.
  2. Merge recent develop code.
  3. Add all eager composite operators except softmax_grad, p_norm_grad, split_grad, and concat_grad to the composite-operator blacklist (https://github.com/deepmodeling/deepmd-kit/pull/4414/files#diff-e678abb052b278f8a479f8d13b839a9ec0effd9923478a850bc13758f918e1e9R134-R148), which significantly improves model execution speed: the runtime overhead relative to PyTorch drops from about 100% to roughly 10% to 15%.
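The 10% to 15% overhead figure above comes from wall-clock comparison against PyTorch. A generic timing harness for such a measurement might look like the sketch below; the workload, warmup, and iteration counts are placeholders, not the actual benchmark used in this PR.

```python
import time

def measure_overhead(run_a, run_b, warmup=3, iters=10):
    """Return the relative overhead of run_a over run_b.

    run_a / run_b: zero-argument callables executing one training step.
    A result of 0.10 means run_a is ~10% slower than run_b.
    """
    def timeit(fn):
        for _ in range(warmup):  # discard warmup iterations (JIT, caches)
            fn()
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        return (time.perf_counter() - start) / iters

    t_a, t_b = timeit(run_a), timeit(run_b)
    return (t_a - t_b) / t_b
```

For a real comparison, `run_a` and `run_b` would each execute one full training step on the respective backend, after synchronizing the device.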

related PR: lanpa/tensorboardX#728

Training curve:

[training curve image: training_curves_comparison_eager_opt]

Accuracy test (left: Paddle, right: PyTorch):

[accuracy comparison image]

Related optimization PRs in the Paddle framework:

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced several new classes for molecular descriptors, including DescrptDPA1, DescrptBlockSeAtten, and LayerNorm, enhancing the modeling capabilities for molecular simulations.
    • Added new JSON configuration files for model parameters and multitask models related to water simulations.
    • Implemented new test classes for validating the functionality of the DPAtomicModel and various descriptor classes.
    • Added new test classes for evaluating denoising models, including TestDenoiseModelDPA1 and TestDenoiseModelDPA2.
    • Enhanced the ModelWrapper class to clarify the handling of model parameters and state management.
  • Bug Fixes

    • Improved internal logic for handling model state saving and loading, ensuring consistency in outputs.
  • Documentation

    • Enhanced type hints and return annotations across various classes and methods for better clarity.
  • Tests

    • Expanded the testing framework with new test cases for denoising models and descriptor functionalities, ensuring robust validation of features.
    • Activated previously skipped tests for energy models, improving test coverage.
    • Enhanced multitask training tests with new configuration handling and test classes.

HydrogenSulfate and others added 30 commits November 2, 2024 11:14
Member

@njzjz njzjz left a comment


Please add consistent tests. (please note my changes in #4438)

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
source/tests/consistent/model/test_dpa1.py (1)

248-255: Consider using ravel() for consistency

The Paddle backend uses flatten() while other backends use ravel(). While both methods produce similar results, using ravel() would maintain consistency across all backends.

```diff
-                ret["energy"].flatten(),
-                ret["atom_energy"].flatten(),
-                ret["force"].flatten(),
-                ret["virial"].flatten(),
-                ret["atom_virial"].flatten(),
+                ret["energy"].ravel(),
+                ret["atom_energy"].ravel(),
+                ret["force"].ravel(),
+                ret["virial"].ravel(),
+                ret["atom_virial"].ravel(),
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8b5b4a8 and d98c644.

📒 Files selected for processing (2)
  • source/tests/consistent/descriptor/test_dpa1.py (5 hunks)
  • source/tests/consistent/model/test_dpa1.py (7 hunks)
🔇 Additional comments (11)
source/tests/consistent/model/test_dpa1.py (6)

17-17: LGTM: Import follows consistent pattern

The addition of INSTALLED_PD import follows the established pattern for backend availability flags.


41-45: LGTM: Paddle imports follow established pattern

The conditional import block for Paddle models follows the same structure as other backends, with proper fallback handling.
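The pattern referenced here, as used for the other backends, looks roughly like the following; the exact flag wiring in the repo may differ slightly.

```python
# Guarded backend import: set an availability flag instead of failing hard,
# so the test suite can skip Paddle-specific cases when paddle is absent.
try:
    import paddle  # noqa: F401
    INSTALLED_PD = True
except ImportError:
    INSTALLED_PD = False
```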


99-99: LGTM: Class attribute follows convention

The addition of pd_class follows the established pattern for backend class assignments.


112-113: LGTM: Reference backend selection properly extended

The addition of Paddle backend to reference selection follows the established priority order and error handling pattern.


131-132: LGTM: Model initialization properly handled

The Paddle model initialization in pass_data_to_cls follows the established pattern.


204-211: LGTM: Evaluation method follows convention

The eval_pd method correctly implements the standard evaluation interface used by other backends.

source/tests/consistent/descriptor/test_dpa1.py (5)

21-21: LGTM: Import changes follow established patterns

The import changes correctly follow the same pattern used for other backends, with proper conditional handling when Paddle is not installed.

Also applies to: 43-46


195-221: LGTM: Skip condition matches other backends

The skip_pd property correctly implements the same skip logic used by other backends, properly handling both common skip conditions and specific attention layer tests.


323-323: LGTM: Class assignment follows convention

The pd_class assignment follows the established pattern used for other backend implementations.


424-432: LGTM: Evaluation method matches other backends

The eval_pd method correctly implements the evaluation logic consistent with other backend implementations, properly passing all required parameters to the descriptor evaluation method.


43-46: Verify test coverage for Paddle backend

While the implementation looks correct, let's verify that all test cases will be properly executed for the Paddle backend.

Also applies to: 195-221, 323-323, 424-432

✅ Verification successful

Let me gather more information to verify the test coverage specifically for the Paddle backend.


Let me try one more verification to understand the test execution flow and Paddle backend handling.


Test coverage for Paddle backend is properly implemented

The verification shows that the Paddle backend test coverage is comprehensive and well-implemented:

  1. The test class TestDPA1 includes proper handling of the Paddle backend through:
    • Conditional import of DescrptDPA1PD
    • A dedicated skip_pd property that controls test execution
    • Implementation of eval_pd method that properly evaluates Paddle descriptors
  2. The test infrastructure in common.py includes complete Paddle-specific utilities:
    • Proper device handling with PD_DEVICE
    • Paddle-specific neighbor list building
    • Tensor conversion and evaluation methods
  3. The test parameterization covers all relevant scenarios, and Paddle tests are skipped only when appropriate (e.g., when specific features are incompatible)

The implementation follows the same pattern as other backends (TF, PyTorch, JAX) ensuring consistent test coverage across all supported frameworks.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Description: Check test coverage for Paddle backend implementation

# Check if there are any test cases that might be inadvertently skipped
echo "Checking for potential gaps in test coverage..."

# Look for test methods that might not handle pd backend
rg -l "def test_.*" source/tests/consistent/descriptor/
# -P enables PCRE2, which rg needs for the look-behind assertion below
rg -P "(?<!skip_)pd" source/tests/consistent/descriptor/

# Check for any TODO or FIXME comments related to pd implementation
rg -i "todo.*pd|fixme.*pd" source/tests/consistent/descriptor/

# Look for any inconsistencies in backend handling
rg "(tf|pt|jax|pd)_(class|obj)" source/tests/consistent/descriptor/
```

Length of output: 7067


Script:

```bash
#!/bin/bash
# Check test files for Paddle backend implementation and test methods
rg -A 5 "class.*Test" source/tests/consistent/descriptor/test_dpa1.py
rg "def test_" source/tests/consistent/descriptor/test_dpa1.py

# Check if there are any skip decorators or conditional test executions
rg "@.*skip" source/tests/consistent/descriptor/test_dpa1.py
rg "skipif|pytest.mark.skip" source/tests/consistent/descriptor/test_dpa1.py

# Check the test base class to understand test execution flow
rg -A 10 "class.*Base" source/tests/consistent/descriptor/common.py
```

Length of output: 514


Script:

```bash
#!/bin/bash
# Check the base test class implementation and test execution flow
cat source/tests/consistent/descriptor/common.py

# Check the complete test file for DPA1
cat source/tests/consistent/descriptor/test_dpa1.py

# Look for any conditional test execution patterns
rg -l "INSTALLED_PD" source/tests/consistent/descriptor/
```

Length of output: 20080

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
source/tests/consistent/model/test_dpa1.py (2)

112-113: Document the backend priority order

While the PD backend is correctly added to the reference backend selection, the priority order of backends (PT > TF > PD > JAX > DP) is not documented. Consider adding a comment explaining why this specific order was chosen.
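A hedged sketch of how such a priority-ordered reference-backend selection typically works; the names below are illustrative, not the repo's actual code.

```python
from enum import Enum, auto

class RefBackend(Enum):
    PT = auto()   # PyTorch
    TF = auto()   # TensorFlow
    PD = auto()   # Paddle
    JAX = auto()
    DP = auto()   # NumPy reference implementation

def pick_reference(installed: dict) -> RefBackend:
    """Return the first available backend in priority order PT > TF > PD > JAX > DP."""
    for backend in (RefBackend.PT, RefBackend.TF, RefBackend.PD,
                    RefBackend.JAX, RefBackend.DP):
        if installed.get(backend.name, False):
            return backend
    raise RuntimeError("no reference backend available")
```

Placing PD after TF but before JAX means Paddle results are only used as the reference when neither PyTorch nor TensorFlow is installed.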


248-255: Consider using ravel() for consistency

While the return value extraction is functionally correct, it uses flatten() while other backends use ravel(). Although both methods achieve similar results, consider using ravel() for consistency with other backend implementations.

Apply this diff to maintain consistency:

```diff
        elif backend is self.RefBackend.PD:
            return (
-                ret["energy"].flatten(),
-                ret["atom_energy"].flatten(),
-                ret["force"].flatten(),
-                ret["virial"].flatten(),
-                ret["atom_virial"].flatten(),
+                ret["energy"].ravel(),
+                ret["atom_energy"].ravel(),
+                ret["force"].ravel(),
+                ret["virial"].ravel(),
+                ret["atom_virial"].ravel(),
            )
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d98c644 and e74d272.

📒 Files selected for processing (2)
  • source/tests/consistent/descriptor/test_dpa1.py (5 hunks)
  • source/tests/consistent/model/test_dpa1.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/consistent/descriptor/test_dpa1.py
🔇 Additional comments (4)
source/tests/consistent/model/test_dpa1.py (4)

17-17: LGTM: Backend initialization follows established patterns

The PD backend initialization follows the same pattern as other backends, with proper conditional imports and fallback handling.

Also applies to: 41-45


99-99: LGTM: Class attribute follows convention

The pd_class attribute is correctly added, maintaining consistency with other backend class attributes.


131-132: LGTM: Model instantiation is consistent

The PD model instantiation follows the established pattern using the appropriate get_model_pd function.


204-211: LGTM: Evaluation method is properly implemented

The eval_pd method correctly follows the established pattern for backend evaluation methods, passing all required parameters.

@HydrogenSulfate
Contributor Author

HydrogenSulfate commented Nov 30, 2024

> Please add consistent tests. (please note my changes in #4438)

Thanks for the comments; updated the 2 consistency UTs of DPA-1:
=========== 346 passed, 446 skipped, 1 warning in 48.38s =============

@njzjz njzjz requested a review from iProzd November 30, 2024 20:31
@HydrogenSulfate
Contributor Author

@iProzd PR of pd: DPA-1 and pd: DPA-2 is ready for review.

deepmd/pd/train/wrapper.py — outdated, resolved
source/tests/pd/test_multitask.py — resolved
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
source/tests/pd/model/water/multitask_sharefit.json (2)

11-15: Document the rationale behind selection parameters

The selection parameters [46, 92, 4] seem arbitrary. Consider adding a comment explaining why these specific values were chosen and their significance in the context of water molecule simulations.


56-61: Consider adjusting the minimum learning rate

The stop_lr value of 3.51e-08 is extremely small. Consider using a larger value (e.g., 1e-6) as training might not benefit from such small learning rates and could unnecessarily extend training time.
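For context, DeePMD-style training typically decays the learning rate exponentially from `start_lr` toward `stop_lr` in staircase fashion; a hedged sketch (the repo's exact implementation may differ):

```python
def exp_decay_lr(step, start_lr, stop_lr, stop_steps, decay_steps=5000):
    """Staircase exponential decay: the rate shrinks geometrically every
    `decay_steps` steps so that lr(stop_steps) == stop_lr."""
    decay_rate = (stop_lr / start_lr) ** (decay_steps / stop_steps)
    return start_lr * decay_rate ** (step // decay_steps)
```

With `start_lr = 1e-3` and `stop_lr = 3.51e-8` over 1,000,000 steps the last stretch of training runs at rates far below 1e-6, which is the basis for the suggestion above.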

source/tests/pd/test_multitask.py (1)

71-76: Document the conditions for parameter sharing

The conditions for parameter sharing in fitting networks exclude certain parameters (bias_atom_e and case_embd). Consider adding a comment explaining why these specific parameters are excluded from sharing.
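A hedged sketch of what "share all fitting parameters except bias_atom_e and case_embd" can look like in practice; the key names come from the review comment, while the filtering helper itself is illustrative:

```python
# Per-task parameters that must stay independent between the shared models:
# bias_atom_e encodes per-dataset energy offsets, case_embd identifies the task.
EXCLUDED = ("bias_atom_e", "case_embd")

def shared_keys(state_dict):
    """Return the fitting-net parameter names eligible for sharing,
    skipping per-task parameters such as bias_atom_e and case_embd."""
    return [k for k in state_dict
            if not any(excl in k for excl in EXCLUDED)]
```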

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 95ac41d and 63ccdbd.

📒 Files selected for processing (2)
  • source/tests/pd/model/water/multitask_sharefit.json (1 hunks)
  • source/tests/pd/test_multitask.py (6 hunks)
🧰 Additional context used
🪛 GitHub Check: CodeQL
source/tests/pd/test_multitask.py

[notice] 48-48: Unused global variable
The global variable 'multitask_sharefit_template' is not used.

🔇 Additional comments (2)
source/tests/pd/model/water/multitask_sharefit.json (1)

64-82: LGTM! Well-structured loss weighting strategy

The loss configuration implements a good strategy by:

  1. Initially prioritizing force accuracy (pref_f: 1000 → 1)
  2. Gradually increasing energy contribution (pref_e: 0.02 → 1)
  3. Maintaining identical configurations for both models
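The pref_f/pref_e values here follow the usual DeePMD scheme of interpolating each loss prefactor between its start and limit values in proportion to the decaying learning rate; roughly (hedged sketch, not the repo's exact code):

```python
def pref(start_pref, limit_pref, lr, start_lr):
    """Interpolate a loss prefactor along the learning-rate decay:
    equals start_pref at lr == start_lr and tends to limit_pref as lr -> 0."""
    return limit_pref + (start_pref - limit_pref) * lr / start_lr
```

So with pref_f going 1000 → 1 and pref_e going 0.02 → 1, force errors dominate early training and the energy term gains weight as the learning rate decays.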
source/tests/pd/test_multitask.py (1)

43-48: ⚠️ Potential issue

Remove or utilize the unused template variable

The multitask_sharefit_template variable is loaded but never used in the code. Either:

  1. Remove it if it's not needed, or
  2. Update the tests to utilize this shared fitting configuration

source/tests/pd/test_multitask.py — resolved
@njzjz
Copy link
Member

njzjz commented Dec 13, 2024

@coderabbitai resolve

@njzjz njzjz requested a review from iProzd December 13, 2024 20:38
@njzjz njzjz enabled auto-merge December 17, 2024 22:27
@njzjz njzjz added this pull request to the merge queue Dec 17, 2024
Merged via the queue into deepmodeling:devel with commit e8167ce Dec 18, 2024
60 checks passed
HydrogenSulfate added a commit to HydrogenSulfate/deepmd-kit that referenced this pull request Dec 18, 2024
@HydrogenSulfate HydrogenSulfate deleted the add_dpa1 branch December 19, 2024 07:56
github-merge-queue bot pushed a commit that referenced this pull request Dec 25, 2024
Support DPA-2 in paddle backend. This PR will be updated after #4414 is
merged.

### Training curve:


![training_curves_comparison_dpa2](https://github.com/user-attachments/assets/29bdeffa-cf2d-4586-afcf-7df0569997c3)



### Accuracy test (left: Paddle, right: PyTorch):


![image](https://github.com/user-attachments/assets/5bff55f3-1c39-4b95-93f0-68783e794716)


Related optimization PRs in the Paddle framework:
- [x] PaddlePaddle/Paddle#69349
- [x] PaddlePaddle/Paddle#69333
- [x] PaddlePaddle/Paddle#69479
- [x] PaddlePaddle/Paddle#69515
- [x] PaddlePaddle/Paddle#69487
- [x] PaddlePaddle/Paddle#69661
- [x] PaddlePaddle/Paddle#69660
- [x] PaddlePaddle/Paddle#69596
- [x] PaddlePaddle/Paddle#69556

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced new classes for molecular descriptors: `DescrptDPA2`,
`DescrptBlockRepformers`, `DescrptSeTTebd`, and `DescrptBlockSeTTebd`.
- Added new functions for tensor operations and descriptor management,
enhancing the capabilities of the module.
- Updated JSON configurations for multitask models to refine selection
criteria and data paths.

- **Bug Fixes**
- Improved error handling and parameter validation across various
descriptor classes.

- **Documentation**
- Enhanced test coverage for new descriptor functionalities and
configurations.

- **Tests**
- Added new test classes to validate the functionality of `DescrptDPA2`
and multitask training scenarios.
- Expanded test capabilities for descriptor classes based on installed
dependencies.
- Updated existing tests to support new configurations and
functionalities.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>