
pd: support dpa1 #4414

Merged
merged 74 commits into deepmodeling:devel on Dec 18, 2024

Conversation

HydrogenSulfate
Contributor

@HydrogenSulfate HydrogenSulfate commented Nov 25, 2024

Summary of this PR:

  1. Upload DPA-1 related code.
  2. Merge recent develop code.
  3. Add all eager composite operators except softmax_grad, p_norm_grad, split_grad, and concat_grad to the composite-operator blacklist (https://github.com/deepmodeling/deepmd-kit/pull/4414/files#diff-e678abb052b278f8a479f8d13b839a9ec0effd9923478a850bc13758f918e1e9R134-R148), which significantly improves model execution speed: the runtime overhead relative to PyTorch drops from about 100% to roughly 10% to 15%.
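The 10% to 15% overhead figure above comes from wall-clock comparison against PyTorch. A generic timing harness for such a measurement might look like the sketch below; the workload, warmup, and iteration counts are placeholders, not the actual benchmark used in this PR.

```python
import time

def measure_overhead(run_a, run_b, warmup=3, iters=10):
    """Return the relative overhead of run_a over run_b.

    run_a / run_b: zero-argument callables executing one training step.
    A result of 0.10 means run_a is ~10% slower than run_b.
    """
    def timeit(fn):
        for _ in range(warmup):  # discard warmup iterations (JIT, caches)
            fn()
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        return (time.perf_counter() - start) / iters

    t_a, t_b = timeit(run_a), timeit(run_b)
    return (t_a - t_b) / t_b
```

For a real comparison, `run_a` and `run_b` would each execute one full training step on the respective backend, after synchronizing the device.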

related PR: lanpa/tensorboardX#728

Training curve:

[training curve image: training_curves_comparison_eager_opt]

Accuracy test (left: Paddle, right: PyTorch):

[accuracy comparison image]

Related optimization PRs in the Paddle framework:

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced several new classes for molecular descriptors, including DescrptDPA1, DescrptBlockSeAtten, and LayerNorm, enhancing the modeling capabilities for molecular simulations.
    • Added new JSON configuration files for model parameters and multitask models related to water simulations.
    • Implemented new test classes for validating the functionality of the DPAtomicModel and various descriptor classes.
    • Added new test classes for evaluating denoising models, including TestDenoiseModelDPA1 and TestDenoiseModelDPA2.
    • Enhanced the ModelWrapper class to clarify the handling of model parameters and state management.
  • Bug Fixes

    • Improved internal logic for handling model state saving and loading, ensuring consistency in outputs.
  • Documentation

    • Enhanced type hints and return annotations across various classes and methods for better clarity.
  • Tests

    • Expanded the testing framework with new test cases for denoising models and descriptor functionalities, ensuring robust validation of features.
    • Activated previously skipped tests for energy models, improving test coverage.
    • Enhanced multitask training tests with new configuration handling and test classes.

HydrogenSulfate and others added 30 commits November 2, 2024 11:14
Member

@njzjz njzjz left a comment


Please add consistent tests. (please note my changes in #4438)

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
source/tests/consistent/model/test_dpa1.py (1)

248-255: Consider using ravel() for consistency

The Paddle backend uses flatten() while other backends use ravel(). While both methods produce similar results, using ravel() would maintain consistency across all backends.

```diff
-                ret["energy"].flatten(),
-                ret["atom_energy"].flatten(),
-                ret["force"].flatten(),
-                ret["virial"].flatten(),
-                ret["atom_virial"].flatten(),
+                ret["energy"].ravel(),
+                ret["atom_energy"].ravel(),
+                ret["force"].ravel(),
+                ret["virial"].ravel(),
+                ret["atom_virial"].ravel(),
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8b5b4a8 and d98c644.

📒 Files selected for processing (2)
  • source/tests/consistent/descriptor/test_dpa1.py (5 hunks)
  • source/tests/consistent/model/test_dpa1.py (7 hunks)
🔇 Additional comments (11)
source/tests/consistent/model/test_dpa1.py (6)

17-17: LGTM: Import follows consistent pattern

The addition of INSTALLED_PD import follows the established pattern for backend availability flags.


41-45: LGTM: Paddle imports follow established pattern

The conditional import block for Paddle models follows the same structure as other backends, with proper fallback handling.
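The pattern referenced here, as used for the other backends, looks roughly like the following; the exact flag wiring in the repo may differ slightly.

```python
# Guarded backend import: set an availability flag instead of failing hard,
# so the test suite can skip Paddle-specific cases when paddle is absent.
try:
    import paddle  # noqa: F401
    INSTALLED_PD = True
except ImportError:
    INSTALLED_PD = False
```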


99-99: LGTM: Class attribute follows convention

The addition of pd_class follows the established pattern for backend class assignments.


112-113: LGTM: Reference backend selection properly extended

The addition of Paddle backend to reference selection follows the established priority order and error handling pattern.


131-132: LGTM: Model initialization properly handled

The Paddle model initialization in pass_data_to_cls follows the established pattern.


204-211: LGTM: Evaluation method follows convention

The eval_pd method correctly implements the standard evaluation interface used by other backends.

source/tests/consistent/descriptor/test_dpa1.py (5)

21-21: LGTM: Import changes follow established patterns

The import changes correctly follow the same pattern used for other backends, with proper conditional handling when Paddle is not installed.

Also applies to: 43-46


195-221: LGTM: Skip condition matches other backends

The skip_pd property correctly implements the same skip logic used by other backends, properly handling both common skip conditions and specific attention layer tests.


323-323: LGTM: Class assignment follows convention

The pd_class assignment follows the established pattern used for other backend implementations.


424-432: LGTM: Evaluation method matches other backends

The eval_pd method correctly implements the evaluation logic consistent with other backend implementations, properly passing all required parameters to the descriptor evaluation method.


43-46: Verify test coverage for Paddle backend

While the implementation looks correct, let's verify that all test cases will be properly executed for the Paddle backend.

Also applies to: 195-221, 323-323, 424-432

✅ Verification successful

Let me gather more information to verify the test coverage specifically for the Paddle backend.


Let me try one more verification to understand the test execution flow and Paddle backend handling.


Test coverage for Paddle backend is properly implemented

The verification shows that the Paddle backend test coverage is comprehensive and well-implemented:

  1. The test class TestDPA1 includes proper handling of the Paddle backend through:
    • Conditional import of DescrptDPA1PD
    • A dedicated skip_pd property that controls test execution
    • Implementation of eval_pd method that properly evaluates Paddle descriptors
  2. The test infrastructure in common.py includes complete Paddle-specific utilities:
    • Proper device handling with PD_DEVICE
    • Paddle-specific neighbor list building
    • Tensor conversion and evaluation methods
  3. The test parameterization covers all relevant scenarios, and Paddle tests are skipped only when appropriate (e.g., when specific features are incompatible)

The implementation follows the same pattern as other backends (TF, PyTorch, JAX) ensuring consistent test coverage across all supported frameworks.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Description: Check test coverage for Paddle backend implementation

# Check if there are any test cases that might be inadvertently skipped
echo "Checking for potential gaps in test coverage..."

# Look for test methods that might not handle pd backend
rg -l "def test_.*" source/tests/consistent/descriptor/
# -P enables PCRE2, which rg needs for the look-behind assertion below
rg -P "(?<!skip_)pd" source/tests/consistent/descriptor/

# Check for any TODO or FIXME comments related to pd implementation
rg -i "todo.*pd|fixme.*pd" source/tests/consistent/descriptor/

# Look for any inconsistencies in backend handling
rg "(tf|pt|jax|pd)_(class|obj)" source/tests/consistent/descriptor/
```

Length of output: 7067


Script:

```bash
#!/bin/bash
# Check test files for Paddle backend implementation and test methods
rg -A 5 "class.*Test" source/tests/consistent/descriptor/test_dpa1.py
rg "def test_" source/tests/consistent/descriptor/test_dpa1.py

# Check if there are any skip decorators or conditional test executions
rg "@.*skip" source/tests/consistent/descriptor/test_dpa1.py
rg "skipif|pytest.mark.skip" source/tests/consistent/descriptor/test_dpa1.py

# Check the test base class to understand test execution flow
rg -A 10 "class.*Base" source/tests/consistent/descriptor/common.py
```

Length of output: 514


Script:

```bash
#!/bin/bash
# Check the base test class implementation and test execution flow
cat source/tests/consistent/descriptor/common.py

# Check the complete test file for DPA1
cat source/tests/consistent/descriptor/test_dpa1.py

# Look for any conditional test execution patterns
rg -l "INSTALLED_PD" source/tests/consistent/descriptor/
```

Length of output: 20080

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
source/tests/consistent/model/test_dpa1.py (2)

112-113: Document the backend priority order

While the PD backend is correctly added to the reference backend selection, the priority order of backends (PT > TF > PD > JAX > DP) is not documented. Consider adding a comment explaining why this specific order was chosen.
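A hedged sketch of how such a priority-ordered reference-backend selection typically works; the names below are illustrative, not the repo's actual code.

```python
from enum import Enum, auto

class RefBackend(Enum):
    PT = auto()   # PyTorch
    TF = auto()   # TensorFlow
    PD = auto()   # Paddle
    JAX = auto()
    DP = auto()   # NumPy reference implementation

def pick_reference(installed: dict) -> RefBackend:
    """Return the first available backend in priority order PT > TF > PD > JAX > DP."""
    for backend in (RefBackend.PT, RefBackend.TF, RefBackend.PD,
                    RefBackend.JAX, RefBackend.DP):
        if installed.get(backend.name, False):
            return backend
    raise RuntimeError("no reference backend available")
```

Placing PD after TF but before JAX means Paddle results are only used as the reference when neither PyTorch nor TensorFlow is installed.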


248-255: Consider using ravel() for consistency

While the return value extraction is functionally correct, it uses flatten() while other backends use ravel(). Although both methods achieve similar results, consider using ravel() for consistency with other backend implementations.

Apply this diff to maintain consistency:

```diff
        elif backend is self.RefBackend.PD:
            return (
-                ret["energy"].flatten(),
-                ret["atom_energy"].flatten(),
-                ret["force"].flatten(),
-                ret["virial"].flatten(),
-                ret["atom_virial"].flatten(),
+                ret["energy"].ravel(),
+                ret["atom_energy"].ravel(),
+                ret["force"].ravel(),
+                ret["virial"].ravel(),
+                ret["atom_virial"].ravel(),
            )
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d98c644 and e74d272.

📒 Files selected for processing (2)
  • source/tests/consistent/descriptor/test_dpa1.py (5 hunks)
  • source/tests/consistent/model/test_dpa1.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/consistent/descriptor/test_dpa1.py
🔇 Additional comments (4)
source/tests/consistent/model/test_dpa1.py (4)

17-17: LGTM: Backend initialization follows established patterns

The PD backend initialization follows the same pattern as other backends, with proper conditional imports and fallback handling.

Also applies to: 41-45


99-99: LGTM: Class attribute follows convention

The pd_class attribute is correctly added, maintaining consistency with other backend class attributes.


131-132: LGTM: Model instantiation is consistent

The PD model instantiation follows the established pattern using the appropriate get_model_pd function.


204-211: LGTM: Evaluation method is properly implemented

The eval_pd method correctly follows the established pattern for backend evaluation methods, passing all required parameters.

@HydrogenSulfate
Contributor Author

HydrogenSulfate commented Nov 30, 2024

> Please add consistent tests. (please note my changes in #4438)

Thanks for the comments; updated the 2 consistency UTs of DPA-1:
=========== 346 passed, 446 skipped, 1 warning in 48.38s =============

@njzjz njzjz requested a review from iProzd November 30, 2024 20:31
@HydrogenSulfate
Contributor Author

@iProzd PR of pd: DPA-1 and pd: DPA-2 is ready for review.

deepmd/pd/train/wrapper.py — outdated, resolved
source/tests/pd/test_multitask.py — resolved
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
source/tests/pd/model/water/multitask_sharefit.json (2)

11-15: Document the rationale behind selection parameters

The selection parameters [46, 92, 4] seem arbitrary. Consider adding a comment explaining why these specific values were chosen and their significance in the context of water molecule simulations.


56-61: Consider adjusting the minimum learning rate

The stop_lr value of 3.51e-08 is extremely small. Consider using a larger value (e.g., 1e-6) as training might not benefit from such small learning rates and could unnecessarily extend training time.
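For context, DeePMD-style training typically decays the learning rate exponentially from `start_lr` toward `stop_lr` in staircase fashion; a hedged sketch (the repo's exact implementation may differ):

```python
def exp_decay_lr(step, start_lr, stop_lr, stop_steps, decay_steps=5000):
    """Staircase exponential decay: the rate shrinks geometrically every
    `decay_steps` steps so that lr(stop_steps) == stop_lr."""
    decay_rate = (stop_lr / start_lr) ** (decay_steps / stop_steps)
    return start_lr * decay_rate ** (step // decay_steps)
```

With `start_lr = 1e-3` and `stop_lr = 3.51e-8` over 1,000,000 steps the last stretch of training runs at rates far below 1e-6, which is the basis for the suggestion above.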

source/tests/pd/test_multitask.py (1)

71-76: Document the conditions for parameter sharing

The conditions for parameter sharing in fitting networks exclude certain parameters (bias_atom_e and case_embd). Consider adding a comment explaining why these specific parameters are excluded from sharing.
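A hedged sketch of what "share all fitting parameters except bias_atom_e and case_embd" can look like in practice; the key names come from the review comment, while the filtering helper itself is illustrative:

```python
# Per-task parameters that must stay independent between the shared models:
# bias_atom_e encodes per-dataset energy offsets, case_embd identifies the task.
EXCLUDED = ("bias_atom_e", "case_embd")

def shared_keys(state_dict):
    """Return the fitting-net parameter names eligible for sharing,
    skipping per-task parameters such as bias_atom_e and case_embd."""
    return [k for k in state_dict
            if not any(excl in k for excl in EXCLUDED)]
```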

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 95ac41d and 63ccdbd.

📒 Files selected for processing (2)
  • source/tests/pd/model/water/multitask_sharefit.json (1 hunks)
  • source/tests/pd/test_multitask.py (6 hunks)
🧰 Additional context used
🪛 GitHub Check: CodeQL
source/tests/pd/test_multitask.py

[notice] 48-48: Unused global variable
The global variable 'multitask_sharefit_template' is not used.

🔇 Additional comments (2)
source/tests/pd/model/water/multitask_sharefit.json (1)

64-82: LGTM! Well-structured loss weighting strategy

The loss configuration implements a good strategy by:

  1. Initially prioritizing force accuracy (pref_f: 1000 → 1)
  2. Gradually increasing energy contribution (pref_e: 0.02 → 1)
  3. Maintaining identical configurations for both models
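The pref_f/pref_e values here follow the usual DeePMD scheme of interpolating each loss prefactor between its start and limit values in proportion to the decaying learning rate; roughly (hedged sketch, not the repo's exact code):

```python
def pref(start_pref, limit_pref, lr, start_lr):
    """Interpolate a loss prefactor along the learning-rate decay:
    equals start_pref at lr == start_lr and tends to limit_pref as lr -> 0."""
    return limit_pref + (start_pref - limit_pref) * lr / start_lr
```

So with pref_f going 1000 → 1 and pref_e going 0.02 → 1, force errors dominate early training and the energy term gains weight as the learning rate decays.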
source/tests/pd/test_multitask.py (1)

43-48: ⚠️ Potential issue

Remove or utilize the unused template variable

The multitask_sharefit_template variable is loaded but never used in the code. Either:

  1. Remove it if it's not needed, or
  2. Update the tests to utilize this shared fitting configuration

source/tests/pd/test_multitask.py — resolved
@njzjz
Copy link
Member

njzjz commented Dec 13, 2024

@coderabbitai resolve

@njzjz njzjz requested a review from iProzd December 13, 2024 20:38
@njzjz njzjz enabled auto-merge December 17, 2024 22:27
@njzjz njzjz added this pull request to the merge queue Dec 17, 2024
Merged via the queue into deepmodeling:devel with commit e8167ce Dec 18, 2024
60 checks passed
HydrogenSulfate added a commit to HydrogenSulfate/deepmd-kit that referenced this pull request Dec 18, 2024
@HydrogenSulfate HydrogenSulfate deleted the add_dpa1 branch December 19, 2024 07:56
github-merge-queue bot pushed a commit that referenced this pull request Dec 25, 2024
Support DPA-2 in paddle backend. This PR will be updated after #4414 is
merged.

### Training curve:


![training_curves_comparison_dpa2](https://github.com/user-attachments/assets/29bdeffa-cf2d-4586-afcf-7df0569997c3)



### Accuracy test (left: Paddle, right: PyTorch):


![image](https://github.com/user-attachments/assets/5bff55f3-1c39-4b95-93f0-68783e794716)


Related optimization PRs in the Paddle framework:
- [x] PaddlePaddle/Paddle#69349
- [x] PaddlePaddle/Paddle#69333
- [x] PaddlePaddle/Paddle#69479
- [x] PaddlePaddle/Paddle#69515
- [x] PaddlePaddle/Paddle#69487
- [x] PaddlePaddle/Paddle#69661
- [x] PaddlePaddle/Paddle#69660
- [x] PaddlePaddle/Paddle#69596
- [x] PaddlePaddle/Paddle#69556

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Introduced new classes for molecular descriptors: `DescrptDPA2`,
`DescrptBlockRepformers`, `DescrptSeTTebd`, and `DescrptBlockSeTTebd`.
- Added new functions for tensor operations and descriptor management,
enhancing the capabilities of the module.
- Updated JSON configurations for multitask models to refine selection
criteria and data paths.

- **Bug Fixes**
- Improved error handling and parameter validation across various
descriptor classes.

- **Documentation**
- Enhanced test coverage for new descriptor functionalities and
configurations.

- **Tests**
- Added new test classes to validate the functionality of `DescrptDPA2`
and multitask training scenarios.
- Expanded test capabilities for descriptor classes based on installed
dependencies.
- Updated existing tests to support new configurations and
functionalities.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>