
[BUG] _Replace With Suitable Title_ #316

Open
wangyifei1992 opened this issue Jan 23, 2025 · 0 comments
Labels: bug (Something isn't working)

@wangyifei1992

Describe the bug

I created a prediction task to fine-tune the 84M Uni-Mol2 model, both with and without the published checkpoint. The dataset contains 100M samples from the Molecule3D dataset with the HOMO label. However, the training loss does not converge correctly. Without the checkpoint, the training loss gradually decreases to 0.1, then suddenly jumps to 0.55 and never drops back to 0.1. With the checkpoint, the training loss can also jump to 0.55. I have tried a variety of training parameters but got similar loss curves. Below are two examples using the fine-tuning parameters suggested in the README. The task_name is molecule3d_homo, and I have already added its mean and std to the unimol_finetune task_metainfo (see the sketch after the loss curves below).

With checkpoint:

[Image: training loss curve when fine-tuning from the published checkpoint]

Without checkpoint:

[Image: training loss curve without the published checkpoint]

What can I do to keep the fine-tuning training stable? Thanks a lot.
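For reference, the molecule3d_homo entry was registered roughly as in the minimal sketch below, assuming the task_metainfo dict in the unimol_finetune task of the Uni-Mol repo. The mean/std numbers, the target_name, and the helper function are placeholders for illustration, not the actual values or code used.

```python
import numpy as np

# Sketch of the entry added to task_metainfo for the molecule3d_homo task.
# The numeric values below are PLACEHOLDERS; the real values are the mean and
# standard deviation of the HOMO labels over the training split.
task_metainfo = {
    # ... existing entries stay as-is ...
    "molecule3d_homo": {
        "mean": -5.97,          # placeholder: mean of the HOMO label over the training split
        "std": 0.70,            # placeholder: std of the HOMO label over the training split
        "target_name": "homo",  # assumed name of the label column
    },
}

# Placeholder helper showing how the statistics would be computed from the raw
# training labels before filling in the entry above.
def label_stats(homo_labels):
    labels = np.asarray(homo_labels, dtype=np.float64)
    return float(labels.mean()), float(labels.std())
```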

Uni-Mol Version

Uni-Mol2

Expected behavior

The training loss should decrease and remain stable during the Uni-Mol2 fine-tuning task.

To Reproduce

No response

Environment

V100 GPU, Python 3.9, PyTorch 2.0.0

Additional Context

No response

@wangyifei1992 added the bug (Something isn't working) label on Jan 23, 2025