
[BUG] _Replace With Suitable Title_ #316

Open
wangyifei1992 opened this issue Jan 23, 2025 · 0 comments
Labels: bug (Something isn't working)

@wangyifei1992

Describe the bug

I created a prediction task to fine-tune the 84M Uni-Mol2 model, both with and without the published checkpoint. The dataset contains 100M samples from the Molecule3D dataset with the HOMO label. However, the training loss does not converge correctly. Without the checkpoint, the training loss gradually decreases to 0.1, then suddenly jumps to 0.55 and never drops back to 0.1. With the checkpoint, the training loss can also jump to 0.55. I have tried a variety of training parameters but got similar loss curves. Below are two examples using the fine-tuning parameters suggested in the README. The task_name is molecule3d_homo, and I have already added its mean and std to the unimol_finetune task_metainfo (see the sketch after the loss curves below).

With checkpoint:

[Image: training loss curve when fine-tuning from the published checkpoint]

Without checkpoint:

[Image: training loss curve without the published checkpoint]

What can I do to keep the fine-tuning training stable? Thanks a lot.
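For reference, the molecule3d_homo entry was registered roughly as in the minimal sketch below, assuming the task_metainfo dict in the unimol_finetune task of the Uni-Mol repo. The mean/std numbers, the target_name, and the helper function are placeholders for illustration, not the actual values or code used.

```python
import numpy as np

# Sketch of the entry added to task_metainfo for the molecule3d_homo task.
# The numeric values below are PLACEHOLDERS; the real values are the mean and
# standard deviation of the HOMO labels over the training split.
task_metainfo = {
    # ... existing entries stay as-is ...
    "molecule3d_homo": {
        "mean": -5.97,          # placeholder: mean of the HOMO label over the training split
        "std": 0.70,            # placeholder: std of the HOMO label over the training split
        "target_name": "homo",  # assumed name of the label column
    },
}

# Placeholder helper showing how the statistics would be computed from the raw
# training labels before filling in the entry above.
def label_stats(homo_labels):
    labels = np.asarray(homo_labels, dtype=np.float64)
    return float(labels.mean()), float(labels.std())
```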

Uni-Mol Version

Uni-Mol2

Expected behavior

The training loss should decrease and remain stable during the Uni-Mol2 fine-tuning task.

To Reproduce

No response

Environment

V100 GPU, Python 3.9, PyTorch 2.0.0

Additional Context

No response

@wangyifei1992 added the bug (Something isn't working) label on Jan 23, 2025