If a repository has a large uncommitted git diff and the script logs TensorBoard images, ClearML logging hangs and never recovers.
To reproduce
Clone this repo https://github.com/Inquisitive-ME/clearml_hang_example and run clearml_example.py.
Console logging works, and sometimes the scalars from the first epoch show up, but after that no Scalars, Plots, or Debug Samples are logged.
Expected behaviour
I expect ClearML to log correctly and not get stuck.
Hi @Inquisitive-ME! Can you see an auxiliary_git_diff artifact reported to the task? This artifact stores the git diff, so the upload might take a while depending on your network. If you don't see it, the file is most likely still uploading and some reports are waiting for it to finish, which is why logging appears to hang.
I recommend adding the large files to .gitignore or, if you don't need the git diff, setting sdk.development.store_uncommitted_code_diff: false in clearml.conf.
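Before deciding which files to add to .gitignore, it can help to see what is making the diff large. Below is a minimal stdlib-only Python sketch, not part of ClearML; `DiffEntry`, `parse_numstat`, and `diff_report` are hypothetical helpers. It assumes the diff ClearML stores is close to what plain `git diff` prints (unstaged changes against the index). Staged changes and untracked files may be handled differently, so treat the numbers as a rough guide only.

```python
import subprocess
from typing import List, NamedTuple, Optional


class DiffEntry(NamedTuple):
    """One line of `git diff --numstat` output (hypothetical helper type)."""
    path: str
    added: Optional[int]    # None for binary files, which numstat reports as "-"
    deleted: Optional[int]

    @property
    def is_binary(self) -> bool:
        return self.added is None

    @property
    def changed_lines(self) -> int:
        # Binary files have no line counts, so they rank as 0 changed lines.
        return (self.added or 0) + (self.deleted or 0)


def parse_numstat(text: str) -> List[DiffEntry]:
    """Parse "added<TAB>deleted<TAB>path" lines, largest changes first."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        entries.append(DiffEntry(
            path=path,
            added=None if added == "-" else int(added),
            deleted=None if deleted == "-" else int(deleted),
        ))
    # sorted() is stable, so ties keep git's original order.
    return sorted(entries, key=lambda e: e.changed_lines, reverse=True)


def diff_report(repo_dir: str = ".", top: int = 10) -> None:
    """Print the size of `git diff` output and the files contributing most to it.

    Assumption: plain `git diff` approximates what ClearML captures.
    """
    diff_bytes = subprocess.run(
        ["git", "diff"], cwd=repo_dir, capture_output=True, check=True
    ).stdout
    numstat = subprocess.run(
        ["git", "diff", "--numstat"], cwd=repo_dir,
        capture_output=True, text=True, check=True,
    ).stdout
    print(f"git diff output: {len(diff_bytes) / 1e6:.2f} MB")
    for entry in parse_numstat(numstat)[:top]:
        size = "binary" if entry.is_binary else f"{entry.changed_lines} lines"
        print(f"  {size:>14}  {entry.path}")
```

Calling `diff_report(".")` from the repository root before starting a run lists the biggest entries; the paths at the top of that list are the candidates for .gitignore.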
I do see the file, and it appears to have been uploaded. However, there are still no debug samples. I have had experiments running for days, and it NEVER works. This is a bug, not a matter of waiting longer.
Setting sdk.development.store_uncommitted_code_diff: false does fix the problem, but you have a bug in your software that is very annoying to deal with.