
got some problem when uploading files #141

Open
Kenny-Ch opened this issue Aug 29, 2024 · 5 comments

Comments

@Kenny-Ch

During training, I found that wandb suddenly could not upload wandb-metadata.json. After training, I tried to upload the files with wandb sync and got these errors:

wandb sync wandb/run-20240826_190835-g7b6iqjc/
Find logs at: /home/JIng/kenny/Project/personal_copilot/training/wandb/debug-cli.JIng.log
Syncing: http://localhost:8080/charly/personal-code-copilot/runs/g7b6iqjc ... wandb: ERROR Error uploading "code/train.py": CommError, <Response [507]>
wandb: ERROR Error uploading "wandb-metadata.json": CommError, <Response [507]>
wandb: ERROR Error uploading "wandb-summary.json": CommError, <Response [507]>
wandb: ERROR Error uploading "conda-environment.yaml": CommError, <Response [507]>
wandb: ERROR Error uploading "output.log": CommError, <Response [507]>
wandb: ERROR Error uploading "requirements.txt": CommError, <Response [507]>
wandb: ERROR Error uploading "config.yaml": CommError, <Response [507]>

I also got an error when running wandb verify:

Default host selected: http://localhost:8080
Find detailed logs for this test at: /tmp/tmp5033o82e/wandb
Checking if logged in...................................................✅
Checking signed URL upload..............................................Traceback (most recent call last):
  File "/home/JIng/miniconda3/envs/starcode-3b/bin/wandb", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/wandb/cli/cli.py", line 2960, in verify
    url_success, url = wandb_verify.check_graphql_put(api, host)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/JIng/miniconda3/envs/starcode-3b/lib/python3.11/site-packages/wandb/sdk/verify/verify.py", line 400, in check_graphql_put
    contents = read_file.read()
               ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'read'

Here are some error logs I found in /var/log:

./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:34:12.313204066Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:35:00.058451284Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"task 24:garbage_collect_runs_v2 paused due to repeated failures"}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"task 24:garbage_collect_runs_v2 paused due to repeated failures"}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:35:00.058625177Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"task 33:FlatRunsMigrator paused due to repeated failures"}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"task 33:FlatRunsMigrator paused due to repeated failures"}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:35:12.317314097Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:36:12.317093934Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:37:12.316296925Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla-glue.log:{"level":"ERROR","time":"2024-08-29T05:38:12.315714385Z","info":{"program":"megabinary","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":486,"errors":[{"type":"*errors.errorString","error":"no known task \"PUBLISHCUSTOMMETRICS\""}]},"data":{"dd.service":"glue","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b"},"message":"no known task \"PUBLISHCUSTOMMETRICS\""}
./gorilla.log:{"level":"ERROR","time":"2024-08-29T05:32:51.071134593Z","info":{"program":"gorilla","source":"github.com/wandb/core/services/gorilla/pkg/observability/gerr/reporting.go:193","pid":59},"data":{"dd.service":"gorilla","dd.version":"d0d66fce2fc6aeaa7c70fa9ee8a244032098aa7b","http":{"url":"http://192.168.104.9/oidc/auth","method":"GET","headers":{"Host":"192.168.104.9","Connection":"close","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0","Accept-Encoding":"gzip, deflate","Accept-Language":"zh,en-US;q=0.9,en;q=0.8","X-Original-Uri":"/system-admin/static/css/main.c9951160.css.map","X-Forwarded-For":"192.168.104.9"}}},"message":"Not logged in","dd.trace_id":"10464612527120353434","error":{"kind":"*errors.errorString","message":"Not logged in"}}
./mysql.log:2024-08-29T05:33:11.670654Z 27 [Note] Aborted connection 27 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670658Z 22 [Note] Aborted connection 22 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670773Z 25 [Note] Aborted connection 25 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670709Z 23 [Note] Aborted connection 23 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670743Z 21 [Note] Aborted connection 21 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670767Z 24 [Note] Aborted connection 24 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670656Z 28 [Note] Aborted connection 28 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670788Z 17 [Note] Aborted connection 17 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670797Z 26 [Note] Aborted connection 26 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670889Z 20 [Note] Aborted connection 20 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670895Z 19 [Note] Aborted connection 19 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.670958Z 18 [Note] Aborted connection 18 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:11.768660Z 7 [Note] Aborted connection 7 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194361Z 15 [Note] Aborted connection 15 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194462Z 8 [Note] Aborted connection 8 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194516Z 11 [Note] Aborted connection 11 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194523Z 9 [Note] Aborted connection 9 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
./mysql.log:2024-08-29T05:33:12.194478Z 13 [Note] Aborted connection 13 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
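
The log excerpt above repeats a small number of distinct messages, so a summary may be easier to read than the raw lines. Below is a minimal sketch, assuming the lines come from grep-style output in the form `./<file>:<JSON record>` as shown above; the function name `summarize` is hypothetical, and lines without a JSON payload (such as the mysql.log entries) are simply skipped.

```python
import json
from collections import Counter


def summarize(lines):
    """Count (source file, message) pairs in grep-style '<file>:<json>' lines."""
    counts = Counter()
    for line in lines:
        # The file name comes before the first colon, the payload after it.
        source, sep, payload = line.strip().partition(":")
        if not sep or not payload.startswith("{"):
            continue  # not a JSON record, e.g. a plain-text mysql.log line
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # truncated or malformed record
        counts[(source, record.get("message", "<no message>"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (assumed): grep -r ERROR . | python summarize_logs.py
    for (source, message), n in summarize(sys.stdin).most_common():
        print(f"{n:4d}  {source}  {message}")
```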

And here is the debug bundle:
debug.zip

This comment was marked as duplicate.

@paulosabile-wb

Hi @Kenny-Ch, good day and thank you for reaching out to us. Happy to help you with this!

Let me help you troubleshoot. Do you know if you are having storage issues in wandb? Can you check in your Teams settings whether you are reaching the storage limit?

Also, may I know your current SDK version? You can get it by running wandb --version. Thank you!
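
For reference, HTTP 507 is the standard "Insufficient Storage" status code, which is why storage is the first thing to check. Below is a minimal sketch for inspecting free space on the machine that hosts the server. The path passed to `free_space_report` is an assumption (point it at wherever the server's data volume is actually mounted), and both function names are hypothetical.

```python
import shutil
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the standard reason phrase for an HTTP status code."""
    return HTTPStatus(code).phrase


def free_space_report(path: str = "/") -> dict:
    """Report disk usage for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return {
        "total_gb": usage.total / 1e9,
        "free_gb": usage.free / 1e9,
        "percent_used": 100 * usage.used / usage.total,
    }


if __name__ == "__main__":
    print("507 means:", describe_status(507))
    # Assumption: "/" stands in for the volume that backs the server's storage.
    print(free_space_report("/"))
```

If the server runs in a container, the check would need to run against the mounted data volume rather than the host root.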

@Kenny-Ch

Hi @paulosabile-wb, glad to hear from you.

I have checked that my local disk has enough space to upload wandb-metadata.json. However, I have no idea where to find the storage limit on my page. Here is the team page on my self-hosted server:
(screenshot of the team page)

My wandb version is 0.17.6.

@paulosabile-wb

Thank you for confirming this @Kenny-Ch. Could you please try to use the latest version 0.17.8 and let us know if the errors are still the same?

When was the last time you were able to run an experiment? Do you know what changed before you encountered this error?

If the error still persists on the latest version, could you please share the debug-internal.log and debug.log files for the affected run? They are in your local folder wandb/run-_-/logs, in the same directory where you run your code. These files will give us more details about this error.
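
To collect those files quickly, here is a minimal sketch that lists them for every run directory. It assumes run folders are named `run-*` directly under `wandb/`, matching the `wandb/run-20240826_190835-g7b6iqjc/` path shown earlier in this thread; the function name `find_run_logs` is hypothetical.

```python
from pathlib import Path


def find_run_logs(wandb_dir="wandb", names=("debug.log", "debug-internal.log")):
    """Return existing log files under <wandb_dir>/run-*/logs/, sorted by run."""
    root = Path(wandb_dir)
    found = []
    for run_dir in sorted(root.glob("run-*")):
        for name in names:
            candidate = run_dir / "logs" / name
            if candidate.is_file():
                found.append(candidate)
    return found


if __name__ == "__main__":
    # Run from the directory that contains the wandb/ folder.
    for path in find_run_logs():
        print(path)
```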

Thank you!

