Train loss is NaN when training on GPU #65

Open
WWJ0720 opened this issue Mar 23, 2023 · 18 comments
Comments

@WWJ0720

WWJ0720 commented Mar 23, 2023

Has anyone else seen the train loss come out as NaN when training on a GPU? I have tried many fixes and none of them solved it.

@XwX123321

Hi, did you get it running? I've hit some trouble running on the VOC dataset. Could we get in touch?

@WWJ0720
Author

WWJ0720 commented Mar 24, 2023

> Hi, did you get it running? I've hit some trouble running on the VOC dataset. Could we get in touch?

I haven't gotten it running yet. QQ: 1920698385

@andwizard

Any news, man? Did you get it working?

@LittleShuo

Hi, have you solved this? I've tried many approaches, but the loss is still NaN on the GPU. I'd really appreciate your help, thanks.

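@SingCheng

> Hi, have you solved this? I've tried many approaches, but the loss is still NaN on the GPU. I'd really appreciate your help, thanks.

@LittleShuo Hi, I've run into the same problem. Have you solved it?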

@SingCheng

Are you all using RTX 30-series GPUs?

@andwizard

> Are you all using RTX 30-series GPUs?

I'm on a GTX 1080, man.

@LittleShuo

@SingCheng No, it's not solved.

@SingCheng

If you are using an RTX 30/40-series GPU, you can fix it with the steps below:

  1. conda create -n myenv python=3.8
  2. pip install nvidia-pyindex
  3. pip install nvidia-tensorflow
  4. conda install tensorboard

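My understanding is that the official TensorFlow project no longer updates the 1.x line, so as long as you use NVIDIA's build (which includes both CPU and GPU support) you should not have this problem.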
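After the install steps above, it may help to confirm which TensorFlow build the environment actually picks up and whether it can see the GPU. The following is only a minimal sketch: `describe_tensorflow` is a hypothetical helper name, and it assumes `tf.test.is_gpu_available()`, which exists in TensorFlow 1.x (it is deprecated in 2.x).

```python
import importlib.util


def describe_tensorflow():
    """Return a small dict describing the TensorFlow install, if there is one.

    Hypothetical helper for illustration; it is not part of any package
    mentioned in this thread.
    """
    # Avoid an ImportError when TensorFlow is not installed at all.
    if importlib.util.find_spec("tensorflow") is None:
        return {"installed": False, "version": None, "gpu_available": None}

    import tensorflow as tf

    return {
        "installed": True,
        "version": tf.__version__,
        # Assumption: TF 1.x API; True only if TensorFlow can use a GPU.
        "gpu_available": bool(tf.test.is_gpu_available()),
    }


if __name__ == "__main__":
    print(describe_tensorflow())
```

If `gpu_available` comes back `False`, training is silently running on the CPU, which would also explain a run that looks "fine" without a GPU.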

@LittleShuo

@SingCheng I'm on a 30-series card. With the CPU only, everything works; with the GPU, the loss is NaN. I have already tried this approach and still get the same problem.

@LittleShuo

@SingCheng OK, I'll try again, thanks.

@SingCheng

> @SingCheng OK, I'll try again, thanks.

Which platform are you running on? Linux?

@LittleShuo

@SingCheng Linux

@SingCheng

> @SingCheng Linux

Which Python version are you using?

@SingCheng

> @SingCheng Linux

Ubuntu 20.04 or later (64-bit)
GPU support requires a CUDA®-enabled card
For NVIDIA GPUs, the r455 driver must be installed
For wheel installation:

Python 3.8
pip 20.3 or later

These are the package's requirements. Check whether your setup meets them.
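The Python and pip requirements listed above can be checked with a few lines of code. This is a rough sketch, not an official tool: `parse_version` and `check_wheel_requirements` are hypothetical names, and it assumes "Python 3.8" means exactly the 3.8 series.

```python
import sys


def parse_version(text):
    """Turn a string like '20.3.4' into (20, 3, 4), dropping suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_wheel_requirements(python_version, pip_version):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    major_minor = tuple(python_version[:2])
    # Assumption: the requirement "Python 3.8" is read as exactly 3.8.x.
    if major_minor != (3, 8):
        problems.append("Python 3.8 is required, found %d.%d" % major_minor)
    if parse_version(pip_version) < (20, 3):
        problems.append("pip 20.3 or later is required, found %s" % pip_version)
    return problems


if __name__ == "__main__":
    try:
        import pip
        pip_version = pip.__version__
    except ImportError:
        pip_version = "0"  # treat a missing pip as too old
    for problem in check_wheel_requirements(sys.version_info, pip_version):
        print(problem)
```

Run inside the conda environment used for training, it prints nothing when both requirements are met. The driver and OS requirements are not covered here.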

@LittleShuo

@SingCheng I'm using Python 3.7 to stay compatible with TensorFlow 1.x.

@SingCheng

> @SingCheng I'm using Python 3.7 to stay compatible with TensorFlow 1.x.

Just upgrade Python to 3.8 and it should work.

@LittleShuo

OK, I'll give it a try. Thank you very much.


5 participants