Train loss is NaN when training on GPU #65

WWJ0720 · 2023-03-23T03:05:20Z

Has anyone else seen the train loss become NaN when training on a GPU? I have tried many things and cannot fix it.

XwX123321 · 2023-03-24T08:27:16Z

Hi, did you get it running? I have hit some difficulties running on the VOC dataset. Could I get in touch with you?

WWJ0720 · 2023-03-24T08:39:31Z

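> Hi, did you get it running? I have hit some difficulties running on the VOC dataset. Could I get in touch with you?

I have not got it running yet. QQ: 1920698385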

andwizard · 2023-04-14T03:27:30Z

Any news, bro? Did you get it working?

LittleShuo · 2023-04-15T08:00:53Z

Hi, did you solve this? I have tried many approaches and still get NaN on the GPU. I would appreciate your help, thanks.

SingCheng · 2023-05-15T08:28:11Z

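> Hi, did you solve this? I have tried many approaches and still get NaN on the GPU. I would appreciate your help, thanks.

@LittleShuo Hi, I have run into the same problem. Did you solve it?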

SingCheng · 2023-06-12T04:15:51Z

Are you all using RTX 30-series cards?

andwizard · 2023-06-12T13:14:21Z

> Are you all using RTX 30-series cards?

I am on a GTX 1080, bro.

LittleShuo · 2023-06-12T14:24:05Z

@SingCheng No, it is not solved.

SingCheng · 2023-06-12T14:37:37Z

If you are using an RTX 30/40-series card, you can try the fix below:

conda create -n myenv python=3.8
pip install nvidia-pyindex
pip install nvidia-tensorflow
conda install tensorboard

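My understanding is that the TensorFlow team no longer updates the 1.x releases, so as long as you use NVIDIA's build (which covers both CPU and GPU) you should not have this problem.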

LittleShuo · 2023-06-12T14:43:47Z

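@SingCheng I am on a 30-series card. Training is fine when I use only the CPU, but the loss becomes NaN as soon as I use the GPU. I have already tried this approach and still get the same problem.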

LittleShuo · 2023-06-12T14:48:51Z

@SingCheng OK, I will try again. Thanks.

SingCheng · 2023-06-12T14:49:49Z

> @SingCheng OK, I will try again. Thanks.

What platform are you running on? Linux?

LittleShuo · 2023-06-12T14:50:47Z

@SingCheng Linux

SingCheng · 2023-06-12T14:51:47Z

> @SingCheng Linux

Which Python version?

SingCheng · 2023-06-12T14:54:08Z

> @SingCheng Linux

Ubuntu 20.04 or later (64-bit)
GPU support requires a CUDA®-enabled card
For NVIDIA GPUs, the r455 driver must be installed
For wheel installation:

Python 3.8
pip 20.3 or later

These are the package's requirements. Check whether your setup meets them.
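The Python and pip requirements listed above can be checked programmatically. The following is a minimal sketch, not part of the package: the names `parse_version` and `meets_requirements` are hypothetical, and it assumes "Python 3.8" means exactly the 3.8 series.

```python
import re

# Values taken from the requirements quoted above.
REQUIRED_PYTHON = (3, 8)   # "Python 3.8" (assumed to mean exactly 3.8.x)
MINIMUM_PIP = (20, 3)      # "pip 20.3 or later"


def parse_version(text):
    """Return the first two numeric components of a version string."""
    numbers = [int(part) for part in re.findall(r"\d+", text)]
    numbers += [0] * (2 - len(numbers))  # pad "21" to (21, 0)
    return tuple(numbers[:2])


def meets_requirements(python_version, pip_version):
    """Check a Python version tuple and a pip version string."""
    python_ok = tuple(python_version[:2]) == REQUIRED_PYTHON
    pip_ok = parse_version(pip_version) >= MINIMUM_PIP
    return python_ok and pip_ok
```

Inside the target environment this could be called as `meets_requirements(sys.version_info, pip.__version__)` after `import sys, pip`. It does not check the driver or the Ubuntu version, which still need to be verified separately.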

LittleShuo · 2023-06-12T14:54:56Z

@SingCheng I am on Python 3.7, to stay compatible with TensorFlow 1.x.

SingCheng · 2023-06-12T14:55:48Z

> @SingCheng I am on Python 3.7, to stay compatible with TensorFlow 1.x.

Just upgrade Python to 3.8 and it should work.

LittleShuo · 2023-06-12T14:57:07Z

OK, I will try it. Thanks a lot.
