Problems setting up the CULane demo for lane detection #150

Open
CHANdaFeng opened this issue Jun 15, 2023 · 23 comments

@CHANdaFeng

While setting up the CULane demo for lane detection, I ran python main_landet.py --test --config=configs/lane_detection/bezierlanenet/resnet34_culane_aug1b.py --checkpoint=resnet34_bezierlanenet_culane_aug1b_20211109.pt
and hit the following error:
NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

I installed PyTorch 1.6 following the official site. nvidia-smi reports CUDA 12.1, while nvcc -V shows nothing. Is something wrong with my environment?
Here is the environment:
(pad) cxf@cxf:~$ conda list

# packages in environment at /home/cxf/anaconda3/envs/pad:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 1.4.0 pypi_0 pypi
addict 2.4.0 pypi_0 pypi
blas 1.0 mkl
ca-certificates 2023.05.30 h06a4308_0
cachetools 4.2.4 pypi_0 pypi
certifi 2021.5.30 py36h06a4308_0
charset-normalizer 2.0.12 pypi_0 pypi
cudatoolkit 10.2.89 hfd86e86_1
dataclasses 0.8 pypi_0 pypi
dill 0.3.4 pypi_0 pypi
filetype 1.0.8 pypi_0 pypi
freetype 2.12.1 h4a9f257_0
future 0.18.3 pypi_0 pypi
google-auth 2.20.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
grpcio 1.48.2 pypi_0 pypi
idna 3.4 pypi_0 pypi
imageio 2.10.1 pypi_0 pypi
importlib-metadata 4.8.3 pypi_0 pypi
importmagician 0.1.0 pypi_0 pypi
intel-openmp 2022.1.0 h9e868ea_3769
joblib 1.1.1 pypi_0 pypi
jpeg 9e h5eee18b_1
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libdeflate 1.17 h5eee18b_0
libffi 3.3 he6710b0_2
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libpng 1.6.39 h5eee18b_0
libstdcxx-ng 11.2.0 h1234567_1
libtiff 4.5.0 h6a678d5_2
libwebp-base 1.2.4 h5eee18b_1
lz4-c 1.9.4 h6a678d5_0
markdown 3.3.7 pypi_0 pypi
mkl 2020.2 256
mkl-service 2.3.0 py36he8ac12f_0
mkl_fft 1.3.0 py36h54f3939_0
mkl_random 1.1.1 py36h0573a6f_0
mmcv-full 1.3.5 pypi_0 pypi
multiprocess 0.70.12.2 pypi_0 pypi
ncurses 6.4 h6a678d5_0
ninja 1.11.1 pypi_0 pypi
ninja-base 1.10.2 hd09550d_5
numpy 1.19.2 py36h54aff64_0
numpy-base 1.19.2 py36hfa32c7d_0
oauthlib 3.2.2 pypi_0 pypi
olefile 0.46 py36_0
opencv-python 4.5.4.58 pypi_0 pypi
openjpeg 2.4.0 h3ad879b_0
openssl 1.1.1t h7f8727e_0
p-tqdm 1.3.3 pypi_0 pypi
pathos 0.2.8 pypi_0 pypi
pillow 8.4.0 pypi_0 pypi
pip 21.2.2 py36h06a4308_0
pox 0.3.0 pypi_0 pypi
ppft 1.6.6.4 pypi_0 pypi
protobuf 3.19.6 pypi_0 pypi
pyasn1 0.5.0 pypi_0 pypi
pyasn1-modules 0.3.0 pypi_0 pypi
python 3.6.13 h12debd9_1
pytorch 1.6.0 py3.6_cuda10.2.89_cudnn7.6.5_0 pytorch
pyyaml 6.0 pypi_0 pypi
readline 8.2 h5eee18b_0
requests 2.27.1 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scikit-learn 0.23.2 pypi_0 pypi
scipy 1.5.4 pypi_0 pypi
setuptools 58.0.4 py36h06a4308_0
shapely 1.8.0 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_1
sqlite 3.41.2 h5eee18b_0
tensorboard 2.7.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
thop 0.0.31-2005241907 pypi_0 pypi
threadpoolctl 3.1.0 pypi_0 pypi
timm 0.4.5 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
torchvision 0.7.0 py36_cu102 pytorch
tqdm 4.62.3 pypi_0 pypi
typing-extensions 4.1.1 pypi_0 pypi
ujson 4.2.0 pypi_0 pypi
urllib3 1.26.16 pypi_0 pypi
werkzeug 2.0.3 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
xz 5.4.2 h5eee18b_0
yapf 0.32.0 pypi_0 pypi
zipp 3.6.0 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0

@voldemortX
Owner

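@CHANdaFeng You get this error because the PyTorch build you are using only supports up to sm_75 (20-series GPUs). Check which PyTorch version is the minimum for 30-series cards. This has nothing to do with your CUDA version.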
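
The mismatch can be read straight off the warning text. The sketch below parses the `sm_XY` entries from the message quoted above and applies a simplified rule: a GPU counts as supported when some listed binary architecture shares its major version. That rule and the helper names `parse_sm_list` and `device_is_supported` are assumptions for illustration, not PyTorch's API.

```python
import re
from typing import List


def parse_sm_list(message: str) -> List[int]:
    """Extract the binary architectures (sm_XY) named in a PyTorch warning line."""
    return [int(num) for num in re.findall(r"\bsm_(\d+)", message)]


def device_is_supported(device_sm: int, supported_sm: List[int]) -> bool:
    """Simplified rule (an assumption): some sm_XY entry shares the major version."""
    return any(sm // 10 == device_sm // 10 for sm in supported_sm)


# The line below is copied from the warning in this thread.
SUPPORTED_LINE = (
    "The current PyTorch install supports CUDA capabilities "
    "sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37."
)

supported = parse_sm_list(SUPPORTED_LINE)
print(supported)
print(device_is_supported(86, supported))  # the RTX 3060 reports sm_86
print(device_is_supported(75, supported))  # a 20-series card reports sm_75
```

Under this rule the listed build has nothing for major version 8, which matches the warning shown for the RTX 3060.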

@CHANdaFeng
Author

@voldemortX OK, thank you very much! I will look into the cause.

@CHANdaFeng
Author

Hello, my environment is now CUDA 12.1, torch 1.10.1, mmcv-full 1.4.6.
When I run python main_landet.py --train --config=configs/lane_detection/baseline/enet_culane.py
I get this error:
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory
What is the cause? I checked, and libcudart.so.11.0 is indeed not present. Shouldn't it be loading the 12.1 library?

@voldemortX
Owner

@CHANdaFeng Did you install the mmcv build that matches CUDA 12.1? mmcv has a table that lists the matching build for each version combination.
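
The library name in the `ImportError` shows which CUDA runtime a compiled package was linked against. A minimal sketch that pulls the version out of the message; the helper name `required_cudart` is hypothetical, and the thread does not confirm which installed package requested the library.

```python
import re
from typing import Optional, Tuple


def required_cudart(message: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) of the libcudart named in an ImportError message."""
    match = re.search(r"libcudart\.so\.(\d+)\.(\d+)", message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Error text copied from this thread.
ERROR = (
    "ImportError: libcudart.so.11.0: cannot open shared object file: "
    "No such file or directory"
)

needed = required_cudart(ERROR)
system_cuda_major = 12  # from the reported CUDA 12.1 installation
print(needed)
print(needed is not None and needed[0] != system_cuda_major)
```

Here the binary asks for a CUDA 11 runtime while the machine offers CUDA 12.1, so either the package builds or the available runtime would need to change for the two to agree.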

@CHANdaFeng
Author

@voldemortX OK, I will look into the cause. Thanks!

@CHANdaFeng
Author

@voldemortX Hello, when I run python main_landet.py --train --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py
it fails with a timeout and this error: RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
What could be the cause?
My current environment is:

cuda 11.3.0
torch==1.11.0
mmcv==2.0.0
numpy=1.19

The full error output is:
/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/cuda/__init__.py:82: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:112.)
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.3'
/home/cxf/pytorch-auto-drive/utils/models/lane_detection/laneatt.py:22: UserWarning: Can't complie line nms op for LaneATT. Set verbose=True for load in /utils/csrc/apis.py L9 for details.
main_landet.py:24: UserWarning: Unable to set a high enough file descriptor limit 8192 (your system may has a low hard limit 4096). If you encounter related problems in training, try reduce the number of workers by --workers, or switch into file_system mode at Line 8.
Loaded torchvision ImageNet pre-trained weights V1.
Not using distributed mode
cuda
Traceback (most recent call last):
File "main_landet.py", line 75, in <module>
runner = Runner(cfg=cfg)
File "/home/cxf/pytorch-auto-drive/utils/runners/lane_det_trainer.py", line 17, in __init__
super().__init__(cfg)
File "/home/cxf/pytorch-auto-drive/utils/runners/base.py", line 117, in __init__
net_without_ddp, self.device = self.get_device_and_move_model()
File "/home/cxf/pytorch-auto-drive/utils/runners/base.py", line 159, in get_device_and_move_model
self.model.to(device)
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 907, in to
return self._apply(convert)
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 578, in _apply
module._apply(fn)
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 578, in _apply
module._apply(fn)
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 601, in _apply
param_applied = fn(param)
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/nn/modules/module.py", line 905, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.

@voldemortX
Owner

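@CHANdaFeng I can't tell from this. First of all, the NMS compilation failure most likely means the program did not detect CUDA. Your environment is probably the problem. Try running some simple scripts first and narrow the problem down step by step.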
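
One such simple script, as a sketch: it only reports what the installed PyTorch sees and tries a one-element GPU allocation. The function name `cuda_report` and the dictionary keys are made up for this example; the `torch` calls are standard ones, and the script returns early if `torch` cannot be imported.

```python
def cuda_report() -> dict:
    """Collect basic facts about the local PyTorch and CUDA setup."""
    report = {"torch_installed": False}
    try:
        import torch
    except (ImportError, OSError):
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["built_for_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()

    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        try:
            # A tiny allocation forces CUDA initialization and surfaces errors early.
            torch.zeros(1).to("cuda")
            report["tensor_on_gpu"] = True
        except RuntimeError as exc:
            report["tensor_on_gpu"] = False
            report["error"] = str(exc)
    return report


report = cuda_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `cuda_available` is already false here, the training script and the NMS extension cannot be expected to find the GPU either, so the fix belongs in the driver or PyTorch installation rather than in the repository code.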

@CHANdaFeng
Author

@voldemortX Hello, NMS now compiles successfully. When I launch python main_landet.py --train --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py
I get the error below. What is the cause? Thank you for your patient answers!
Successfully complied line nms for LaneATT.
main_landet.py:24: UserWarning: Unable to set a high enough file descriptor limit 8192 (your system may has a low hard limit 4096). If you encounter related problems in training, try reduce the number of workers by --workers, or switch into file_system mode at Line 8.
Loaded torchvision ImageNet pre-trained weights V1.
Not using distributed mode
cuda
Build from dict error in function or class: CULaneAsBezier
In Python: <class 'utils.datasets.lane_as_bezier.CULaneAsBezier'>
Traceback (most recent call last):
File "main_landet.py", line 75, in <module>
runner = Runner(cfg=cfg)
File "/home/cxf/pytorch-auto-drive/utils/runners/lane_det_trainer.py", line 17, in __init__
super().__init__(cfg)
File "/home/cxf/pytorch-auto-drive/utils/runners/base.py", line 127, in __init__
dataset = DATASETS.from_dict(cfg['dataset'],
File "/home/cxf/pytorch-auto-drive/utils/registry.py", line 41, in from_dict
raise e
File "/home/cxf/pytorch-auto-drive/utils/registry.py", line 38, in from_dict
return function_or_class(**dict_params_)
File "/home/cxf/pytorch-auto-drive/utils/datasets/lane_as_bezier.py", line 39, in __init__
self._init_all()
File "/home/cxf/pytorch-auto-drive/utils/datasets/lane_as_bezier.py", line 104, in _init_all
self.beziers = self.loader_bezier()
File "/home/cxf/pytorch-auto-drive/utils/datasets/lane_as_bezier.py", line 72, in loader_bezier
with open(self.bezier_labels, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/cxf/bag/CULane/bezier_labels/train_3.json'

@CHANdaFeng
Author

I downloaded the dataset as described under CULane Dataset and adjusted it accordingly, but I did not see the file bezier_labels/train_3.json in the download.
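
A quick pre-flight check can list such missing files before the runner starts. This is a sketch: the helper `missing_files` is hypothetical, and the list of required files contains only the one path named in the traceback above, not the project's full requirements.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(root: str, relative_paths: Iterable[str]) -> List[str]:
    """Return the relative paths that do not exist as files under root."""
    base = Path(root)
    return [rel for rel in relative_paths if not (base / rel).is_file()]


# Only the file from the traceback; extend the list as needed.
required = ["bezier_labels/train_3.json"]
print(missing_files("/home/cxf/bag/CULane", required))
```

An empty list means every listed file was found under the dataset root.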

@CHANdaFeng
Author

@voldemortX I have solved this problem.

@CHANdaFeng
Author

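Hello @voldemortX, after training on the CULane dataset with resnet18_culane_aug1b.py and obtaining a .pt model, I wanted to look at the test predictions. Running python main_landet.py --test --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py --mixed-precision
fails with an error saying an image cannot be found, although the file does exist at the corresponding path in the dataset. What could be the cause?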
(pad) cxf@cxf:~/pytorch-auto-drive$ python main_landet.py --test --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py --mixed-precision
Successfully complied line nms for LaneATT.
main_landet.py:24: UserWarning: Unable to set a high enough file descriptor limit 8192 (your system may has a low hard limit 4096). If you encounter related problems in training, try reduce the number of workers by --workers, or switch into file_system mode at Line 8.
Loaded torchvision ImageNet pre-trained weights V1.
cuda:0
0%| | 0/34680 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main_landet.py", line 76, in <module>
runner.run()
File "/home/cxf/pytorch-auto-drive/utils/runners/lane_det_tester.py", line 31, in run
self.test_one_set(self.model, self.device, self.dataloader, self._cfg['mixed_precision'],
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/cxf/pytorch-auto-drive/utils/runners/lane_det_tester.py", line 47, in test_one_set
for images, filenames in tqdm(loader):
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
for obj in iterable:
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/cxf/pytorch-auto-drive/utils/datasets/lane_as_bezier.py", line 48, in __getitem__
img = Image.open(self.images[index]).convert('RGB')
File "/home/cxf/anaconda3/envs/pad/lib/python3.8/site-packages/PIL/Image.py", line 3227, in open
fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/home/cxf/bag/CULane/river_100_30frame/05251517_0433.MP4/00000.jpg'

@voldemortX
Owner

@CHANdaFeng Does the file '/home/cxf/bag/CULane/river_100_30frame/05251517_0433.MP4/00000.jpg' actually exist?
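
That question can be answered mechanically. The sketch below walks the path from the root, stops at the first component that does not exist, and lists similarly named siblings using `difflib.get_close_matches` from the standard library. The function name `suggest_existing` is hypothetical.

```python
import difflib
from pathlib import Path
from typing import List


def suggest_existing(path: str) -> List[str]:
    """At the first missing path component, return similarly named siblings."""
    target = Path(path)
    # Ancestors ordered from the root down to the full path.
    for candidate in reversed([target, *target.parents]):
        if candidate.exists():
            continue
        parent = candidate.parent
        try:
            siblings = [child.name for child in parent.iterdir()] if parent.is_dir() else []
        except OSError:
            siblings = []
        return difflib.get_close_matches(candidate.name, siblings, n=3, cutoff=0.6)
    return []  # every component exists


print(suggest_existing(
    "/home/cxf/bag/CULane/river_100_30frame/05251517_0433.MP4/00000.jpg"
))
```

On a dataset root that holds a `driver_100_30frame` folder, a request for `river_100_30frame` would list that folder as a close match, which points at a naming mismatch rather than missing data.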

@CHANdaFeng
Author

@voldemortX It seems it does not exist. I renamed the driver_100_30frame folder to river_100_30frame to match, and that seems to work now.

@voldemortX
Owner

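@voldemortX It seems it does not exist. I renamed the driver_100_30frame folder to river_100_30frame to match, and that seems to work now.

Then your folder names probably differ from the defaults.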

@CHANdaFeng
Author

@voldemortX OK, but I have a question. This dataset is CULane as downloaded from the official site. After my change, I now run
python main_landet.py --test --config=configs/lane_detection/bezierlanenet/resnet18_culane_aug1b.py --checkpoint=checkpoints/resnet18_bezierlanenet_culane-aug2/model.pt
and it stops halfway through with
FileNotFoundError: [Errno 2] No such file or directory: '/home/cxf/bag/CULane/river_193_90frame/06051317_0673.MP4/00180.jpg'

@voldemortX
Owner

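@CHANdaFeng I suggest checking whether the dataset has been corrupted, modified, or is missing files. If you downloaded it from the official site, there would be no folder named river.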

@CHANdaFeng
Author

@voldemortX I checked, and the data is not corrupted. But shouldn't the code be looking for driver_100_30frame? Or did I accidentally modify the code and delete the "d"?

@voldemortX
Owner

@CHANdaFeng Check both the code and the txt files that list the paths. Do a global search.
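
A sketch of such a global search: it scans `.txt` and `.py` files under a directory for folder names of the form `river_<n>_<n>frame` that are not preceded by a "d". The pattern is inferred from the two paths in the error messages above; the function name `find_suspect_lines` and the choice of file suffixes are assumptions.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches "river_100_30frame" but not "driver_100_30frame".
PATTERN = re.compile(r"(?<!d)river_\d+_\d+frame")


def find_suspect_lines(root: str, suffixes=(".txt", ".py")) -> List[Tuple[str, int, str]]:
    """Return (file, line number, line) for every line containing a truncated name."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for number, line in enumerate(text.splitlines(), start=1):
                if PATTERN.search(line):
                    hits.append((str(path), number, line.strip()))
    return hits


for file_name, number, line in find_suspect_lines("."):
    print(f"{file_name}:{number}: {line}")
```

No hits in the list files or the code would suggest the leading character is dropped at run time, for example when a path prefix is stripped, rather than stored that way on disk.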

@CHANdaFeng
Author

@voldemortX Hello, when running the Lane points (Image Folder) visualization, the example appears to need label files. After training, I only have txt files such as 00000.lines and no labels. What could be the cause?
From the official docs:
python tools/vis/lane_img_dir.py --image-path=PAD_test_images/lane_test_images/05171008_0748.MP4 --keypoint-path=PAD_test_images/lane_test_images/05171008_0748.MP4 --mask-path=PAD_test_images/lane_test_images/laneseg_label_w16/05171008_0748.MP4 --image-suffix=.jpg --keypoint-suffix=.lines.txt --mask-suffix=.png --save-path=PAD_test_images/lane_test_images/culane_res --config=
So far, after training on the CULane dataset with resnet18_culane_aug1b, I have no corresponding label files.

@voldemortX
Owner

@voldemortX Hello, when running the Lane points (Image Folder) visualization, the example appears to need label files. After training, I only have txt files such as 00000.lines and no labels. What could be the cause?
From the official docs:
python tools/vis/lane_img_dir.py --image-path=PAD_test_images/lane_test_images/05171008_0748.MP4 --keypoint-path=PAD_test_images/lane_test_images/05171008_0748.MP4 --mask-path=PAD_test_images/lane_test_images/laneseg_label_w16/05171008_0748.MP4 --image-suffix=.jpg --keypoint-suffix=.lines.txt --mask-suffix=.png --save-path=PAD_test_images/lane_test_images/culane_res --config=
So far, after training on the CULane dataset with resnet18_culane_aug1b, I have no corresponding label files.

It doesn't say here that labels are required, does it?

@voldemortX
Owner

@CHANdaFeng Have you downloaded the test data package PAD_test_images? You can use the example to see exactly what input formats are expected.
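
For the keypoint input, a sketch of reading a `.lines.txt` file, assuming the usual CULane layout in which each text line is one lane written as alternating x and y values. The sample text is made up for illustration and the helper `parse_lines_txt` is hypothetical; compare against a file from PAD_test_images before relying on it.

```python
from typing import List, Tuple


def parse_lines_txt(text: str) -> List[List[Tuple[float, float]]]:
    """Parse lanes from text where each line is 'x1 y1 x2 y2 ...'."""
    lanes = []
    for line in text.splitlines():
        values = [float(value) for value in line.split()]
        if len(values) < 2:
            continue  # skip blank or degenerate lines
        # Pair up even-indexed x values with odd-indexed y values.
        lanes.append(list(zip(values[0::2], values[1::2])))
    return lanes


# Made-up sample, not taken from the dataset.
SAMPLE = "10.5 590 20 580\n\n100 590 110 580 120 570\n"

for lane in parse_lines_txt(SAMPLE):
    print(len(lane), lane[0])
```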

@Durobert

Durobert commented Aug 2, 2023

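I downloaded the dataset as described under CULane Dataset and adjusted it accordingly, but I did not see the file bezier_labels/train_3.json in the download.

How did you solve this problem? I have run into it too.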

@voldemortX
Owner

voldemortX commented Aug 3, 2023

@Durobert You can find them in datasets/CULane.md
