Dorado basecalling error with RTX 5090 through WSL2 #1247

ruicatxiao · 2025-02-09T22:00:37Z

Issue Report

Please describe the issue:

Dorado fails while creating the basecall pipeline on an RTX 5090 with "CUDA error: no kernel image is available for execution on the device".

Steps to reproduce the issue:

The command was run in WSL2 on Ubuntu 24.04 LTS:

dorado basecaller sup,4mC_5mC,6mA pod5 --device cuda:all > reads.bam

Run environment:

Dorado version: 0.9.1
Operating system: Windows 11 WSL2, Ubuntu 24.04
Hardware (CPUs, Memory, GPUs): 9800X3D / 96GB RAM / RTX 5090
Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
Source data location (on device or networked drive - NFS, etc.): on device SSD
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB):
R10.4.1
Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):

Logs

Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)

[2025-02-09 16:55:35.446] [info] Running: "basecaller" "sup,4mC_5mC,6mA" "pod5" "--device" "cuda:all"
[2025-02-09 16:55:35.669] [info]  - downloading [email protected] with httplib
[2025-02-09 16:55:38.870] [info]  - downloading [email protected]_4mC_5mC@v3 with httplib
[2025-02-09 16:55:40.180] [info]  - downloading [email protected]_6mA@v3 with httplib
[2025-02-09 16:55:42.951] [info] > Creating basecall pipeline
[2025-02-09 16:55:43.997] [error] finalise() not called on a HtsFile.
[2025-02-09 16:55:43.999] [error] CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /pytorch/pyold/c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f42239cc9b7 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f421cf51115 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f4223996958 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #3: void at::native::gpu_kernel_impl<at::native::FillFunctor<c10::Half> >(at::TensorIteratorBase&, at::native::FillFunctor<c10::Half> const&) + 0x9b1 (0x7f42221b7e11 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #4: void at::native::gpu_kernel<at::native::FillFunctor<c10::Half> >(at::TensorIteratorBase&, at::native::FillFunctor<c10::Half> const&) + 0x33b (0x7f42221b863b in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #5: <unknown function> + 0x9216dd5 (0x7f42221aadd5 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #6: at::native::fill_kernel_cuda(at::TensorIterator&, c10::Scalar const&) + 0x20 (0x7f42221abf00 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #7: <unknown function> + 0x49823a3 (0x7f421d9163a3 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #8: <unknown function> + 0xa61c4b3 (0x7f42235b04b3 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #9: at::_ops::fill__Scalar::call(at::Tensor&, c10::Scalar const&) + 0x12c (0x7f421e06792c in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #10: at::native::zero_(at::Tensor&) + 0xa7 (0x7f421d916a67 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #11: <unknown function> + 0xa61b80d (0x7f42235af80d in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #12: at::_ops::zero_::call(at::Tensor&) + 0x129 (0x7f421e4a4499 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #13: at::native::zeros_symint(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0x160 (0x7f421dbc09c0 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #14: <unknown function> + 0x588d645 (0x7f421e821645 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #15: at::_ops::zeros::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0xd5 (0x7f421e020715 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #16: <unknown function> + 0x56c4835 (0x7f421e658835 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #17: at::_ops::zeros::call(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0x1b1 (0x7f421e07b2a1 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #18: at::native::cudnn_rnn::copy_weights_to_flat_buf_views(c10::ArrayRef<at::Tensor>, long, long, long, long, long, long, bool, bool, cudnnDataType_t, c10::TensorOptions const&, bool, bool, bool) + 0x3d0 (0x7f422197e7d0 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #19: at::native::_cudnn_rnn_flatten_weight(c10::ArrayRef<at::Tensor>, long, long, long, long, long, long, bool, bool) + 0x90 (0x7f422197f410 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #20: <unknown function> + 0xa630fe9 (0x7f42235c4fe9 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #21: <unknown function> + 0xa66700f (0x7f42235fb00f in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #22: <unknown function> + 0x52c4fc4 (0x7f421e258fc4 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #23: at::_ops::_cudnn_rnn_flatten_weight::call(c10::ArrayRef<at::Tensor>, long, c10::SymInt, long, c10::SymInt, c10::SymInt, long, bool, bool) + 0x386 (0x7f421e1cb536 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #24: <unknown function> + 0x80b3c06 (0x7f4221047c06 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #25: torch::nn::detail::RNNImplBase<torch::nn::LSTMImpl>::flatten_parameters() + 0x346 (0x7f4221050d26 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #26: void torch::nn::Module::to_impl<c10::Device&, bool&>(c10::Device&, bool&) + 0xd0 (0x7f4220f7a030 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #27: torch::nn::Module::to(c10::Device, bool) + 0x1c (0x7f4220f7321c in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #28: dorado() [0xb88ffd]
frame #29: dorado() [0xb89597]
frame #30: dorado() [0xb77fd8]
frame #31: dorado() [0xb7870d]
frame #32: dorado() [0xa50a49]
frame #33: dorado() [0x8d6e3b]
frame #34: dorado() [0x89125b]
frame #35: dorado() [0x896a7a]
frame #36: dorado() [0x4fa350]
frame #37: <unknown function> + 0x2a1ca (0x7f4217b9a1ca in /lib/x86_64-linux-gnu/libc.so.6)
frame #38: __libc_start_main + 0x8b (0x7f4217b9a28b in /lib/x86_64-linux-gnu/libc.so.6)
frame #39: dorado() [0x827f4f]
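
The "no kernel image is available" message usually means the binary contains no GPU code the device can run. A rough way to check is to compare the GPU's compute capability with the newest architecture the binary was built for. The sketch below is only that: the helper names are hypothetical, and the 9.0 used in the usage note is a placeholder assumption, because this report does not say which architectures Dorado 0.9.1 was built for. The RTX 5090 reports compute capability 12.0. A binary that embeds PTX can still JIT-compile for a newer GPU, so a positive result is a hint, not proof.

```shell
#!/usr/bin/env bash
# Sketch: is the GPU newer than the newest architecture a binary targets?
# ASSUMPTION: all function names here are illustrative, not part of Dorado.

# Convert a "major.minor" compute capability (e.g. "12.0") to an
# integer (120) so two capabilities can be compared numerically.
cc_to_int() {
    local major="${1%%.*}"
    local minor="${1#*.}"
    echo $(( major * 10 + minor ))
}

# Succeed (exit 0) when the device capability is strictly greater than
# the newest capability the binary was built for.
# $1 = device capability, $2 = newest built capability (caller-supplied).
device_too_new() {
    local device_cc="$1" max_built_cc="$2"
    [ "$(cc_to_int "$device_cc")" -gt "$(cc_to_int "$max_built_cc")" ]
}

# Print the compute capability of the first GPU, if nvidia-smi is on PATH.
# Prints nothing when nvidia-smi is missing.
detect_cc() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1
    fi
}
```

With the functions sourced, `device_too_new "$(detect_cc)" 9.0 && echo "GPU is newer than the assumed build target"` would flag the mismatch. Replace 9.0 with the real value once it is known for the release in question.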
