Dorado basecalling error with RTX 5090 through WSL2 #1247

Open
ruicatxiao opened this issue Feb 9, 2025 · 0 comments

Issue Report

Please describe the issue:

Dorado aborts while creating the basecall pipeline on an RTX 5090 with "CUDA error: no kernel image is available for execution on the device".

Steps to reproduce the issue:

Command executed in WSL2 running Ubuntu 24.04 LTS:

dorado basecaller sup,4mC_5mC,6mA pod5 --device cuda:all > reads.bam

Run environment:

  • Dorado version: 0.9.1
  • Operating system: Windows 11 WSL2, Ubuntu 24.04
  • Hardware (CPUs, Memory, GPUs): 9800X3D / 96GB RAM / RTX 5090
  • Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
  • Source data location (on device or networked drive - NFS, etc.): on device SSD
  • Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB):
    R10.4.1
  • Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):

Logs

  • Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)
[2025-02-09 16:55:35.446] [info] Running: "basecaller" "sup,4mC_5mC,6mA" "pod5" "--device" "cuda:all"
[2025-02-09 16:55:35.669] [info]  - downloading [email protected] with httplib
[2025-02-09 16:55:38.870] [info]  - downloading [email protected]_4mC_5mC@v3 with httplib
[2025-02-09 16:55:40.180] [info]  - downloading [email protected]_6mA@v3 with httplib
[2025-02-09 16:55:42.951] [info] > Creating basecall pipeline
[2025-02-09 16:55:43.997] [error] finalise() not called on a HtsFile.
[2025-02-09 16:55:43.999] [error] CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /pytorch/pyold/c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f42239cc9b7 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f421cf51115 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f4223996958 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #3: void at::native::gpu_kernel_impl<at::native::FillFunctor<c10::Half> >(at::TensorIteratorBase&, at::native::FillFunctor<c10::Half> const&) + 0x9b1 (0x7f42221b7e11 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #4: void at::native::gpu_kernel<at::native::FillFunctor<c10::Half> >(at::TensorIteratorBase&, at::native::FillFunctor<c10::Half> const&) + 0x33b (0x7f42221b863b in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #5: <unknown function> + 0x9216dd5 (0x7f42221aadd5 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #6: at::native::fill_kernel_cuda(at::TensorIterator&, c10::Scalar const&) + 0x20 (0x7f42221abf00 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #7: <unknown function> + 0x49823a3 (0x7f421d9163a3 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #8: <unknown function> + 0xa61c4b3 (0x7f42235b04b3 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #9: at::_ops::fill__Scalar::call(at::Tensor&, c10::Scalar const&) + 0x12c (0x7f421e06792c in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #10: at::native::zero_(at::Tensor&) + 0xa7 (0x7f421d916a67 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #11: <unknown function> + 0xa61b80d (0x7f42235af80d in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #12: at::_ops::zero_::call(at::Tensor&) + 0x129 (0x7f421e4a4499 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #13: at::native::zeros_symint(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0x160 (0x7f421dbc09c0 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #14: <unknown function> + 0x588d645 (0x7f421e821645 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #15: at::_ops::zeros::redispatch(c10::DispatchKeySet, c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0xd5 (0x7f421e020715 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #16: <unknown function> + 0x56c4835 (0x7f421e658835 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #17: at::_ops::zeros::call(c10::ArrayRef<c10::SymInt>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>) + 0x1b1 (0x7f421e07b2a1 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #18: at::native::cudnn_rnn::copy_weights_to_flat_buf_views(c10::ArrayRef<at::Tensor>, long, long, long, long, long, long, bool, bool, cudnnDataType_t, c10::TensorOptions const&, bool, bool, bool) + 0x3d0 (0x7f422197e7d0 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #19: at::native::_cudnn_rnn_flatten_weight(c10::ArrayRef<at::Tensor>, long, long, long, long, long, long, bool, bool) + 0x90 (0x7f422197f410 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #20: <unknown function> + 0xa630fe9 (0x7f42235c4fe9 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #21: <unknown function> + 0xa66700f (0x7f42235fb00f in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #22: <unknown function> + 0x52c4fc4 (0x7f421e258fc4 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #23: at::_ops::_cudnn_rnn_flatten_weight::call(c10::ArrayRef<at::Tensor>, long, c10::SymInt, long, c10::SymInt, c10::SymInt, long, bool, bool) + 0x386 (0x7f421e1cb536 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #24: <unknown function> + 0x80b3c06 (0x7f4221047c06 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #25: torch::nn::detail::RNNImplBase<torch::nn::LSTMImpl>::flatten_parameters() + 0x346 (0x7f4221050d26 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #26: void torch::nn::Module::to_impl<c10::Device&, bool&>(c10::Device&, bool&) + 0xd0 (0x7f4220f7a030 in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #27: torch::nn::Module::to(c10::Device, bool) + 0x1c (0x7f4220f7321c in /usr/local/bin/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #28: dorado() [0xb88ffd]
frame #29: dorado() [0xb89597]
frame #30: dorado() [0xb77fd8]
frame #31: dorado() [0xb7870d]
frame #32: dorado() [0xa50a49]
frame #33: dorado() [0x8d6e3b]
frame #34: dorado() [0x89125b]
frame #35: dorado() [0x896a7a]
frame #36: dorado() [0x4fa350]
frame #37: <unknown function> + 0x2a1ca (0x7f4217b9a1ca in /lib/x86_64-linux-gnu/libc.so.6)
frame #38: __libc_start_main + 0x8b (0x7f4217b9a28b in /lib/x86_64-linux-gnu/libc.so.6)
frame #39: dorado() [0x827f4f]
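
For context, "no kernel image is available for execution on the device" generally means the binary contains neither native GPU code nor compatible PTX for the GPU's compute capability. Below is a minimal sketch of that compatibility rule, not Dorado's or PyTorch's actual check. The RTX 5090 reports compute capability 12.0 (sm_120). The list of build targets is hypothetical and for illustration only, since the architectures Dorado 0.9.1 was compiled for are not stated in this report. The helper name `has_kernel_image` is also made up, and the sketch ignores architecture-specific targets such as `sm_90a`.

```python
from typing import Iterable, Tuple

Capability = Tuple[int, int]  # (major, minor) compute capability


def has_kernel_image(
    device: Capability,
    sass_archs: Iterable[Capability],
    ptx_archs: Iterable[Capability] = (),
) -> bool:
    """Return True if a binary with these embedded targets can run on `device`."""
    # Native code (SASS) built for sm_XY runs on a device with the same
    # major version and an equal or higher minor version.
    for major, minor in sass_archs:
        if major == device[0] and minor <= device[1]:
            return True
    # PTX built for compute_XY can be JIT-compiled by the driver for any
    # device of equal or higher compute capability.
    for arch in ptx_archs:
        if arch <= device:
            return True
    return False


RTX_5090 = (12, 0)  # compute capability 12.0 (sm_120)

# HYPOTHETICAL build targets, not taken from the Dorado 0.9.1 build.
example_sass = [(7, 0), (8, 0), (8, 6), (9, 0)]

print(has_kernel_image(RTX_5090, example_sass))                      # False
print(has_kernel_image(RTX_5090, example_sass, ptx_archs=[(9, 0)]))  # True
print(has_kernel_image((8, 9), example_sass))                        # True
```

Under these assumed targets, the first call mirrors the failure above: no native code exists for major version 12 and no PTX is embedded for the driver to JIT-compile, so the first kernel launch (here a tensor fill during `Module::to`) fails.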