Description
Describe the bug
I have a function that initializes a variable using ti.math.vec3. It works fine 99% of the time, but I recently made a change to my program and called that function from a new kernel. It behaved as expected when running on CPU and Metal, but when I run it on CUDA, it fails with CUDA_ERROR_MISALIGNED_ADDRESS. I can fix the error by initializing the variable implicitly (using another vec3 object).
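For readers unfamiliar with the error: in general, a "misaligned address" fault means a memory access used an address that is not a multiple of the access width. The sketch below only illustrates that arithmetic. It assumes a tightly packed vec3 of f32 (12 bytes per element) and hypothetical 4-byte and 16-byte access widths; whether Taichi's CUDA backend actually emits such accesses for this function is not established by this report.

```python
# Illustration only: alignment arithmetic for a packed array of vec3<f32>.
# The 12-byte element size and the 16-byte access width are assumptions,
# not something confirmed about Taichi's generated code.

VEC3_F32_BYTES = 3 * 4  # three 32-bit floats, no padding


def is_aligned(address: int, width: int) -> bool:
    """True if `address` is a multiple of `width` bytes."""
    return address % width == 0


def element_offset(index: int) -> int:
    """Byte offset of element `index` in a tightly packed vec3<f32> array."""
    return index * VEC3_F32_BYTES


if __name__ == "__main__":
    for i in range(5):
        off = element_offset(i)
        print(
            f"element {i}: offset {off:3d}, "
            f"4-byte aligned: {is_aligned(off, 4)}, "
            f"16-byte aligned: {is_aligned(off, 16)}"
        )
```

Under these assumptions every element is fine for scalar 4-byte accesses, but only every fourth element would be safe for a 16-byte access, which is one way the same function could work from one call site and fault from another.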
I am unsure why it only fails when I call from a specific kernel. I have rebooted my machine, cleared Taichi cache, disabled Taichi offline cache, cleaned python cache, enabled debug=True, and added ti.sync() between almost every line in the problematic kernel. None of those changes fixed anything.
To Reproduce
Unfortunately, I can't post all of my code as it is part of a commercial project. I have included snippets that should give some idea of the issue. I don't have time to replicate the issue in simpler standalone code at the moment, sorry.
# Non-problematic kernel - error does not occur when nearest_surface is called
@ti.kernel
def snap_fit(self, indx: ti.i32, cellId: ti.i32): # type: ignore
    p, _, _ = nearest_surface(cellId, self.pts[indx], self.surf, 0)
    snapVector = self.pts[indx] - p # Line to be filled with points
    # Reposition points
    for i in range(indx):
        self.pts[i] = p + ((i/(indx)) * snapVector)

# Problematic kernel - error does occur when nearest_surface is called
@ti.kernel
def surface_snap_fit(self, cellId: ti.i32, p: ti.math.vec3): # type: ignore
    # First store results in local variables
    closest_pt, _, cell_id = nearest_surface(cellId, p, self.surf, 0)
    # Then update fields separately
    self.pts[0] = closest_pt
    self.root_surf_cell[0] = cell_id

# Problematic function
@ti.func
def nearest_surface(cellId: ti.i32, p: ti.math.vec3, surf, arcPt: ti.i32): # surf is a ti.data_oriented object
    # Initialize variables
    max_neighbors = surf.maxNeighborCells[0]
    minDistSqr = 1e23
    closestPoint = ti.math.vec3(0.0, 0.0, 0.0) # The error occurs here
    # Function continues...

# Fixed function
@ti.func
def nearest_surface(cellId: ti.i32, p: ti.math.vec3, surf, arcPt: ti.i32): # surf is a ti.data_oriented object
    # Initialize variables
    max_neighbors = surf.maxNeighborCells[0]
    minDistSqr = 1e23
    closestPoint = p # Initializing using the vector 'p' solves this issue
    # Function continues...
Log/Screenshots
Taichi initialized with debug=False, since this allowed debug print statements to be used
<program operating normally, until function call from problematic kernel>
[E 03/11/25 15:26:55.272 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)
2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - WARNING - Simulation 0 failed! Exception: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)
2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - INFO - Launching simulation 1. Rank 0 50.0% complete
[E 03/11/25 15:26:55.278 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling malloc_async_impl (cuMemAllocAsync)
2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - WARNING - Arc 1 failed! Exception: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling malloc_async_impl (cuMemAllocAsync)
2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - INFO - Rank 0 executions complete!
[E 03/11/25 15:26:55.299 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling module_load_data_ex (cuModuleLoadDataEx)
Traceback (most recent call last):
File "/home/naj20/program_Taichi/program.py", line 195, in <module>
local_prob = surface.prob.to_numpy()
File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/util.py", line 351, in wrapped
return func(*args, **kwargs)
File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/field.py", line 307, in to_numpy
tensor_to_ext_arr(self, arr)
File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 1113, in wrapped
return primal(*args, **kwargs)
File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 1045, in __call__
return self.launch_kernel(kernel_cpp, *args)
File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 976, in launch_kernel
raise e from None
File "/home/naj20/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 971, in launch_kernel
prog.launch_kernel(compiled_kernel_data, launch_ctx)
RuntimeError: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling module_load_data_ex (cuModuleLoadDataEx)
[E 03/11/25 15:26:55.330 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)
[E 03/11/25 15:26:55.330 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling mem_free (cuMemFree_v2)
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
[pop-os:06155] *** Process received signal ***
[pop-os:06155] Signal: Aborted (6)
[pop-os:06155] Signal code: (-6)
[pop-os:06155] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7cd905842520]
[pop-os:06155] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7cd9058969fc]
[pop-os:06155] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7cd905842476]
[pop-os:06155] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7cd9058287f3]
[pop-os:06155] [ 4] /home/naj20/.local/lib/python3.10/site-packages/taichi/_lib/core/taichi_python.cpython-310-x86_64-linux-gnu.so(+0x4957daf)[0x7cd8a5b57daf]
[pop-os:06155] [ 5] /home/naj20/.local/lib/python3.10/site-packages/taichi/_lib/core/taichi_python.cpython-310-x86_64-linux-gnu.so(+0x4956426)[0x7cd8a5b56426]
[pop-os:06155] [ 6] /home/naj20/.local/lib/python3.10/site-packages/taichi/_lib/core/taichi_python.cpython-310-x86_64-linux-gnu.so(+0x4956491)[0x7cd8a5b56491]
[pop-os:06155] [ 7] /home/naj20/.local/lib/python3.10/site-packages/taichi/_lib/core/taichi_python.cpython-310-x86_64-linux-gnu.so(+0x19563bb)[0x7cd8a2b563bb]
[pop-os:06155] *** End of error message ***
Aborted (core dumped)
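A note on reading the log above: CUDA reports kernel execution faults asynchronously, at a later driver call, and a misaligned-address fault leaves the context unusable, so subsequent driver calls return the same error. The first failing call is therefore the most informative one. A minimal sketch for pulling the failing driver calls out of such a log in order follows; `failing_calls`, `CALL_RE`, and `sample` are illustrative names, and the sample lines are copied from the log above.

```python
import re

# Captures the error name and the driver call from Taichi's CUDA error
# lines, e.g. "... while calling stream_synchronize (cuStreamSynchronize)".
CALL_RE = re.compile(r"CUDA Error (\w+): .*? while calling \w+ \((\w+)\)")


def failing_calls(log_text: str) -> list[tuple[str, str]]:
    """Return (error_name, driver_call) pairs from "[E ...]" lines, in order."""
    calls = []
    for line in log_text.splitlines():
        # Skip application-level log lines that merely repeat the message.
        if not line.startswith("[E "):
            continue
        match = CALL_RE.search(line)
        if match:
            calls.append((match.group(1), match.group(2)))
    return calls


sample = """\
[E 03/11/25 15:26:55.272 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)
2025-03-11 15:26:55,278 - Rank 0 - program_rank0 - WARNING - Simulation 0 failed! Exception: [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling stream_synchronize (cuStreamSynchronize)
[E 03/11/25 15:26:55.278 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling malloc_async_impl (cuMemAllocAsync)
[E 03/11/25 15:26:55.299 6155] [cuda_driver.h:operator()@92] CUDA Error CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address while calling module_load_data_ex (cuModuleLoadDataEx)
"""

if __name__ == "__main__":
    for error_name, driver_call in failing_calls(sample):
        print(error_name, driver_call)
```

In this log the first failing call is cuStreamSynchronize, which is consistent with the fault occurring inside a previously launched kernel; the later cuMemAllocAsync, cuModuleLoadDataEx, and cuMemFree_v2 failures, and the traceback in to_numpy, are most likely the same error resurfacing rather than separate problems.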
Additional comments
Here is the output of ti.diagnose
[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.10.12
*******************************************
** Taichi Programming Language **
*******************************************
Docs: https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum: https://forum.taichi.graphics/
Taichi system diagnose:
python: 3.10.12 (main, Nov 6 2024, 20:22:13) [GCC 11.4.0]
system: linux
executable: /usr/bin/python3
platform: Linux-6.8.0-76060800daily20240311-generic-x86_64-with-glibc2.35
architecture: 64bit ELF
uname: uname_result(system='Linux', node='pop-os', release='6.8.0-76060800daily20240311-generic', version='#202403110203~1715181801~22.04~aba43ee SMP PREEMPT_DYNAMIC Wed M', machine='x86_64')
locale: en_US.UTF-8
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/bin
PYTHONPATH: ['/usr/local/bin', '/home/naj20/BWB/2_SWEEPZ_Sims/P0', '/usr/local/bin', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/home/naj20/.local/lib/python3.10/site-packages', '/usr/local/lib/python3.10/dist-packages', '/usr/lib/python3/dist-packages']
No LSB modules are available.
Distributor ID: Pop
Description: Pop!_OS 22.04 LTS
Release: 22.04
Codename: jammy
import: <module 'taichi' from '/home/naj20/.local/lib/python3.10/site-packages/taichi/__init__.py'>
cpu: True
metal: False
opengl: True
cuda: True
vulkan: True
`glewinfo` not available: [Errno 2] No such file or directory: 'glewinfo'
Tue Mar 11 15:30:46 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67 Driver Version: 550.67 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-PCIE-16GB Off | 00000000:01:00.0 Off | 0 |
| N/A 34C P0 37W / 250W | 1MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce GTX 1080 Off | 00000000:05:00.0 Off | N/A |
| 27% 29C P8 5W / 180W | 2MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.10.12
[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.10.12
[Taichi] Starting on arch=x64
[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.10.12
[Taichi] Starting on arch=opengl
[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.10.12
[Taichi] Starting on arch=cuda
[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.10.12
*******************************************
** Taichi Programming Language **
*******************************************
Docs: https://docs.taichi-lang.org/
GitHub: https://github.com/taichi-dev/taichi/
Forum: https://forum.taichi.graphics/
TAICHI EXAMPLES
────────────────────────────────────────────────────────────────────────────────────
0: ad_gravity 25: karman_vortex_street 50: patterns
1: circle_packing_image 26: keyboard 51: pbf2d
2: comet 27: laplace 52: physarum
3: cornell_box 28: laplace_equation 53: poisson_disk_sampling
4: diff_sph 29: mandelbrot_zoom 54: print_offset
5: differential_evolution 30: marching_squares 55: rasterizer
6: euler 31: mass_spring_3d_ggui 56: regression
7: eulerfluid2d 32: mass_spring_game 57: sdf_renderer
8: explicit_activation 33: mass_spring_game_ggui 58: simple_derivative
9: export_mesh 34: mciso_advanced 59: simple_texture
10: export_ply 35: mgpcg 60: simple_uv
11: export_videos 36: mgpcg_advanced 61: snow_phaseField
12: fem128 37: minimal 62: stable_fluid
13: fem128_ggui 38: minimization 63: stable_fluid_ggui
14: fem99 39: mpm128 64: stable_fluid_graph
15: fractal 40: mpm128_ggui 65: taichi_bitmasked
16: fractal3d_ggui 41: mpm3d 66: taichi_dynamic
17: fullscreen 42: mpm3d_ggui 67: taichi_logo
18: game_of_life 43: mpm88 68: taichi_ngp
19: gui_image_io 44: mpm88_graph 69: taichi_sparse
20: gui_widgets 45: mpm99 70: texture_graph
21: implicit_fem 46: mpm_lagrangian_forces 71: tutorial
22: implicit_mass_spring 47: nbody 72: two_stream_instability
23: initial_value_problem 48: odop_solar 73: vortex_rings
24: jacobian 49: oit_renderer 74: waterwave
────────────────────────────────────────────────────────────────────────────────────
42
Running example minimal ...
[Taichi] Starting on arch=x64
42.0
>>> Running time: 0.10s
Consider attaching this log when maintainers ask about system information.
>>> Running time: 2.81s