Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Internal compiler error with MSVC 2022 when building nb_func.h #613

Closed
jhale opened this issue Jun 7, 2024 · 35 comments
Closed

[BUG]: Internal compiler error with MSVC 2022 when building nb_func.h #613

jhale opened this issue Jun 7, 2024 · 35 comments

Comments

@jhale
Copy link

jhale commented Jun 7, 2024

Problem description

We have recently started building our library Basix on Windows using MSVC 2022 using GitHub Actions. We are using scikit-build-core to orchestrate the CMake build and we build nanobind statically. We use vcpkg to bring in LAPACK and BLAS.

After switching to nanobind 2.0.0 we are consistently triggering an internal compiler error when building nanobind statically. The error points to L42 in nb_func.cpp, but that line looks relatively innocent.

https://github.com/FEniCS/basix/actions/runs/9404421938/job/25903298628#step:5:763
https://github.com/wjakob/nanobind/blob/master/src/nb_func.cpp#L42

We don't have Windows machines locally so we haven't tried reproducing this locally. Your CI seems to be running green on MSVC 2022. Any ideas?

Reproducible example code

No response

@miklhh
Copy link

miklhh commented Jun 7, 2024

We have been getting the same internal compiler error pointing to L42 in nb_func.cpp since about a day ago.

@wjakob
Copy link
Owner

wjakob commented Jun 7, 2024

That's quite mysterious since it doesn't seem to happen in the nanobind CI. Two requests:

  1. Could you please open an issue with Microsoft? The true source of these issues is never fixed unless escalated to the compiler vendor.
  2. The failure is either happening because of a change in the compiler (e.g. update) or because of a change in nanobind. Do you know which one it is? If the latter, can you bisect it to a specific commit? (That said, even in that case, the issue should be reported to Microsoft)

@miklhh
Copy link

miklhh commented Jun 7, 2024

Regarding your second point @wjakob

We are using the 2.0.0 tag point release from wjacob/nanobind. We started noticing this error roughly from yesterday. I think it's safe to say that something upstream, either in the MSVC compiler or in the GitHub Windows Server runner image, has changed.

Possibly related issue: actions/runner-images#10004

@miklhh
Copy link

miklhh commented Jun 7, 2024

@wjakob, is it possible for you to trigger a windows-latest build job for your nanobind Tests job? From looking around a bit over at actions/runner-images, there seems to be a bit of talk going on that something has happened to that image within the last 24 hours. I'm curious if your test build can trigger this internal compiler error.

@nicholasjng
Copy link
Contributor

nicholasjng commented Jun 7, 2024

To throw another datapoint into the mix:

I maintain nanobind-bazel, a Bazel ruleset for Python bindings builds with nanobind, and my CI (which consists of building @wjakob 's nanobind_example across multiple Python versions with the respective installed C++ toolchain on every OS) is still entirely green on 20240603.1.0: https://github.com/nicholasjng/nanobind-bazel/actions/runs/9416682418/job/25940329652#step:1:9

EDIT: I should also say that this is built with nanobind v2.0.0.

@jhale
Copy link
Author

jhale commented Jun 7, 2024

Tried on a fork https://github.com/jhale/nanobind, doesn't trigger the issue on either a slightly tweaked v2.0.0 or a current main.

@jhale
Copy link
Author

jhale commented Jun 7, 2024

Could you tell us more about your project @miklhh ?

@metab0t
Copy link
Contributor

metab0t commented Jun 9, 2024

https://developercommunity.visualstudio.com/t/Internal-compiler-error-w-nanobind-v19/10662883?sort=newest

@jhale The issue has been reported and fixed, but they haven't released the patch.

The bug is introduced in MSVC 19.40.xxxxx

@jhale
Copy link
Author

jhale commented Jun 10, 2024

Thanks @metab0t - Even though this is not an issue with nanobind itself, I propose we leave this issue open until the fix is rolled out @wjakob?

@metab0t
Copy link
Contributor

metab0t commented Jun 11, 2024

@jhale I found out that windows-2019 image can be used instead of windows-latest as a temporary solution to compile nanobind.

@henryiii
Copy link
Contributor

Also see pypa/cibuildwheel#1875 for another.

@Yikai-Liao
Copy link

It's quite a strange bug. Since I could build my lib successfully on my win10 laptop with vs2022. But on github action, it just failed. I don't quite understand what's the difference.

@henryiii
Copy link
Contributor

henryiii commented Jun 16, 2024

Do you have version 17.10 locally? That’s the buggy one that came to GHA a week or two ago. It’s breaking other things too, I’ve seen three unrelated packages broken so far. Older versions are fine.

(edit: sorry, got the first number wrong, fixed it)

@Yikai-Liao
Copy link

Yikai-Liao commented Jun 16, 2024

I got Microsoft Visual Studio Community 2022 (64-bit) - Current Version 17.9.5 locally.

Do you mean the newest 17.10 version?

@henryiii
Copy link
Contributor

Yes, got the first number wrong. The buggy one is 17.10. 17.9 is fine.

@henryiii
Copy link
Contributor

(If you check the link above, apparently it has been fixed internally, but isn’t in a released version yet, as of 17.10.2)

@melectronvolt
Copy link

melectronvolt commented Jun 30, 2024

@Yikai-Liao
Copy link

Yikai-Liao commented Jul 11, 2024

I found a solution to make nanobind compile on the windows-2022 image on github actions -- using clang-cl. (because Visual Studio 2019 does not support c++20 standard well, I have to use the windows-2022 image)

Although the bug is still not fixed in Visual Studio 17.10.4, the clang-cl in Visual Studio does not have this bug.
So I just add a new CMakeLists-VS.txt file and set(CMAKE_GENERATOR_TOOLSET "ClangCl") before calling project.

cmake_minimum_required(VERSION 3.20)
set(CMAKE_GENERATOR_TOOLSET "ClangCl") # newly added
project(symusic)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

And in the github actions' workflow, I add the following step to specify which CMakeLists to use:

      - name: Force ClangCL on Windows
        if: ${{ matrix.buildplat[0] == 'windows-2022' }}
        # remove the default CMakeLists.txt and replace it with CMakeLists-VS.txt
        run: |
          rm CMakeLists.txt
          mv CMakeLists-VS.txt CMakeLists.txt

I know this is not an elegant and general way, but this does make my github actions work. Now I can precompile the new version of my lib and upload them to pypi.

@calcmogul
Copy link
Contributor

calcmogul commented Jul 11, 2024

If you wrap the CMAKE_GENERATOR_TOOLSET set in if(WIN32), you can avoid needing to overwrite CMakeLists.txt:

# https://github.com/wjakob/nanobind/issues/613
#
# Visual Studio 17.10.4's MSVC compiler gives an ICE in nanobind's nb_func.h,
# but clang-cl does not.
if(WIN32)
    set(CMAKE_GENERATOR_TOOLSET "ClangCl")
endif()

clang-cl is compatible with MSVC's commandline, so flags in if(MSVC) blocks work. The only flag I use that wasn't recognized was /Zf.

@Yikai-Liao
Copy link

If you wrap the CMAKE_GENERATOR_TOOLSET variable in if(WIN32), you can avoid needing to overwrite CMakeLists.txt:

I got it. However, because CMakeLists are packaged with the source code and uploaded to pypi, this can cause the installation to fail when installing from source for users on Win who do not have an additional LLVM installation for Visual Studio.

And I found that when using ninja, set(CMAKE_GENERATOR_TOOLSET "ClangCl") will make cmake fail.

So if we write these settings to the same file, more if is needed.

@henryiii
Copy link
Contributor

I'd do things like this from the command line, -DCMAKE_GENERATOR_TOOLSET=ClangCL, or make a toolset file. You aren't supposed to hard code situation specific things like this in CMakeLists.txt.

@henryiii
Copy link
Contributor

I take it this is still not fixed? It was supposed to be in the latest images...

@Yikai-Liao
Copy link

Yikai-Liao commented Jul 11, 2024

I'd do things like this from the command line, -DCMAKE_GENERATOR_TOOLSET=ClangCL, or make a toolset file. You aren't supposed to hard code situation specific things like this in CMakeLists.txt.

Well, I'm using scikit build, and the only way I find (maybe there are other ways) to pass cmake arguments (out of CMakeLists.txt) is to put them in the pyproject.toml, like this:

[tool.scikit-build.cmake.define]
CMAKE_GENERATOR_TOOLSET="ClangCL"

I also don't like hard code things like this, but I only find this work. So I choose to create a temporary file, and it will be removed once Microsoft fix that bug.

I take it this is still not fixed? It was supposed to be in the latest images...

Yes, in the latest 17.10.4, the bug is still there, waiting to be fixed.

@henryiii
Copy link
Contributor

Config settings lets you do this, pip install -Ccmake.define.CMAKE_GENERATOR_TOOLSET=ClangCL . Or if you are using cibuildwheel, that can set config-settings too. You can also use environment variables, such as:

[tool.scikit-build.cmake.define]
CMAKE_GENERATOR_TOOLSET = { env="CMAKE_GENERATOR_TOOLSET" }

Now you can set CMAKE_GENERATOR_TOOLSET as an environment variable, and it will get passed in as a define.

@jhale
Copy link
Author

jhale commented Jul 12, 2024

Thanks for the tips on clang-cl!

Unfortunately the main purpose of our Windows CI testing is to make sure that our packages are buildable within conga-forge, which uses cl as standard, so this isn't really a suitable workaround for us.

@jdumas
Copy link

jdumas commented Jul 13, 2024

Hi. I also noticed this issue on my own project. I notice that if I add the /permissive- to nanobind's CMake I am also able to reproduce the issue locally on nanobind's repo itself:

diff --git i/CMakeLists.txt w/CMakeLists.txt
index 82f9b28..7d9a03e 100644
--- i/CMakeLists.txt
+++ w/CMakeLists.txt
@@ -22,6 +22,8 @@ if (NOT MSVC)
   option(NB_TEST_SANITZE    "Build tests with address and undefined behavior sanitizers?" OFF)
 endif()

+add_compile_options("/permissive-")
+
 # ---------------------------------------------------------------------------
 # Do a release build if nothing was specified
 # ---------------------------------------------------------------------------

Which result in the following error:

  inter_module.vcxproj -> C:\Users\jedumas\workspace\github\nanobind\build\tests\Debug\inter_module.dll
C:\Users\jedumas\workspace\github\nanobind\include\nanobind\nb_func.h(42,1): fatal  error C1001: Internal compiler error. [C:\Users\jedumas\workspace\github\nanobind\build\tests\nanobind-static-abi3.vcxproj]

This with the latest version of Visual Studio 17.10.4 / MSVC 19.40.33812.0.

The good news is I've tried to update to the preview release of the next Visual Studio version (17.11.2+c078802d4), and the error goes away.

arashbm added a commit to mnets/bliss_bind that referenced this issue Jul 17, 2024
MSVC has some issues that need to be resolved for 2:
wjakob/nanobind#613
@Fig1024
Copy link

Fig1024 commented Jul 18, 2024

I just tried to compile with VS 2022 and C++ 20, it gives internal compiler error in nb_func.h due to some template stuff around 'func_create'

However, compiling with C++17 works without error

@melpon
Copy link

melpon commented Jul 19, 2024

I have applied the following patch, which resolves the Internal compiler error. I believe this patch will become unnecessary once the MSVC version is updated and applied to GitHub Actions in the future.

diff --git a/include/nanobind/nb_func.h b/include/nanobind/nb_func.h
index e53e1a4..428ca81 100644
--- a/include/nanobind/nb_func.h
+++ b/include/nanobind/nb_func.h
@@ -199,14 +199,24 @@ NB_INLINE PyObject *func_create(Func &&func, Return (*)(Args...),
 
         PyObject *result;
         if constexpr (std::is_void_v<Return>) {
+#if defined(_WIN32)
+            cap->func(static_cast<cast_t<Args>>(in.template get<Is>())...);
+#else
             cap->func(in.template get<Is>().operator cast_t<Args>()...);
+#endif
             result = Py_None;
             Py_INCREF(result);
         } else {
+#if defined(_WIN32)
+            result = cast_out::from_cpp(
+                       cap->func(static_cast<cast_t<Args>>(in.template get<Is>())...),
+                       policy, cleanup).ptr();
+#else
             result = cast_out::from_cpp(
                        cap->func((in.template get<Is>())
                                      .operator cast_t<Args>()...),
                        policy, cleanup).ptr();
+#endif
         }
 
         if constexpr (Info::keep_alive)

jhale added a commit to jhale/nanobind that referenced this issue Jul 22, 2024
@jhale
Copy link
Author

jhale commented Jul 22, 2024

A fork of nanobind with the above patch applied to v2.0.0 is available at: https://github.com/jhale/nanobind/tree/jhale/msvc2022-workaround

@wjakob
Copy link
Owner

wjakob commented Jul 24, 2024

@melpon @jhale I'm wondering if it makes sense to temporarily merge this patch into the master branch. Does GHA still serve the broken version of MSVC?

@calcmogul
Copy link
Contributor

calcmogul commented Jul 24, 2024

Yea, MSVC from the current windows-2022 image still gives an ICE: https://github.com/calcmogul/Sleipnir/actions/runs/10070019011/job/27837879240

-- Building for: Visual Studio 17 2022
-- The CXX compiler identification is MSVC 19.40.33812.0
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Enterprise/VC/Tools/MSVC/14.40.33807/bin/Hostx64/x64/cl.exe - skipped

D:\a\Sleipnir\Sleipnir.py-build-cmake_cache\cp39-cp39-win_amd64_deps\nanobind-src\include\nanobind\nb_func.h(42,1): fatal error C1001: Internal compiler error. [D:\a\Sleipnir\Sleipnir.py-build-cmake_cache\cp39-cp39-win_amd64_jormungandr.vcxproj]
(compiler file 'msc1.cpp', line 1593)

@jhale
Copy link
Author

jhale commented Jul 24, 2024

@melpon @jhale I'm wondering if it makes sense to temporarily merge this patch into the master branch. Does GHA still serve the broken version of MSVC?

It's still broken in GHA. It's up to you on master, I can see pros (it fixes the issue) and cons (best not to workaround compiler bugs). I will stick with patched v2.0.0 in any case.

@wjakob
Copy link
Owner

wjakob commented Jul 24, 2024

I've merged a temporary fix for this issue but plan to remove it once GHA is functional again.

@faberno
Copy link

faberno commented Oct 2, 2024

Currently the workaround causes compile errors for me (described in this discussion post), but when removing it and building nanobind myself, everything works.
Is it possible that the problem was fixed?

@jhale
Copy link
Author

jhale commented Oct 4, 2024

Currently the workaround causes compile errors for me (described in this discussion post), but when removing it and building nanobind myself, everything works. Is it possible that the problem was fixed?

Could you clarify which version of nanobind you are working with? It's 'fixed' on master, yes, and possibly also v2.2.0 which was released yesterday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests