Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI failure on Windows #2376

Open
hannobraun opened this issue Jun 7, 2024 · 3 comments
Open

CI failure on Windows #2376

hannobraun opened this issue Jun 7, 2024 · 3 comments
Labels
help wanted Extra attention is needed topic: build Anything relating to the build system. type: bug Something isn't working

Comments

@hannobraun
Copy link
Owner

As of #2375, the CI build is broken on Windows: https://github.com/hannobraun/fornjot/actions/runs/9405591496/job/25907555438?pr=2375

It fails in the testing job, with a panic from the approximation code:

thread 'main' panicked at D:\a\fornjot\fornjot\crates\fj-core\src\algorithms\approx\face.rs:51:25:
Invalid approximation: Distinct points are too close (a: [14.9560665077605, 16.55430442357281, 16.257858341436602], b: [14.9560665077605, 16.55430442357281, 16.2578583414366], distance: 0.000000000000003552713678800501)

The code in question compares a distance against a tolerance value, so my first assumption was, that this is a case of the different platforms having different floating point precision. However, the distance and the approximation value are nowhere close, so I don't think it can be that.

And that's what confounds me about this failure. It looks like a logic error, but I have no idea how such a logic error could occur just on Windows. This doesn't reproduce on my machine (and I don't have a Windows to test), and I don't think trying to solve this using the CI build would be a productive use of my time right now (the slow turnaround times would make it extremely tedious and time-consuming). Therefore, I will disable the Windows CI build for the time being.

I'm tagging this with help wanted Extra attention is needed , as I hope that someone will be able to reproduce this locally sooner or later, and will then have a much easier time tracking this down than I would have.

@hannobraun hannobraun added type: bug Something isn't working help wanted Extra attention is needed topic: build Anything relating to the build system. labels Jun 7, 2024
@hannobraun
Copy link
Owner Author

The Windows part of the CI build has been disabled in #2377.

@Sergi-Ferrez
Copy link

I have been investigating thru the code, and I think that there is actually 2 points in almost equal coords, and for some reason the Windows version has more precision than Linux?, for that reason panics an the others don't?.

On Windows the "holes" model also panics, and trhought testing is caused by both "add hole" functions (trhu and blind).

Maybe it will be interesting to implement a way to view all points and curves in a model for debugging purposes, handling panics.

@hannobraun
Copy link
Owner Author

Thank you for looking into this, @Sergi-Ferrez!

I have been investigating thru the code, and I think that there is actually 2 points in almost equal coords, and for some reason the Windows version has more precision than Linux?, for that reason panics an the others don't?.

That's what I thought too, initially, but there's a problem with that explanation. Here's the code that does the comparison and then panics:

if b.global_form != a.global_form && distance < min_distance
{
panic!(
"Invalid approximation: \
Distinct points are too close \
(a: {:?}, b: {:?}, distance: {distance})",
a.global_form, b.global_form,
);
}

As you can see, it compares against min_distance, which is defined a few lines up:

let min_distance = ValidationConfig::default().distinct_min_distance;

And the value itself is defined here:

distinct_min_distance: Scalar::from_f64(5e-7), // 0.5 µm,

So the minimum distance value below which two points are considered the same (leading to the panic) is 0.0000005. Meaning on Linux and macOS, all distances between points must be larger than that (because we don't see panics there).

But compare that to the actual distance reported in the error message, 0.000000000000003552713678800501. If I count those zeros correctly, that's 8 orders of magnitude smaller.

If the actual distance was close to the minimum distance, I'd believe it's just a difference in precision between platforms, so that one platform ends up above the threshold, the other below it. But the difference is so large, that I find this hard to believe. Which is why I concluded that it must be a logic error.

But all that said, as I said in the issue description, I have not idea how such a logic error could occur just on Windows, so maybe I'm totally wrong in my analysis.

Maybe it will be interesting to implement a way to view all points and curves in a model for debugging purposes, handling panics.

Yeah, that would be awesome. Overall, the tooling we have for debugging issues like that isn't very good, unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed topic: build Anything relating to the build system. type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants