"valgrind: the 'impossible' happened: Killed by fatal signal" on macOS 10.13 #117
Comments
Hi @EricBrunel, This is going to be a bit challenging to debug as I don't have access to a macOS 10.13 machine. However, here are a few steps we can try to see what's going on in Valgrind.
Reminder for myself: dyld's mmap looks very normal/simple |
Hello Louis and thanks for your answer. I cloned the repository and built valgrind myself, and I ran the built version on the wish interpreter. Here is its output:
The part after "Thread 1" actually changes across runs. Sometimes it's one thing, sometimes another. Another thing I noticed is that if I redirect the output to a file, valgrind tends to start looping, printing out:
over and over and over again, and I have to kill it with I did run valgrind within lldb, but I'm not sure the results will be of any help. Here is what I got:
Considering the issue with valgrind looping when I redirect its output, I didn't dare to run it with the multiple |
That's very odd: you get two different issues depending on how you build Valgrind? In your first message, it seemed like an error during mmap, and now it looks like the TLS pointer is not being set. Is that consistent if you rerun each version multiple times? (It seems so from the LLDB output, but just to make sure.)
If you run with just
It shouldn't generate anything close to 1GB of logs 😄 so it shouldn't be a problem but we can look into this signal issue first if you prefer. |
This could well be related to the changes that I made for ELF systems to handle multiple RW sections. I had to get the Mach-O code to do something similar, but it was a quick and dirty job. I just opened a Bugzilla item for this: https://bugs.kde.org/show_bug.cgi?id=501194 I'll have a go at fixing it this weekend. |
The Mach-O loading should be fixed. I can run wish and get to the %wish prompt. That's probably not much help, as it looks like the process forks. I get many errors with --trace-children=yes. |
I do indeed get different errors depending on whether I run the version of valgrind installed with Homebrew or the version I built myself. As far as I could see, the error with the Homebrew version is always the one with mmap. For the version I built myself, I'm less sure it's consistent, since as I said, the behavior seems to be a bit random: sometimes I get a short log, sometimes this "Signal 11 being dropped from thread 0's queue" message repeated over and over. Each time I did pay attention, there was the error message " Anyway, I did run the version I built myself with
Not sure if that's any help... |
Thanks @paulfloyd! I am still struggling to get upstream merge into
At least that's consistent, good to know.
Ah, could you try it a few times until you get the |
If the xmllint failures during the build are the problem, that has been fixed by Mark Wielaard: https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=9f956db3e5eb0afb0d60987f3658b66646a0ac81 (commit 9f956db)
|
Not only that: there were also a bunch of merge artifacts that git didn't handle correctly (I guess things break down after 200+ conflicts). Anyway, it's now in |
@EricBrunel Could you give it another try after pulling the latest changes from the main branch? |
Sorry for the delay. I did test the latest changes on the main branch by cloning the repo, and I've attached the logs I'm getting. The first one ( valgrind20250312a.log |
It's not serious but I need to look at why it's saying "can't open file to inspect ELF header" and not macho. I'll give this a go with the upstream code on a macOS 10.13 VM. Things seem to be going wrong from a call to pipe(). |
The difference in errors is most likely due to a race condition, but it does seem to indicate that there are multiple issues at play:
I am unaware of any issue with I will look into those at a later date.
Because that's the hardcoded message in |
10.13 doesn't use DSC (or at least it was still possible to request to use files rather than DSC). I think that 10.15 was when the option to use files was removed, making DSC mandatory. I'll fix the message soon. |
I am well aware but this is only because Valgrind itself tells dyld not to use it, not because it's not available. DSC has been around since iOS 3.1 (2010) and it's clearly supported in the dyld version of 10.13. The question is why would those files not be available on 10.13 even though they were only removed in 10.15/11 as you said yourself. |
I'll take a look this evening. |
With the upstream code in a 10.13 VM I can run "vg-in-place wish" (which runs /usr/bin/wish). I get to the wish prompt and an empty wish graphical window. "vg-in-place --trace-children=yes wish" gives me an execve failure for /usr/bin/dirname. With main from this repo (af822ef), again I get to the wish prompt without tracing children. When tracing children I get a lot further, up to the UNKNOWN mach_msgs. The files generating the ELF error message do exist. It's quite possible that there is an issue reading universal binaries. For instance: file /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/Print.framework/Versions/A/Print I need to do some more debugging and poking around with otool. |
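To illustrate the universal-binary suspicion above, here is a minimal sketch of how a loader can tell an ELF file, a thin Mach-O file and a fat (universal) Mach-O file apart from the first four bytes. The function and enum names are hypothetical and are not Valgrind's internal API; only the magic values are taken from the ELF specification and Apple's mach-o headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration only; not Valgrind's internal API. */
typedef enum {
    KIND_UNKNOWN,
    KIND_ELF,
    KIND_MACHO_THIN,
    KIND_MACHO_FAT
} BinaryKind;

/* Magic values as documented in <mach-o/loader.h> and <mach-o/fat.h>.
   Prefixed names avoid clashing with the system macros. */
#define SKETCH_MH_MAGIC_32 0xfeedfaceu
#define SKETCH_MH_MAGIC_64 0xfeedfacfu
#define SKETCH_FAT_MAGIC   0xcafebabeu /* fat headers are stored big-endian */

static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint32_t read_le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Classify a file from the first bytes of its header. */
BinaryKind classify_header(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4)
        return KIND_UNKNOWN;

    /* ELF files start with 0x7f 'E' 'L' 'F'. */
    if (memcmp(buf, "\x7f" "ELF", 4) == 0)
        return KIND_ELF;

    /* A fat header wraps one Mach-O slice per architecture.
       Note: Java class files share this magic, so a real loader
       needs further sanity checks on the fields that follow. */
    if (read_be32(buf) == SKETCH_FAT_MAGIC)
        return KIND_MACHO_FAT;

    /* Thin Mach-O: little-endian on x86_64, big-endian on older PPC. */
    uint32_t le = read_le32(buf);
    uint32_t be = read_be32(buf);
    if (le == SKETCH_MH_MAGIC_32 || le == SKETCH_MH_MAGIC_64 ||
        be == SKETCH_MH_MAGIC_32 || be == SKETCH_MH_MAGIC_64)
        return KIND_MACHO_THIN;

    return KIND_UNKNOWN;
}
```

A loader that only handles the thin case would treat a fat file as unrecognised, which is one way a framework binary that exists on disk could still end up on an error path.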
Thanks for looking into this @paulfloyd. Given that @EricBrunel Could you run valgrind again with all the previous flags and add |
Output of |
There is an error in the change I made that switched from reading a 4k block to passing the fd to ML_(check_macho_and_get_rw_loads), which now allocates enough for the segment commands: the close is only in the non-Darwin code block. I'll fix that shortly. |
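The class of bug described above (a close that sits inside only one platform-specific block) can be sketched as follows. This is a hedged illustration using plain POSIX calls, not Valgrind's internal wrappers, and read_header_prefix is a hypothetical helper name; the point is only that the close happens on the shared path after the conditional code.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from the start of a file.
   Returns the number of bytes read, or -1 on failure. */
ssize_t read_header_prefix(const char *path, void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, len);

#if defined(__APPLE__)
    /* Mach-O specific inspection of buf would go here. */
#else
    /* ELF specific inspection of buf would go here. */
#endif

    /* Close on the shared path, outside the platform-specific blocks,
       so that no build configuration leaks the descriptor. */
    close(fd);
    return n;
}
```

If the close lived inside only one of the two conditional blocks, every call on the other platform would leak one descriptor, and later calls that need a new descriptor could start failing once the per-process limit is reached.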
Pushed a fix. I also need the following small change:
static Bool check_fat_macho_and_get_rw_loads(const void* macho_header, Int* rw_loads)
(rw_map_count can now be zero)
Then I get:
==8227== Nulgrind, the minimal Valgrind tool
valgrind: m_syswrap/syswrap-amd64-darwin.c:517 (void wqthread_hijack(Addr, Addr, Addr, Addr, UInt, Int, Addr)): Assertion 'tst->os_state.pthread - magic_delta == self' failed. |
Thanks a lot @paulfloyd for looking into that. @EricBrunel with all the changes merged on main, your program should run fine now (it works for me on amd64 15.0). Feel free to report if you are still seeing issues. |
Context
I'm basically trying to debug the latest version of tcl/tk (9.0.1), compiled from the source code on an Intel iMac with macOS 10.13. I got valgrind from Homebrew and tried to run it on the tk interpreter (wish9.0).
What went wrong?
When I try to run valgrind on wish9.0, it gets stuck for a few seconds, then prints out the following error message:
valgrind does work on a trivial program (e.g., printf("Hello world")); it also works on the wish interpreter included in macOS: valgrind /usr/bin/wish does not crash with the message above, but actually runs the interpreter.
I built the tested interpreter myself a few days ago, on the Mac where I am trying to run valgrind. Needless to say, it works without problems when I run it outside of valgrind.
Information
uname -m: x86_64
sw_vers: 10.13.6
xcrun --sdk macosx --show-sdk-version: 10.14