odd cleanup behavior under valgrind #4363
Comments
On one hand, this has to be related to But on the other hand, this can't be related to Perhaps there are some
Some further investigation: I was able to reproduce this locally using the CI docker image. The other thing killing the process is actually valgrind. The exit code is simply the argument that we pass to valgrind: My current understanding is that my PR did not introduce a memory leak; rather, it was some non-functional change that made valgrind able to detect the memory leak. Additionally, if the rest of the test cases are removed, then a memory leak is not reported in
My earlier understanding was correct. s2n-tls/tests/unit/valgrind.suppressions Lines 35 to 38 in 7f84701
This is a wildcard suppression that suppresses all reachable leaks where the call stack starts with a function named The reason that the errors in my PR weren't suppressed was because the call stack happened to be too long. We run valgrind with This stack trace was then long enough that it didn't match the
wildcard suppression, because
We see that the suggested suppression starts with
Problem:
An odd failure was seen in s2n_examples_test when developing #4351. The failure was seen using
This failure was also seen using gcc 9
I was not able to replicate this failure with different libcryptos, and I was not able to replicate it when using clang 15 and openssl 1.1.1 on an ARM AL2023 instance.
The stack trace was
Curiously, the actual unit test also failed (in addition to the reported memory leak)
It is common to see that failing unit tests will report false positive memory leaks, and once the actual reason for the failure is fixed the memory leak will go away. My understanding here is that that is the case.
This test forks a client and server process, does a handshake over local_io, and then shuts down. I added the following logs to get a better idea of what is going on:
This is the code that the orchestrating process uses to reap the children:
This is the logging code in the client:
While the logs are a little cluttered, we can see that nothing seems to be going wrong in the test itself, but something goes wrong when the client tries to exit. This results in an unsuccessful process exit, causing the test to fail.
We can get more information from the exit status of 2304.
It's rather hard to interpret these symbols, but it looks like something is forcibly exiting the process, possibly with a SIGKILL (exit status 9?)
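One way to read that number, assuming 2304 is the raw wait status filled in by `waitpid` on Linux, is to decode it with the standard `<sys/wait.h>` macros instead of guessing. The sketch below uses hypothetical helper names that are not part of s2n-tls. With glibc's encoding, 2304 is `9 << 8`, which decodes as a normal exit with code 9, not as termination by signal 9.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: exit code if the child exited normally, else -1. */
int exit_code_from_wait_status(int status)
{
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: terminating signal if the child was killed, else -1. */
int signal_from_wait_status(int status)
{
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}

/* Writes a human-readable description of a raw wait status into `out`. */
void describe_wait_status(int status, char *out, size_t out_len)
{
    assert(out != NULL);
    if (WIFEXITED(status)) {
        snprintf(out, out_len, "exited with code %d", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        snprintf(out, out_len, "killed by signal %d", WTERMSIG(status));
    } else {
        snprintf(out, out_len, "other wait status %d", status);
    }
}
```

Calling `describe_wait_status(status, buf, sizeof(buf))` right after `waitpid` in the reaping code would make the test logs state directly whether a child exited or was signalled.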
This seems to be related to DEFER_CLEANUP on process exit. This commit fixed the valgrind failure: 3c624d7
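For context, here is a minimal sketch of why scope-based cleanup and process exit can interact badly. It assumes that DEFER_CLEANUP is built on the GCC/Clang `cleanup` variable attribute, which is an assumption here because the macro definition is not shown in this issue. All function names are hypothetical. A cleanup handler runs only when its variable goes out of scope, so calling `exit()` inside that scope skips it, and anything the handler would have freed is still allocated when the process ends.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Cleanup handler: tells the parent that it ran by writing one byte. */
static void mark_cleanup(int *fd)
{
    assert(fd != NULL);
    const char byte = 'c';
    if (write(*fd, &byte, 1) != 1) {
        /* Nothing useful to do on failure in this sketch. */
    }
}

/* Runs in the child. `fd` carries a scope-based cleanup handler. */
static void child_body(int write_fd, int exit_inside_scope)
{
    __attribute__((cleanup(mark_cleanup))) int fd = write_fd;
    if (exit_inside_scope) {
        exit(0); /* `fd` never leaves scope, so mark_cleanup is skipped */
    }
    /* Returning normally ends the scope and runs mark_cleanup. */
}

/* Returns 1 if the child's cleanup handler ran, 0 if not, -1 on error. */
int cleanup_ran_in_child(int exit_inside_scope)
{
    int pipe_fds[2];
    if (pipe(pipe_fds) != 0) {
        return -1;
    }
    fflush(NULL); /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(pipe_fds[0]);
        close(pipe_fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(pipe_fds[0]);
        child_body(pipe_fds[1], exit_inside_scope);
        exit(0);
    }
    close(pipe_fds[1]);
    char byte = 0;
    /* Reads one byte if the handler ran, or hits EOF once the child exits. */
    ssize_t n = read(pipe_fds[0], &byte, 1);
    close(pipe_fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return n == 1 ? 1 : 0;
}
```

If the macro does work this way, memory owned by a deferred-cleanup variable would still be allocated when a process exits from inside that variable's scope, which is the kind of allocation a leak checker can report at exit.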
Solution:
We should be able to understand exactly why this test was failing under valgrind, and exactly what was happening.
Requirements / Acceptance Criteria: