Prevent `SEGFAULT`s on consecutive `exec_command()` invocations
#658
base: devel
Congratulations! One of the builds has completed. 🍾 You can install the built RPMs by following these steps:
Please note that the RPMs should be used only in a testing environment.
6d6711a to b94734d
Fixes ansible#57 Signed-off-by: Jakub Jelen <[email protected]>
Signed-off-by: Jakub Jelen <[email protected]>
Signed-off-by: Jakub Jelen <[email protected]>
Quality Gate passed
strict=False,
)
def exec_second_command(ssh_channel):
    """Call exec_command() from different context in the call stack."""
Calling from a different context is a part of the motivation. Within this function, there's nothing that changes the context. You're probably referring to the calling test but this is not something that would go into the docstring because it's not what the function does, it's what that test does, probably.
- """Call exec_command() from different context in the call stack."""
+ """Return ``exec_command()`` stdout as a text string."""
cb_size = sizeof(callbacks.ssh_channel_callbacks_struct)
cdef callbacks.ssh_channel_callbacks_struct *cb = <callbacks.ssh_channel_callbacks_struct *>PyMem_Malloc(cb_size)
if cb is NULL:
    raise LibsshChannelException("Memory allocation error")
Codecov says this line is uncovered: https://app.codecov.io/gh/ansible/pylibssh/pull/658#17ffa324129dc304109f4a66c79769d7-R173.
Could you add a test covering this newly added line so that all the new lines in the patch are covered?
Is there a simple way to make the malloc fail to add to the test coverage?
u_cmd_out = ssh_channel.exec_command('echo -n Hello Again').stdout.decode()
assert u_cmd_out == u'Hello Again'  # noqa: WPS302

# randomize the stack a bit more by calling this from yet another function
Is this accurate enough?
- # randomize the stack a bit more by calling this from yet another function
+ # NOTE: Call `exec_command()` once again from another function to
+ # NOTE: force it to happen in another place of the call stack,
+ # NOTE: making sure that the context is different from the one in
+ # NOTE: this test function. The resulting call stack will end up
+ # NOTE: being more random.
libssh.ssh_channel_send_eof(channel)
result.returncode = libssh.ssh_channel_get_exit_status(channel)
if channel is not NULL:
    libssh.ssh_channel_close(channel)
    libssh.ssh_channel_free(channel)

# XXX leaking the memory of the callbacks
Could you expand on what this means? Is this a FIXME? Does it have to remain in the patch? Why not use a `# FIXME:` comment instead?
Given that this is a side effect that might be considered a bug, I would prefer to keep a reference to it. Another option might be to keep a list of callbacks on the session layer and free them when we free the session, but I first wanted a PoC to confirm that this approach works. This will remain an issue for older libssh versions that predate the following fix in libssh: https://gitlab.com/libssh/libssh-mirror/-/merge_requests/549
Given that I see this works, I can probably implement a cleaner approach without memory leaks, but probably only tomorrow.
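The session-level bookkeeping mentioned in this reply could look roughly like the plain-Python model below. It is only a sketch of the ownership idea, not the pylibssh implementation: `CallbackBlock`, `Session`, `new_callback_block()` and `close()` are hypothetical names, and the `free()` method merely marks the spot where the Cython code would release the memory it got from `PyMem_Malloc()`.

```python
class CallbackBlock:
    """Hypothetical stand-in for one heap-allocated callbacks struct."""

    def __init__(self):
        self.freed = False

    def free(self):
        # In Cython, this is where the PyMem_Malloc() memory would be released.
        self.freed = True


class Session:
    """Hypothetical session that owns every callbacks block its channels use."""

    def __init__(self):
        self._callback_blocks = []

    def new_callback_block(self):
        block = CallbackBlock()
        # The session keeps the block alive after exec_command() returns.
        self._callback_blocks.append(block)
        return block

    def close(self):
        # Free the blocks only when the session itself goes away, so no
        # channel can still be calling into them.
        while self._callback_blocks:
            self._callback_blocks.pop().free()
```

The trade-off in this model is that the blocks accumulate for the lifetime of the session instead of being released per command, which is still bounded, unlike the leak flagged in the `XXX` comment.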
@Jakuje it looks like this is making some CI jobs get stuck: https://github.com/ansible/pylibssh/actions/runs/11910858999/job/33210235668?pr=658 / https://github.com/ansible/pylibssh/actions/runs/11910858999/job/33210009382?pr=658.
I restarted said jobs, but this is something to look into, as it'll probably make the CI flakier if merged.
@Jakuje the rawhide test failure log is similar to those in GHA: https://download.copr.fedorainfracloud.org/results/packit/ansible-pylibssh-658/fedora-rawhide-x86_64/08279621-python-ansible-pylibssh/builder-live.log.gz.
OTOH, it's also unstable on |
SUMMARY
The function `exec_command()` keeps the callbacks as a local variable before assigning them to the created channel. The channel is not guaranteed to be completely freed when `ssh_channel_free()` is called because there might be some leftover messages or responses to process (close confirmation, exit code ...).

Calling `exec_command()` as done previously in the test, from the same function without anything in between (except assert), will likely map the second function call to the same memory on the call stack, so it was working most of the time. But calling it from different functions or contexts will likely change the call stack, and processing of outstanding callbacks is more likely to result in addressing a wrong memory location.

Likely fixes #57, #645 and #657
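The lifetime problem described in this summary can be modelled in plain Python. This is an analogy only, not the pylibssh code: `weakref.ref` stands in for the raw C pointer that libssh keeps, and `Callbacks`, `Channel` and `exec_with_local_callbacks()` are hypothetical names. The model relies on CPython's reference counting to drop the local object as soon as the function returns.

```python
import weakref


class Callbacks:
    """Hypothetical stand-in for ssh_channel_callbacks_struct."""


class Channel:
    """Hypothetical channel that only borrows its callbacks."""

    def __init__(self):
        self._callbacks_ref = None

    def set_callbacks(self, callbacks):
        # Like a raw C pointer, a weak reference does not keep its target alive.
        self._callbacks_ref = weakref.ref(callbacks)

    def callbacks_alive(self):
        ref = self._callbacks_ref
        return ref is not None and ref() is not None


def exec_with_local_callbacks(channel):
    callbacks = Callbacks()  # exists only for the duration of this call
    channel.set_callbacks(callbacks)
    # On return, `callbacks` is gone while the channel still refers to it.
```

In Python the dead reference is detectable; in C the equivalent pointer silently refers to reused stack memory, which is why late channel events can crash.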
ISSUE TYPE
ADDITIONAL INFORMATION
I was not able to reproduce the issue locally, so I am pushing this to see whether the CI can trigger the crash.
This also introduces a memory leak, as the callback structure is never freed. We should probably store it somewhere in the Python code before returning to make sure it is not garbage collected (or can the Python GC track that the callback pointer is still stored on the libssh side?).