Prevent `SEGFAULT`s on consecutive `exec_command()` invocations #658

Jakuje · 2024-11-14T14:23:18Z

SUMMARY

The function exec_command() keeps the callbacks as a local variable before assigning them to the created channel. The channel is not guaranteed to be completely freed when ssh_channel_free() is called because there might be some leftover messages or responses to process (close confirmation, exit code ...).

Calling the exec_command() as done previously in the test from the same function without anything in between (except assert) will likely map the second function call to the same memory on the call stack so it was working most of the time. But calling it from different functions or contexts will likely change the call stack and processing of outstanding callbacks is more likely to result in addressing wrong memory location.

Likely fixes #57, #645 and #657

ISSUE TYPE

Bugfix Pull Request

ADDITIONAL INFORMATION

I was not able to reproduce the issue locally so pushing to see if the CI will be able to crash.

This is also introducing memory leaks as the callback structure is never freed. We should probably store it somewhere in the python code before returning to make sure it is not garbage collected (or can the python GC track the callback pointer is still stored on the libssh side?).

packit-as-a-service · 2024-11-14T14:48:19Z

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

sudo yum install -y dnf-plugins-core on RHEL 8
sudo dnf install -y dnf-plugins-core on Fedora
dnf copr enable packit/ansible-pylibssh-658
And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

tests/unit/channel_test.py

docs/changelog-fragments/658.bugfix.rst

Fixes ansible#57 Signed-off-by: Jakub Jelen <[email protected]>

Signed-off-by: Jakub Jelen <[email protected]>

sonarcloud · 2024-11-19T10:14:31Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

webknjaz · 2024-11-19T15:45:31Z

tests/unit/channel_test.py

-    strict=False,
-)
+def exec_second_command(ssh_channel):
+    """Call exec_command() from different context in the call stack."""


Calling from a different context is a part of the motivation. Within this function, there's nothing that changes the context. You're probably referring to the calling test but this is not something that would go into the docstring because it's not what the function does, it's what that test does, probably.

Suggested change

"""Call exec_command() from different context in the call stack."""

"""Return ``exec_command()`` stdout as a text string."""

webknjaz · 2024-11-19T15:54:25Z

src/pylibsshext/channel.pyx

+        cb_size = sizeof(callbacks.ssh_channel_callbacks_struct)
+        cdef callbacks.ssh_channel_callbacks_struct *cb = <callbacks.ssh_channel_callbacks_struct *>PyMem_Malloc(cb_size)
+        if cb is NULL:
+            raise LibsshChannelException("Memory allocation error")


Codecov says this line is uncovered: https://app.codecov.io/gh/ansible/pylibssh/pull/658#17ffa324129dc304109f4a66c79769d7-R173.

Could you add a test covering this newly added line so that all the new lines in the patch are covered?

Is there a simple way to make the malloc fail to add to the test coverage?

webknjaz · 2024-11-19T15:58:00Z

tests/unit/channel_test.py

-    u_cmd_out = ssh_channel.exec_command('echo -n Hello Again').stdout.decode()
-    assert u_cmd_out == u'Hello Again'  # noqa: WPS302
+
+    # randomize the stack a bit more by calling this from yet another function


Is this accurate enough?

Suggested change

# randomize the stack a bit more by calling this from yet another function

# NOTE: Call `exec_command()` once again from another function to

# NOTE: force it to happen in another place of the call stack,

# NOTE: making sure that the context is different from one in this

# NOTE: this test function. The resulting call stack will end up

# NOTE: being more random.

webknjaz · 2024-11-19T15:58:59Z

src/pylibsshext/channel.pyx


        libssh.ssh_channel_send_eof(channel)
        result.returncode = libssh.ssh_channel_get_exit_status(channel)
        if channel is not NULL:
            libssh.ssh_channel_close(channel)
            libssh.ssh_channel_free(channel)

+        # XXX leaking the memory of the callbacks


Could you expand on what this means? Is this a FIXME? Does it have to remain in the patch? Why not use a # FIXME: comment instead?

Given that this is a side-effect that might be considered as a bug, I would prefer to keep a reference to that. Other option might be to keep some list of callbacks on the session layer to free when we free session, but I wanted first PoC to confirm this approach was working. This will be an issue for older libssh versions before we will fix it with the following change in libssh: https://gitlab.com/libssh/libssh-mirror/-/merge_requests/549

Given that I see this works, I can probably implement more cleaner approach without memory leaks, but probably only tomorrow.

webknjaz · 2024-11-19T16:31:25Z

@Jakuje it looks like this is making some CI jobs get stuck: https://github.com/ansible/pylibssh/actions/runs/11910858999/job/33210235668?pr=658 / https://github.com/ansible/pylibssh/actions/runs/11910858999/job/33210009382?pr=658.
(I cancelled the 3.9 one after 39 minutes, and the 3.10 one was killed by xdist after a 30-second timeout)

webknjaz · 2024-11-19T16:32:32Z

I restarted said jobs, but this is something to look into, as it'll probably make the CI flakier if merged.

webknjaz · 2024-11-19T16:34:29Z

@Jakuje the rawhide test failure log is similar to those in GHA: https://download.copr.fedorainfracloud.org/results/packit/ansible-pylibssh-658/fedora-rawhide-x86_64/08279621-python-ansible-pylibssh/builder-live.log.gz.

webknjaz · 2024-11-19T17:08:17Z

OTOH, it's also unstable on devel: https://github.com/ansible/pylibssh/actions/runs/11917417576/job/33212843276

Jakuje force-pushed the crash-exec branch from 63b8ab7 to 5edf773 Compare November 14, 2024 14:27

psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Nov 14, 2024

Jakuje force-pushed the crash-exec branch from 8c72faa to 1d93243 Compare November 15, 2024 08:19

webknjaz reviewed Nov 15, 2024

View reviewed changes

tests/unit/channel_test.py Outdated Show resolved Hide resolved

webknjaz reviewed Nov 15, 2024

View reviewed changes

docs/changelog-fragments/658.bugfix.rst Outdated Show resolved Hide resolved

Jakuje force-pushed the crash-exec branch 2 times, most recently from 6d6711a to b94734d Compare November 15, 2024 20:41

Jakuje added 2 commits November 19, 2024 11:11

channels: Use dynamically allocated memory for callbacks

398e3af

Fixes ansible#57 Signed-off-by: Jakub Jelen <[email protected]>

tests: Adjusted reproducer for ansible#57

f0a908a

Signed-off-by: Jakub Jelen <[email protected]>

Jakuje force-pushed the crash-exec branch from 1de5bc0 to 251bb6d Compare November 19, 2024 10:12

Jakuje marked this pull request as ready for review November 19, 2024 10:12

Add changelog

805f0a0

Signed-off-by: Jakub Jelen <[email protected]>

Jakuje force-pushed the crash-exec branch from 3cabd51 to 805f0a0 Compare November 19, 2024 10:14

webknjaz reviewed Nov 19, 2024

View reviewed changes

webknjaz changed the title ~~Prevent crash on consecutive exec_command()~~ Prevent SEGFAULTs on consecutive exec_command() invocations Nov 19, 2024

webknjaz mentioned this pull request Nov 19, 2024

exec_command segfault partial fix #280

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent `SEGFAULT`s on consecutive `exec_command()` invocations #658

Prevent `SEGFAULT`s on consecutive `exec_command()` invocations #658

Jakuje commented Nov 14, 2024

packit-as-a-service bot commented Nov 14, 2024

sonarcloud bot commented Nov 19, 2024

webknjaz Nov 19, 2024

webknjaz Nov 19, 2024 •

edited

Loading

webknjaz Nov 19, 2024

Jakuje Nov 19, 2024

webknjaz Nov 19, 2024

webknjaz Nov 19, 2024

Jakuje Nov 19, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 19, 2024

	"""Call exec_command() from different context in the call stack."""
	"""Return ``exec_command()`` stdout as a text string."""

-    # randomize the stack a bit more by calling this from yet another function
+    # NOTE: Call `exec_command()` once again from another function to
+    # NOTE: force it to happen in another place of the call stack,
+    # NOTE: making sure that the context is different from one in this
+    # NOTE: this test function. The resulting call stack will end up
+    # NOTE: being more random.

Prevent SEGFAULTs on consecutive exec_command() invocations #658

Are you sure you want to change the base?

Prevent SEGFAULTs on consecutive exec_command() invocations #658

Conversation

Jakuje commented Nov 14, 2024

SUMMARY

ISSUE TYPE

ADDITIONAL INFORMATION

packit-as-a-service bot commented Nov 14, 2024

sonarcloud bot commented Nov 19, 2024

Quality Gate passed

webknjaz Nov 19, 2024

Choose a reason for hiding this comment

webknjaz Nov 19, 2024 • edited Loading

Choose a reason for hiding this comment

webknjaz Nov 19, 2024

Choose a reason for hiding this comment

Jakuje Nov 19, 2024

Choose a reason for hiding this comment

webknjaz Nov 19, 2024

Choose a reason for hiding this comment

webknjaz Nov 19, 2024

Choose a reason for hiding this comment

Jakuje Nov 19, 2024

Choose a reason for hiding this comment

webknjaz commented Nov 19, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 19, 2024

webknjaz commented Nov 19, 2024

Prevent `SEGFAULT`s on consecutive `exec_command()` invocations #658

Prevent `SEGFAULT`s on consecutive `exec_command()` invocations #658

webknjaz Nov 19, 2024 •

edited

Loading