Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ssl: support IO-like object as the underlying transport #736

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

rhenium
Copy link
Member

@rhenium rhenium commented Mar 27, 2024

An implementation of #731. The test suite passes on my box, but it needs more testing especially around the error handling.

This adds support for IO-like object that is not backed by a file descriptor by defining a BIO_METHOD to wrap the following Ruby methods.

  • #read_nonblock(len, exception: false)
  • #write_nonblock(str, exception: false)
  • #wait_readable
  • #wait_writable
  • #flush
  • #closed?
  • #close

BIO_clear_retry_flags(p->bio);

VALUE fargs[] = { INT2NUM(p->dlen), nonblock_kwargs };
VALUE ret = rb_funcallv_public_kw(io, rb_intern("read_nonblock"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should't this ret be somhow marked so it's not moved? (thinking of compaction here).

probably too early for optimizations, ,but this bit could benefit of a socket-local string to act as a buffer (second argument of :read_nonblock)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content is copied in this function and the String object won't be kept, so it shouldn't be necessary.

probably too early for optimizations, ,but this bit could benefit of a socket-local string to act as a buffer (second argument of :read_nonblock)

Yes, this is definitely a possible optimization.

rb_io_set_nonblock(fptr);
}
else {
// Not meant to be a comprehensive check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per the BIO impl, shouldn't this also verify "#flush"? (hypothetically, understand this wasn't as exhaustive check).

@@ -1735,6 +1805,11 @@ no_exception_p(VALUE opts)
static void
io_wait_writable(VALUE io)
{
if (!is_real_socket(io)) {
if (!RTEST(rb_funcallv(io, rb_intern("wait_writable"), 0, NULL)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be the same as the rb_io_maybe_wait_writable call below?

Copy link
Member Author

@rhenium rhenium Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this should be the same behavior. IO#wait_writable without any argument should honor IO#timeout= value.

@HoneyryderChuck
Copy link
Contributor

@rhenium anything I can help with to move this forward?

@HoneyryderChuck
Copy link
Contributor

@rhenium friendly ping

rhenium added 9 commits March 5, 2025 20:09
Implement a bare minimal BIO_METHOD required for SSL/TLS. The
underlying IO-like object must implement the following methods:

 - #read_nonblock(len, exception: false)
 - #write_nonblock(str, exception: false)
 - #flush

A later commit will wire it into OpenSSL::SSL::SSLSocket.
An exception raised in the SSLContext#servername_cb callback aborts the
handshake and sends an "unrecognized_name" alert to the client. Add more
direct assertions for this scenario.
This is no longer necessary as of commit 22e601a (Remove usage of
IO internals, 2023-05-29).
rb_eSystemCallError is defined in Ruby's public header files, so let's
just use it.

Also, clean up the arguments to the rb_rescue2() call.
The result value is used for generating an informative error message.
Let's just say "unsupported" if it's not available.
The value is used to determine whether SSLSocket should skip buffering
in OpenSSL::Buffering or not. Defaulting to true (no buffering) should
be a safe option.
There are use cases to establish a TLS connection on top of a non-OS
stream, such as another TLS connection or an HTTP/2 tunnel. To achieve
this today, a workaround using dummy socket pairs is necessary.

Currently, OpenSSL::SSL::SSLSocket.new requires an IO (socket) object
backed by a file descriptor. This is because we pass the file descriptor
to OpenSSL. This patch changes it to allow any Ruby object that responds
to necessary non-blocking IO methods, such as read_nonblock.

OpenSSL's TLS implementation uses an IO abstraction layer called BIO to
interact with the underlying socket. By passing the file descriptor to
SSL_set_fd(), a BIO with the BIO_s_socket() BIO_METHOD is implicitly
created. We can set up our own BIO and let OpenSSL use it instead. The
previous patch added such a BIO_METHOD implementation.

For performance reason, this patch continues to use the socket BIO if
the user passes a real IO object, so this should not change the behavior
of existing programs in any way.
Let's see what will break with this.
@rhenium
Copy link
Member Author

rhenium commented Mar 5, 2025

I just pushed what I have in my local branch so far, rebased on top of current master to resolve merge conflicts. This is not ready to merge yet.

Some unresolved issues:

  • We have many configurable callbacks that may be triggered by various methods on SSLSocket. If a callback attempts a tag jump (e.g., raising an exception), it is caught by rb_protect() and the jump tag is stored in the ID_callback_state ivar. After returning from the OpenSSL function, the ivar is checked and the jump is resumed. Currently, if a callback fails, no other callbacks will be executed from the same OpenSSL function call (unless I'm mistaken), so this is working as expected.

    A jump from some callbacks is considered critical to the connection and a TLS alert must be sent to the peer. For example, an exception in SSLContext#servername_cb should cause an unrecognized_name alert to be sent to the client.

    Since writing to the underlying socket requires running Ruby code, this means we need to execute Ruby code while a jump is still on hold. This PR currently doesn't try to handle this, and it's possible to cause a segfault (as you can see in the two failing test cases I added in a [DO NOT MERGE] commit). How should this be fixed?

    I initially tried saving rb_errinfo() temporarily and restoring later with rb_set_errinfo(), but this can't work for all cases because rb_set_errinfo() only accepts exceptions.

  • If both a callback and the underlying socket does a jump in the same single OpenSSL function call, which one should take precedence?

  • OpenSSL suppresses a certain kind of errors when writing TLS 1.3 NewSessionTicket to the underlying socket, when it is harmless (https://github.com/openssl/openssl/blob/e599893a9fec932701ca824d73a794a0c9ce02e9/ssl/statem/statem_srvr.c#L852-L870). While we could rescue exceptions like Errno::ECONNRESET raised by underlying_socket.write_nonblock to replicate the behavior, I'm not sure if that would be a good idea. This PR currently doesn't handle this. This is a slight difference compared to when using the socket BIO.

@HoneyryderChuck
Copy link
Contributor

This PR currently doesn't try to handle this, and it's possible to cause a segfault

Indeed, I didn't know that doing that was dangerous, though that rb_protect could deal with it.

One option could be to assume that the IO-like object must indeed be an SSLSocket, and one could deal with it by setting an ivar "flag" telling the internal implementation to skip 2nd-level rb_protect/rb_jump_tag? Other than that, I'm out of ideas.

If both a callback and the underlying socket does a jump in the same single OpenSSL function call

Can they happen at the same time? Perhaps suggestion above deals with it?

An alternative approach could be to go back to the drawing board and try again the approach using BIO_mem. I think this approach could be more pragmatic, at the cost of having to do a lot of "callback to rubyland", just not sure at this point if the entension API allows that.

@rhenium
Copy link
Member Author

rhenium commented Mar 10, 2025

Can they happen at the same time?

Please see test_synthetic_io_errors_in_callback_and_socket for an example. SSLContext#servername_cb is triggered when the server receives a ClientHello. If the callback doesn't succeed, the server then has to send an unrecognized_name alert. Both happen inside the same SSL_accept() function call.

I think the segfault is fixable by (ab)using rb_ensure(), but I wasn't sure about in which way I should make it work:

If both a callback and the underlying socket does a jump in the same single OpenSSL function call, which one should take precedence?

@rhenium rhenium marked this pull request as draft March 12, 2025 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants