ssl: support IO-like object as the underlying transport #736

rhenium · 2024-03-27T15:38:11Z

An implementation of #731. The test suite passes on my box, but it needs more testing especially around the error handling.

This adds support for IO-like objects that are not backed by a file descriptor by defining a BIO_METHOD that wraps the following Ruby methods:

#read_nonblock(len, exception: false)
#write_nonblock(str, exception: false)
#wait_readable
#wait_writable
#flush
#closed?
#close
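
As a rough illustration of this interface, the sketch below implements the seven methods on an in-memory object. The class name MemoryTransport and its feed/outbox helpers are hypothetical and not part of this PR. The return values mimic what IO does with exception: false, which is an assumption about what the BIO wrapper expects.

```ruby
# Hypothetical in-memory transport; not part of the PR.
class MemoryTransport
  attr_reader :outbox

  def initialize
    @inbox = String.new   # bytes waiting to be read by the TLS layer
    @outbox = String.new  # bytes written by the TLS layer
    @closed = false
  end

  # Helper for experiments: queue bytes to be returned by #read_nonblock.
  def feed(data)
    @inbox << data.b
    self
  end

  def read_nonblock(len, exception: true)
    raise IOError, "closed stream" if @closed
    if @inbox.empty?
      return :wait_readable unless exception
      raise IO::EAGAINWaitReadable, "read would block"
    end
    @inbox.slice!(0, len)
  end

  def write_nonblock(str, exception: true)
    raise IOError, "closed stream" if @closed
    @outbox << str.b
    str.bytesize
  end

  # A real transport would block here until data arrives or a timeout expires.
  def wait_readable(timeout = nil)
    @inbox.empty? ? nil : self
  end

  # Writes to the in-memory buffer never block.
  def wait_writable(timeout = nil)
    self
  end

  def flush
    self
  end

  def closed?
    @closed
  end

  def close
    @closed = true
    nil
  end
end
```

With the PR applied, an object like this could presumably be passed to OpenSSL::SSL::SSLSocket.new in place of a socket; that usage is not exercised here.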

HoneyryderChuck · 2024-05-17T14:57:47Z

ext/openssl/ossl_bio.c

+    BIO_clear_retry_flags(p->bio);
+
+    VALUE fargs[] = { INT2NUM(p->dlen), nonblock_kwargs };
+    VALUE ret = rb_funcallv_public_kw(io, rb_intern("read_nonblock"),


Shouldn't this ret be somehow marked so it's not moved? (Thinking of compaction here.)

Probably too early for optimizations, but this bit could benefit from a socket-local string acting as a buffer (the second argument of #read_nonblock).

The content is copied in this function and the String object won't be kept, so it shouldn't be necessary.

> Probably too early for optimizations, but this bit could benefit from a socket-local string acting as a buffer (the second argument of #read_nonblock).

Yes, this is definitely a possible optimization.

HoneyryderChuck · 2024-05-17T15:13:17Z

ext/openssl/ossl_ssl.c

+        rb_io_set_nonblock(fptr);
+    }
+    else {
+        // Not meant to be a comprehensive check


As per the BIO implementation, shouldn't this also verify #flush? (Hypothetically; I understand this isn't meant to be an exhaustive check.)

HoneyryderChuck · 2024-05-17T15:17:48Z

ext/openssl/ossl_ssl.c

@@ -1735,6 +1805,11 @@ no_exception_p(VALUE opts)
 static void
 io_wait_writable(VALUE io)
 {
+    if (!is_real_socket(io)) {
+        if (!RTEST(rb_funcallv(io, rb_intern("wait_writable"), 0, NULL)))


Shouldn't this be the same as the rb_io_maybe_wait_writable call below?

If I understand correctly, this should be the same behavior. IO#wait_writable without any argument should honor the IO#timeout= value.

HoneyryderChuck · 2024-05-17T15:20:17Z

@rhenium anything I can help with to move this forward?

HoneyryderChuck · 2024-08-20T07:33:56Z

@rhenium friendly ping

rhenium · 2025-03-05T16:11:37Z

I just pushed what I have in my local branch so far, rebased on top of current master to resolve merge conflicts. This is not ready to merge yet.

Some unresolved issues:

We have many configurable callbacks that may be triggered by various methods on SSLSocket. If a callback attempts a tag jump (e.g., raising an exception), it is caught by rb_protect() and the jump tag is stored in the ID_callback_state ivar. After returning from the OpenSSL function, the ivar is checked and the jump is resumed. Currently, if a callback fails, no other callbacks will be executed from the same OpenSSL function call (unless I'm mistaken), so this is working as expected.

A jump from some callbacks is considered critical to the connection and a TLS alert must be sent to the peer. For example, an exception in SSLContext#servername_cb should cause an unrecognized_name alert to be sent to the client.

Since writing to the underlying socket requires running Ruby code, this means we need to execute Ruby code while a jump is still on hold. This PR currently doesn't try to handle this, and it's possible to cause a segfault (as you can see in the two failing test cases I added in a [DO NOT MERGE] commit). How should this be fixed?

I initially tried saving rb_errinfo() temporarily and restoring it later with rb_set_errinfo(), but this can't work for all cases because rb_set_errinfo() only accepts exceptions.

If both a callback and the underlying socket do a jump in the same single OpenSSL function call, which one should take precedence?

OpenSSL suppresses certain kinds of errors when writing a TLS 1.3 NewSessionTicket to the underlying socket, when doing so is harmless (https://github.com/openssl/openssl/blob/e599893a9fec932701ca824d73a794a0c9ce02e9/ssl/statem/statem_srvr.c#L852-L870). While we could rescue exceptions like Errno::ECONNRESET raised by underlying_socket.write_nonblock to replicate the behavior, I'm not sure if that would be a good idea. This PR currently doesn't handle this. This is a slight difference compared to when using the socket BIO.

HoneyryderChuck · 2025-03-06T23:04:53Z

> This PR currently doesn't try to handle this, and it's possible to cause a segfault

Indeed, I didn't know that doing that was dangerous; I thought that rb_protect could deal with it.

One option could be to assume that the IO-like object must indeed be an SSLSocket, and one could deal with it by setting an ivar "flag" telling the internal implementation to skip 2nd-level rb_protect/rb_jump_tag? Other than that, I'm out of ideas.

> If both a callback and the underlying socket do a jump in the same single OpenSSL function call

Can they happen at the same time? Perhaps the suggestion above deals with it?

An alternative approach could be to go back to the drawing board and retry the approach using BIO_mem. I think this approach could be more pragmatic, at the cost of having to do a lot of "callback to rubyland"; I'm just not sure at this point whether the extension API allows that.

rhenium · 2025-03-10T13:27:29Z

> Can they happen at the same time?

Please see test_synthetic_io_errors_in_callback_and_socket for an example. SSLContext#servername_cb is triggered when the server receives a ClientHello. If the callback doesn't succeed, the server then has to send an unrecognized_name alert. Both happen inside the same SSL_accept() function call.

I think the segfault is fixable by (ab)using rb_ensure(), but I wasn't sure which way I should make it work:

> If both a callback and the underlying socket do a jump in the same single OpenSSL function call, which one should take precedence?

HoneyryderChuck · 2025-03-22T12:36:46Z

Sorry for not replying sooner, have been stumped with other chores. I think "last exception wins" is reasonable, and considering this is a new feature, it's not really breaking compatibility.

Lmk when you work around the jump tag issue, I can run a few sanity checks on my side as well.

rhenium · 2025-03-30T15:29:33Z

This branch needs more polishing, but it should be working aside from some edge cases. I'd appreciate it if you could test it.

I should have worded it more accurately, but the last commit changed it so that the later tag jump (which includes raising an exception) wins over the earlier one. This should fix the segfault.

One concern is that this can accidentally suppress an important jump like the one created by Thread#kill. Please see this Ruby issue for details: https://bugs.ruby-lang.org/issues/13882

I haven't come up with a nice solution for this.
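
The "later jump wins" rule can be pictured with plain Ruby, where an exception raised from an ensure clause replaces the one already in flight. This is only an analogy to what the C code does with the saved rb_protect() state, not the implementation in this branch; the method name below is hypothetical.

```ruby
# Hypothetical stand-in for one OpenSSL function call that triggers two jumps.
def callback_then_socket_write
  begin
    # First jump: e.g. an exception raised inside a callback.
    raise ArgumentError, "raised in servername_cb"
  ensure
    # Second jump: e.g. an exception raised while writing the alert.
    raise IOError, "raised while writing the alert"
  end
end

winner = nil
begin
  callback_then_socket_write
rescue StandardError => e
  winner = e
end

puts winner.class
```

Only the later exception reaches the caller. The same rule applied to a non-exception jump is what raises the Thread#kill concern mentioned above.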

HoneyryderChuck · 2025-04-08T13:12:53Z

just a heads-up that I've been trying to incorporate this in my branch's CI, but having issues due to C extension compilation and openssl being an stdlib I guess (opened a ticket here, in case you know why that happens and can recommend a workaround).

rhenium · 2025-04-08T16:02:12Z

> just a heads-up that I've been trying to incorporate this in my branch's CI, but having issues due to C extension compilation and openssl being an stdlib I guess (opened a ticket here, in case you know why that happens and can recommend a workaround).

That Gemfile snippet works as expected for me (ruby-build Ruby 3.4 on Linux). I'm not sure why it's failing for you. What happens if you clone the repository, run gem build to make a .gem archive, and try installing it?

HoneyryderChuck · 2025-04-16T13:53:43Z

I made some effort to build openssl from the branch in httpx CI, and now I'm getting this error all over the place:

FaradayTest#test_adapter_get_handles_compression:
RangeError: integer 139928495023720 too big to convert to 'int'
    /ruby-openssl/lib/openssl/buffering.rb:433:in 'OpenSSL::SSL::SSLSocket#syswrite_nonblock'
    /ruby-openssl/lib/openssl/buffering.rb:433:in 'OpenSSL::Buffering#write_nonblock'
    lib/httpx/io/tcp.rb:129:in 'HTTPX::TCP#write'

I may only be able to do some investigation in one week.

rhenium · 2025-04-16T15:00:12Z

> RangeError: integer 139928495023720 too big to convert to 'int'

I was able to reproduce it locally. This is a bug in #syswrite{,_nonblock} with a String-convertible object that responds to #to_str (such as HTTPX::Buffer). #881 should fix this.
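
For context, an object is String-convertible when it responds to #to_str without being a String itself. Code that needs the bytes or the length presumably has to perform that conversion first. The sketch below uses a hypothetical FakeBuffer class standing in for HTTPX::Buffer and shows only the conversion in Ruby; it is not the fix from #881.

```ruby
# Hypothetical stand-in for a String-convertible buffer such as HTTPX::Buffer.
class FakeBuffer
  def initialize(data)
    @data = data
  end

  # Implicit conversion hook used by String.try_convert and many core methods.
  def to_str
    @data
  end
end

buf = FakeBuffer.new("GET / HTTP/1.1\r\n")

# The buffer is not a String, so it has no bytesize of its own...
is_string = buf.is_a?(String)

# ...but it can be converted explicitly before measuring or writing it.
str = String.try_convert(buf)
length = str.bytesize
```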

Implement a bare minimal BIO_METHOD required for SSL/TLS. The underlying IO-like object must implement the following methods:

- #read_nonblock(len, exception: false)
- #write_nonblock(str, exception: false)
- #flush

A later commit will wire it into OpenSSL::SSL::SSLSocket.

There are use cases for establishing a TLS connection on top of a non-OS stream, such as another TLS connection or an HTTP/2 tunnel. To achieve this today, a workaround using dummy socket pairs is necessary.

Currently, OpenSSL::SSL::SSLSocket.new requires an IO (socket) object backed by a file descriptor. This is because we pass the file descriptor to OpenSSL. This patch changes it to allow any Ruby object that responds to the necessary non-blocking IO methods, such as read_nonblock.

OpenSSL's TLS implementation uses an IO abstraction layer called BIO to interact with the underlying socket. By passing the file descriptor to SSL_set_fd(), a BIO with the BIO_s_socket() BIO_METHOD is implicitly created. We can set up our own BIO and let OpenSSL use it instead. The previous patch added such a BIO_METHOD implementation.

For performance reasons, this patch continues to use the socket BIO if the user passes a real IO object, so this should not change the behavior of existing programs in any way.
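
The choice between the socket BIO and the custom BIO can be pictured as a duck-typing check. The real check is done in C (is_real_socket() in ext/openssl/ossl_ssl.c, per the review diff) and is explicitly not comprehensive. The helper name transport_kind, its return symbols, and the error message are assumptions for illustration only.

```ruby
# Hypothetical helper: reports which BIO would be chosen for +io+.
def transport_kind(io)
  # A real IO is backed by a file descriptor, so the socket BIO can be used.
  return :socket_bio if io.is_a?(IO)

  # Method list taken from the BIO_METHOD commit message.
  required = %i[read_nonblock write_nonblock flush]
  missing = required.reject { |name| io.respond_to?(name) }
  unless missing.empty?
    raise TypeError, "not an IO-like object (missing: #{missing.join(', ')})"
  end

  :ruby_bio
end
```

Keeping the IO branch first mirrors the stated intent that existing programs passing real sockets see no behavior change.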

HoneyryderChuck · 2025-04-16T22:58:49Z

thx, tests are passing now 👍

liath · 2025-05-14T00:57:20Z

Just spent the day scratching my head over Net::HTTP's behavior around proxies and didn't see the tickets that led to this PR until now lol
Anything I can do to help here?

rhenium mentioned this pull request Mar 27, 2024

remove file check to support proxied SSL connection #731

Open

HoneyryderChuck reviewed May 17, 2024

rhenium force-pushed the ky/ssl-ruby-io branch from ded7cb0 to 84ead32 Compare September 5, 2024 13:25

rhenium force-pushed the ky/ssl-ruby-io branch from 84ead32 to 0eafd72 Compare March 5, 2025 15:56

rhenium marked this pull request as draft March 12, 2025 16:33

rhenium added 10 commits April 17, 2025 00:01

ssl: add a more direct test case for errors in servername_cb

b068a89

An exception raised in the SSLContext#servername_cb callback aborts the handshake and sends an "unrecognized_name" alert to the client. Add more direct assertions for this scenario.

ssl: remove unnecessary GetOpenFile() check in SSLSocket#syswrite*

87a9e19

This is no longer necessary as of commit 22e601a (Remove usage of IO internals, 2023-05-29).

ssl: remove unnecessary constant lookup

52fb7d2

rb_eSystemCallError is defined in Ruby's public header files, so let's just use it. Also, clean up the arguments to the rb_rescue2() call.

ssl: allow underlying socket to not implement #remote_address

890f8bf

The result value is used for generating an informative error message. Let's just say "unsupported" if it's not available.

ssl: allow underlying socket to not implement #sync

845e65a

The value is used to determine whether SSLSocket should skip buffering in OpenSSL::Buffering or not. Defaulting to true (no buffering) should be a safe option.

[DO NOT MERGE] Always use Ruby IO

de20e7d

Let's see what will break with this.

[DO NOT MERGE] bugs

7558b04

The later exception wins

d8004f4

rhenium force-pushed the ky/ssl-ruby-io branch from d44bee5 to d8004f4 Compare April 16, 2025 15:17

liath mentioned this pull request May 14, 2025

Is the HTTPS proxy support known-working under real-world conditions? ruby/net-http#212

Open
