Description
Currently, the biggest "missing feature" in the stdlib ruby URI/DNS resolution supply chain is IDNA support. addressable, the open source alternative to stdlib uri, has some support for it, which is, I believe, the main reason why it is a transitive dependency of many other gems (its other feature, uri templates, is just not as compelling).
This is a proposal for a way to solve this.
punycode
IDNA domains are translated to their punycode representation in order to be used in DNS queries (which require ASCII domains). The ruby stdlib does not have a punycode converter, so this is where it should start IMO. For that, I propose either a new punycode stdlib gem (bundled?), or making its functionality available as a submodule of URI in the uri stdlib:
# as a bundled gem
require "punycode"
Punycode.encode("l♥️h.ws") #=> "xn--lh-t0xz926h.ws"
Punycode.decode("xn--lh-t0xz926h.ws") #=> "l♥️h.ws"
# as internal functionality
require "uri/punycode"
URI::Punycode.encode(...
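To illustrate what such a converter does under the hood, here is a minimal pure ruby sketch of the RFC 3492 encoding step. The `PunycodeSketch` module and its method names are hypothetical, chosen for illustration only, and this is not the implementation proposed here. It also skips the mapping, normalization and validation steps that a real IDNA 2008 conversion performs, so it should be read as a rough outline of the label encoding only:

```ruby
# Hypothetical sketch of RFC 3492 (Punycode) encoding; not a full IDNA conversion.
module PunycodeSketch
  BASE = 36
  TMIN = 1
  TMAX = 26
  SKEW = 38
  DAMP = 700
  INITIAL_BIAS = 72
  INITIAL_N = 128
  DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"

  module_function

  # Bias adaptation function from RFC 3492, section 6.1.
  def adapt(delta, num_points, first_time)
    delta = first_time ? delta / DAMP : delta / 2
    delta += delta / num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) / 2
      delta /= BASE - TMIN
      k += BASE
    end
    k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
  end

  # Encodes a single label, without the "xn--" prefix.
  def encode_label(label)
    code_points = label.unpack("U*")
    basic = code_points.select { |cp| cp < INITIAL_N }
    output = basic.pack("U*")
    basic_count = basic.length
    handled = basic_count
    output << "-" if basic_count > 0

    n = INITIAL_N
    delta = 0
    bias = INITIAL_BIAS

    while handled < code_points.length
      # Next smallest code point that has not been encoded yet.
      m = code_points.select { |cp| cp >= n }.min
      delta += (m - n) * (handled + 1)
      n = m
      code_points.each do |cp|
        delta += 1 if cp < n
        next unless cp == n

        # Emit delta as a generalized variable-length integer.
        q = delta
        k = BASE
        loop do
          t = if k <= bias then TMIN
              elsif k >= bias + TMAX then TMAX
              else k - bias
              end
          break if q < t
          output << DIGITS[t + (q - t) % (BASE - t)]
          q = (q - t) / (BASE - t)
          k += BASE
        end
        output << DIGITS[q]
        bias = adapt(delta, handled + 1, handled == basic_count)
        delta = 0
        handled += 1
      end
      delta += 1
      n += 1
    end
    output
  end

  # Adds the "xn--" ACE prefix to every non-ASCII label of a domain.
  def to_ascii(domain)
    domain.split(".").map { |label|
      label.ascii_only? ? label : "xn--#{encode_label(label)}"
    }.join(".")
  end
end

PunycodeSketch.encode_label("bücher")       #=> "bcher-kva"
PunycodeSketch.to_ascii("bücher.example")   #=> "xn--bcher-kva.example"
```

A production converter would still need the IDNA-specific processing around this step, which is one of the reasons delegating to an existing library is attractive.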
implementation
addressable, as well as other (mostly abandoned) gems, supports the IDNA 2003 standard. You'll find both libidn-based extensions and pure ruby ports. This has since been superseded by the IDNA 2008 standard (which essentially supports all the more recent unicode versions, plus some edge cases). While I think that a pure ruby implementation should be entertained at some point, I think that, for now, ruby would do best by adopting the most standardized implementation around, and that's libidn2: it's used by most other network libraries, including curl, and distributed as a package for most (all?) OSes supported by ruby.
Integration of libidn2 can be done by either a C extension or FFI (I'm the maintainer of idnx, which already FFI's into libidn2, and into winnls on windows). The advantage of the latter is that it works OOTB for java. The disadvantage may be performance (?), for which a C extension may be a better fit, but then we'd need to know whether the java stdlib contains an equivalent IDNA conversion supporting IDNA 2008.
This means that libidn2 would become a dependency when building ruby. It could be dealt with, however, as an optional dependency, like openssl is: when available, URI::Punycode is defined, and when it isn't, URI::Punycode is not. Most ruby installers could then opportunistically install the package as well, just like it's already done with openssl.
(addressable is aware of its lack of IDNA 2008 support, and is working on it by FFI'ing into libidn2 as well).
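If libidn2 ends up being an optional dependency as described above, calling code would probably need to feature-detect the constant. The following is a rough sketch of that pattern. `URI::Punycode` is the constant proposed in this issue and does not exist in current ruby releases; `ascii_host` and `IDNAUnavailable` are hypothetical names used only for illustration:

```ruby
require "uri"

# Hypothetical error class, raised when ruby was built without IDNA support.
class IDNAUnavailable < StandardError; end

# Hypothetical helper: returns a host suitable for DNS queries.
def ascii_host(host)
  # Plain ASCII hosts never need conversion.
  return host if host.ascii_only?

  # URI::Punycode is the proposed constant; it would only be defined
  # when ruby was built against libidn2.
  unless defined?(URI::Punycode)
    raise IDNAUnavailable, "IDNA conversion is not available in this ruby build"
  end

  URI::Punycode.encode(host)
end

ascii_host("example.com") #=> "example.com"
```

This mirrors how code already guards against a ruby built without openssl, only with a constant check instead of a `require` rescue.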
API
uri could then transparently handle translation internally. I propose that, beyond the proposal made above, nothing else in the public API changes. Instead, URI::Generic would support translation OOTB when building objects:
uri = "https://l♥️h.ws"
uri = URI(uri)
uri.host #=> "l♥️h.ws"
uri.hostname #=> "xn--lh-t0xz926h.ws"
# the example above is inspired by how uri already handles IPv6 addresses
uri = URI("https://[::1]")
uri.host #=> "[::1]", cannot be used in Socket.new(host, port)
uri.hostname #=> "::1", can be used in Socket.new(host, port)
This could then be used internally in the resolv library, before issuing the DNS query.
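As a rough sketch of the host/hostname split described above, the logic could look something like the following. `dns_hostname` is a hypothetical standalone helper, not an existing or proposed method, and the `to_ascii:` callable stands in for whatever converter ends up being available (here it is stubbed, so no real IDNA conversion takes place):

```ruby
# Hypothetical helper showing how a hostname accessor might pick the
# DNS/socket-friendly form of a host.
def dns_hostname(host, to_ascii:)
  return host if host.nil? || host.empty?

  # Same idea as today's IPv6 handling: strip the surrounding brackets.
  return host[1..-2] if host.start_with?("[") && host.end_with?("]")

  # ASCII hosts are passed through untouched.
  return host if host.ascii_only?

  # Non-ASCII hosts are handed to the (assumed) IDNA converter.
  to_ascii.call(host)
end

# Stub converter for illustration; a real one would call into libidn2.
stub = ->(_host) { "xn--bcher-kva.example" }

dns_hostname("[::1]", to_ascii: stub)          #=> "::1"
dns_hostname("example.com", to_ascii: stub)    #=> "example.com"
dns_hostname("bücher.example", to_ascii: stub) #=> "xn--bcher-kva.example"
```

A resolver could then call such a helper once before issuing the query, leaving the original unicode host intact for display purposes.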