Support digest plugins #3822

Merged: 2 commits, Apr 4, 2025


nirvdrum
Collaborator

@nirvdrum nirvdrum commented Mar 31, 2025

Overview

The digest gem is a default gem that has a standard C API for contributing new digest algorithms. The gem includes several built-in digest algorithms and they all make use of this plugin mechanism. However, TruffleRuby does not run the real digest source when a caller uses require "digest". Rather, TruffleRuby reimplements the digest API in Ruby and Java, allowing it to make use of the JVM's MessageDigest instances.

Generally, this reimplementation is seamless. The Digest API has not changed in many years and TruffleRuby ships with support for all of the included digest algorithms. While running the digest native extension would work on TruffleRuby, it would be slower than using the optimized JVM code.

This, however, has presented a problem when it comes to digest plugins: since we don't run the native code, we can't load native plugins. We've had an issue open for 6.5 years asking for this support. This PR adds support for loading digest plugins using FFI to work with TruffleRuby's reimplementation.

When an unknown digest is loaded, we now include the Truffle::Digest::Plugin module, which overrides functionality from Digest::Base. The current built-in digest algorithms (e.g., MD5 and SHA256) will continue to work as they have in TruffleRuby, but new algorithms that adhere to the digest gem's API will now load and function.
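To illustrate what "adhering to the digest gem's API" means for callers, here is a minimal sketch of code that depends only on the shared Digest interface. The fingerprint helper is a hypothetical name introduced for illustration, and Digest::SHA256 stands in for a plugin class; with a plugin gem installed, its digest class could presumably be passed in the same way.

```ruby
require "digest"

# Hypothetical helper: feeds chunks to any class that follows the Digest
# instance API (new, <<, hexdigest). Digest::SHA256 is only a stand-in for
# a plugin-provided digest class.
def fingerprint(digest_class, chunks)
  digest = digest_class.new
  chunks.each { |chunk| digest << chunk }
  digest.hexdigest
end

# Incremental updates produce the same result as hashing the whole string.
incremental = fingerprint(Digest::SHA256, ["a", "bc"])
one_shot = Digest::SHA256.hexdigest("abc")
puts incremental == one_shot
```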

BLAKE3

As a motivating example, this PR allows Shopify's BLAKE3 gem, which is written in Rust, to load and execute on TruffleRuby. The blake3-rb Ruby test suite passes 100%. The gem ships with another test suite for Cargo that does not pass due to ongoing work with rb-sys. These tests load the compiled extension into a Rust test runner and do not impact the typical execution mode of running blake3-rb from a gem.

I looked for other digest plugins to work against, but was unable to find any that work with the latest digest gem even on MRI. If anyone has a particular gem that they would like me to check, I'm happy to do so, provided it works with MRI.

Testing

This PR also adds some C extension tests for the plugin support. Although the digest plugin API is not strictly part of the MRI extension API, I placed the tests with the other extension specs because digest is a default gem and it's expected that the API works out of the box. I struggled a bit with finding the right balance in testing the API boundary without recreating all of the Digest Ruby specs we already have. I settled on two groups of tests:

  • one that verifies simple data fields in the metadata object
  • one that verifies the callback functions in the metadata object by exposing and inspecting the context object, which isn't exposed as part of the Digest Ruby API

For the callback tests I wrote a simple digest plugin that writes to the context in straightforward ways. The plugin doesn't actually perform any hashing of values, but rather writes text strings that can be easily verified in the specs. I'm open to suggestions if this form of testing is too indirect.
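The testing idea can be sketched in plain Ruby. This is a hypothetical model, not the C plugin from the PR: the field names mirror the digest gem's rb_digest_metadata_t struct, while the Metadata struct, FAKE_PLUGIN, run_callbacks, and all field values are assumptions made for illustration. The callbacks write recognizable text into the context instead of hashing anything.

```ruby
# Hypothetical Ruby model of the plugin metadata. Field names follow the
# digest gem's rb_digest_metadata_t; the values below are arbitrary.
Metadata = Struct.new(:api_version, :digest_len, :block_len, :ctx_size,
                      :init_func, :update_func, :finish_func,
                      keyword_init: true)

FAKE_PLUGIN = Metadata.new(
  api_version: 3,
  digest_len: 20,
  block_len: 64,
  ctx_size: 128,
  # Each callback leaves a text marker in the context rather than hashing.
  init_func:   ->(ctx) { ctx.replace("init") },
  update_func: ->(ctx, data) { ctx << "|update:" << data },
  finish_func: ->(ctx) { ctx << "|finish" }
)

# Drives the callbacks in the order a digest computation would use them
# and returns the context so a spec can inspect it.
def run_callbacks(metadata, inputs)
  ctx = +""
  metadata.init_func.call(ctx)
  inputs.each { |input| metadata.update_func.call(ctx, input) }
  metadata.finish_func.call(ctx)
  ctx
end

ctx = run_callbacks(FAKE_PLUGIN, ["foo", "bar"])
puts ctx
```

The first group of tests corresponds to checking plain fields such as digest_len; the second corresponds to checking that the context records each callback in order.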

Benchmarks

The blake3-rb gem ships with some benchmarks. I ran the string benchmarks on my workstation. I didn't fully tune for benchmarks so please take the numbers with a grain of salt. I'm running Ubuntu 24.04.2 (kernel: 6.8.0-55-generic) on a Ryzen 9800X3D processor with the performance governor.

truffleruby 25.0.0-dev-b0b60d25*, like ruby 3.3.7, GraalVM CE JVM [x86_64-linux]
Warming up --------------------------------------
        Digest::SHA1   247.000 i/100ms
      Digest::SHA256   234.000 i/100ms
         Digest::MD5   105.000 i/100ms
      Blake3::Digest   927.000 i/100ms
Calculating -------------------------------------
        Digest::SHA1      2.202k (± 1.5%) i/s  (454.09 μs/i) -     11.115k in   5.048372s
      Digest::SHA256      2.092k (± 0.9%) i/s  (477.98 μs/i) -     10.530k in   5.033589s
         Digest::MD5      1.005k (± 0.7%) i/s  (994.79 μs/i) -      5.040k in   5.014001s
      Blake3::Digest     10.069k (± 8.6%) i/s   (99.31 μs/i) -     50.058k in   5.043939s

Comparison:
      Blake3::Digest:    10069.3 i/s
        Digest::SHA1:     2202.2 i/s - 4.57x  slower
      Digest::SHA256:     2092.1 i/s - 4.81x  slower
         Digest::MD5:     1005.2 i/s - 10.02x  slower

Running through FFI doesn't appear to introduce too large of an overhead. I've verified that the blake3-rb extension is using SIMD instructions by way of the BLAKE3 Rust library. Absent built-in support for BLAKE3 in the JVM, I'm skeptical that a pure Java implementation would do much better.

The string benchmark, however, processes strings of various lengths in each benchmark. I have seen fairly large variance in performance depending on the length of the string. That's worth exploring in more depth.
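One way to start exploring the length dependence would be to time each input length separately. The following is a rough sketch only, not the blake3-rb benchmark: time_per_length, the chosen lengths, and the iteration count are assumptions, and Digest::SHA256 is used so the script runs without extra gems. It does no warmup, which matters a great deal on a JIT such as TruffleRuby.

```ruby
require "digest"

# Hypothetical helper: returns the average seconds per call for each input
# length. There is no warmup phase, so treat the numbers as rough.
def time_per_length(digest_class, lengths, iterations: 50)
  lengths.to_h do |length|
    input = "x" * length
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    iterations.times { digest_class.digest(input) }
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    [length, elapsed / iterations]
  end
end

lengths = [16, 1_024, 65_536, 262_144]
results = time_per_length(Digest::SHA256, lengths)
results.each do |length, seconds|
  puts format("%8d bytes: %10.2f us/call", length, seconds * 1_000_000)
end
```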

For comparison, here's what I get with MRI 3.4.2:

ruby 3.4.2 (2025-02-15 revision d2930f8e7a) +PRISM [x86_64-linux]
Warming up --------------------------------------
        Digest::SHA1   131.000 i/100ms
      Digest::SHA256    43.000 i/100ms
         Digest::MD5    91.000 i/100ms
      Blake3::Digest     1.096k i/100ms
Calculating -------------------------------------
        Digest::SHA1      1.311k (± 0.3%) i/s  (762.60 μs/i) -      6.681k in   5.094972s
      Digest::SHA256    436.367 (± 0.2%) i/s    (2.29 ms/i) -      2.193k in   5.025595s
         Digest::MD5    910.901 (± 0.1%) i/s    (1.10 ms/i) -      4.641k in   5.094968s
      Blake3::Digest     11.022k (± 0.6%) i/s   (90.72 μs/i) -     55.896k in   5.071335s

Comparison:
      Blake3::Digest:    11022.3 i/s
        Digest::SHA1:     1311.3 i/s - 8.41x  slower
         Digest::MD5:      910.9 i/s - 12.10x  slower
      Digest::SHA256:      436.4 i/s - 25.26x  slower

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified label Mar 31, 2025
@nirvdrum nirvdrum added the compatibility and shopify labels and removed the OCA Verified label Mar 31, 2025
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified label Mar 31, 2025
@nirvdrum nirvdrum force-pushed the support-digest-plugins branch from 11d4abe to d01a777 Compare April 1, 2025 05:11
@andrykonchin andrykonchin added the in-ci label Apr 1, 2025
@nirvdrum nirvdrum force-pushed the support-digest-plugins branch 2 times, most recently from b067685 to d002586 Compare April 2, 2025 05:27
@nirvdrum
Collaborator Author

nirvdrum commented Apr 2, 2025

I replaced the usage of FFI to read a pointer with Fiddle, which is a default gem. I hope that's sufficient. If it isn't, I can change these specs, but most of the verification that the plugin works involves ensuring the digest metadata and context are working correctly. The metadata struct is needed for the plugin to tie into the digest machinery. I'm reading the context object as a way to indirectly verify that the init, update, and finish functions passed as pointers in the metadata have been invoked correctly.

I've addressed a couple of lint issues as well. One was caused by code I copied from digest (with attribution) that did not match the code style required by the Ruby Spec Suite. I think the remaining lint failures are unrelated to this PR.
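As a rough illustration of reading native memory through Fiddle, the sketch below allocates a buffer, writes a marker into it, and reads it back through a second pointer built from the raw address. The buffer here is an assumption standing in for a digest context; the actual specs read the context owned by the plugin, which is not reproduced here.

```ruby
require "fiddle"

# Stand-in for a native context: a malloc'd buffer that Ruby frees for us.
buffer = Fiddle::Pointer.malloc(16, Fiddle::RUBY_FREE)
buffer[0, 4] = "init"

# Given only the address (as a spec might receive it from an extension),
# wrap it in a new pointer and read the bytes back.
address = buffer.to_i
view = Fiddle::Pointer.new(address, 16)
marker = view[0, 4]
puts marker
```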

@nirvdrum nirvdrum force-pushed the support-digest-plugins branch from d002586 to ac5296c Compare April 2, 2025 06:59
@@ -0,0 +1 @@
inherited
Collaborator Author

This is a difference from MRI, but I don't think it would realistically be a problem given it's a standard hook in Ruby. The inherited hook is used to mix in Truffle::Digest::Plugin to new Digest::Base subclasses. This allows for seamless loading of plugins, but does introduce two user-visible differences from MRI:

  1. The presence of inherited in Digest::Base.singleton_class.public_instance_methods(false)
  2. The mixed in Truffle::Digest::Plugin module for digest gem plugins

I think both of these deviations are quite small and unlikely to cause any real world trouble. If they are problematic, we'll need to find another way to load digest plugins.
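The mechanism and its two visible side effects can be sketched with stand-in classes. ExampleBase and ExamplePlugin are hypothetical names playing the roles of Digest::Base and Truffle::Digest::Plugin; this is not the TruffleRuby source, and unlike the real code this sketch mixes the module into every subclass without distinguishing built-in digests.

```ruby
# Hypothetical stand-in for Truffle::Digest::Plugin.
module ExamplePlugin
  def plugin_backed?
    true
  end
end

# Hypothetical stand-in for Digest::Base.
class ExampleBase
  # Standard Ruby hook, called whenever a subclass is created.
  def self.inherited(subclass)
    super
    subclass.include(ExamplePlugin)
  end
end

class ExampleDigest < ExampleBase; end

# Difference 1: the hook shows up among the base class's own singleton methods.
hook_visible = ExampleBase.singleton_class.public_instance_methods(false).include?(:inherited)
# Difference 2: the module appears in the subclass's ancestors.
module_mixed_in = ExampleDigest.include?(ExamplePlugin)

puts hook_visible
puts module_mixed_in
```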

Member

Yeah, agree

@nirvdrum nirvdrum force-pushed the support-digest-plugins branch from ac5296c to a2e7c58 Compare April 2, 2025 15:20
@nirvdrum
Collaborator Author

nirvdrum commented Apr 3, 2025

At this point, I think all of the test failures are problems with the build unrelated to my PR. If that's not correct, please let me know and I'll work on it.

@andrykonchin andrykonchin self-assigned this Apr 4, 2025
graalvmbot pushed a commit that referenced this pull request Apr 4, 2025
PullRequest: truffleruby/4507
@graalvmbot graalvmbot merged commit a2e7c58 into oracle:master Apr 4, 2025
13 of 15 checks passed
Labels: compatibility, in-ci, OCA Verified, shopify