Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve HTTP validation for inlining and specialized types #5392

Open
wants to merge 1 commit into
base: 4.x
Choose a base branch
from

Conversation

franz1981
Copy link
Contributor

@franz1981 franz1981 commented Nov 16, 2024

This change is mostly to improve native-image performance on validation due to closed-world assumption which lead to virtual calls while reading the chars to validate and no loop unrolling during validation.

Additionally, it takes care of moving out of the hot path unlikely validation scenarios for http header values e.g. presence of CRLF or non-printable control chars: this should help JVM mode as well.
Another thing which can help, despite the duplicated code, is the specialized latin head name validation because doesn't require to turn byte into chars for AsciiString.
Without this change the validation of ASCII heard names was slower than normal strings!
FYI @galderz

@franz1981 franz1981 force-pushed the 4.x_http_type branch 2 times, most recently from e9327f2 to 9101b8f Compare November 16, 2024 10:41
@franz1981
Copy link
Contributor Author

I've had some fun with assembly and reading the RFC and I'm going to push a major improvement on this, turning this in draft first

@franz1981 franz1981 marked this pull request as draft November 17, 2024 03:18
@franz1981
Copy link
Contributor Author

franz1981 commented Nov 17, 2024

TLDR

  1. I've streamlined the header name common path (according to the RFC, which is way stricter!)
  2. I've retained the original behaviour, with some improvement without duplicating any logic as it is fine to be slower there

I think that I could have further exploded the original state machine and let the hotspot JVM to do it right (if able!), but sadly native image isn't that smart and I prefer to keep it like this as it looks more like what Netty do, but with a less strict fallback.

Performance on Hotspot look great, compare to what it was (I will post something home back to my desk), while I can wait @galderz to valide this (I have added a benchmark on purpose, which could be reused) on native image - 🤞

@franz1981 franz1981 marked this pull request as ready for review November 17, 2024 22:57
@franz1981
Copy link
Contributor Author

franz1981 commented Nov 18, 2024

I have pushed e4f4342 and verified it holds from jdk 17 to main.

It solves a problem of "untrusted" types like AsciiString: since the JVM cannot trust offset and end vs arrays boundaries, it would perform additional computations to perform bound checks.
Giving the JIT an hints that offset is 0 (JIT can profile branches) allow propagating that information, reducing some of these computations in the compiled unrolled version of the loop.
I didn't verify if the same problem happens on end too, since I couldn't spot any weird bookkeeping after this change, although I am sure there is much more to shave.
I would like to produce a code which is less magic 🪄 and still do the right thing.

This explain both why Strings are better performers and why Netty value validation was faster: due to a bug of the Netty version (I will send a pr to fix it, since it looks like a potential security issue) the loop variables trigger a better JVM optimization.

@franz1981
Copy link
Contributor Author

Any chance you looked into the code @vietj ?

@vietj
Copy link
Member

vietj commented Nov 22, 2024

early next week @franz1981

@vietj vietj added this to the 4.5.12 milestone Nov 22, 2024
@vietj
Copy link
Member

vietj commented Nov 22, 2024

scheduled for 4.5.12 @franz1981

@galderz
Copy link

galderz commented Dec 18, 2024

These changes have a positive impact on native.

Before PR:

Benchmark                                      (ascii)   Mode  Cnt      Score     Error   Units
HeadersValidationBenchmark.validateNameNetty      true  thrpt    5  13564.945 ± 230.330  ops/ms
HeadersValidationBenchmark.validateNameNetty     false  thrpt    5   4406.927 ±   5.628  ops/ms
HeadersValidationBenchmark.validateNameVertx      true  thrpt    5  17456.963 ±  35.617  ops/ms
HeadersValidationBenchmark.validateNameVertx     false  thrpt    5   3678.795 ±  41.234  ops/ms
HeadersValidationBenchmark.validateValueNetty     true  thrpt    5  16350.225 ± 148.984  ops/ms
HeadersValidationBenchmark.validateValueNetty    false  thrpt    5   3750.169 ±  19.454  ops/ms
HeadersValidationBenchmark.validateValueVertx     true  thrpt    5   1889.591 ±   4.780  ops/ms
HeadersValidationBenchmark.validateValueVertx    false  thrpt    5   2189.690 ± 296.273  ops/ms

After PR:

Benchmark                                      (ascii)   Mode  Cnt      Score     Error   Units
HeadersValidationBenchmark.validateNameNetty      true  thrpt    5  13696.625 ± 111.232  ops/ms
HeadersValidationBenchmark.validateNameNetty     false  thrpt    5   4234.122 ±  49.438  ops/ms
HeadersValidationBenchmark.validateNameVertx      true  thrpt    5  17824.119 ±  52.503  ops/ms
HeadersValidationBenchmark.validateNameVertx     false  thrpt    5   4028.792 ±  23.738  ops/ms
HeadersValidationBenchmark.validateValueNetty     true  thrpt    5  14507.255 ± 455.145  ops/ms
HeadersValidationBenchmark.validateValueNetty    false  thrpt    5   3515.713 ±  30.857  ops/ms
HeadersValidationBenchmark.validateValueVertx     true  thrpt    5  18613.015 ± 227.081  ops/ms
HeadersValidationBenchmark.validateValueVertx    false  thrpt    5   3505.120 ±  10.858  ops/ms

The big winner is HeadersValidationBenchmark.validateValueVertx when ascii=true. The reason the performance improves so much is because it produces assembly with no call instructions for this particular benchmark. Some smaller improvements seen in other cases. No regressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants