Skip to content

AVX-512 AES-XTS improvements #1550

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

AVX-512 AES-XTS improvements #1550

wants to merge 7 commits into from

Conversation

pittma
Copy link
Contributor

@pittma pittma commented Apr 25, 2024

Changes found in this PR:

  1. Do not copy round keys to the stack, use them in place.
  2. Add an [encrypt|decrypt]_by_four implementation to replace both encrypt_initial and initialize.
  3. Fix Windows calling convention and reenable AVX-512 AES-XTS for Windows.
  4. Minor clean up and documentation.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

@pittma pittma requested a review from a team as a code owner April 25, 2024 18:16
@andrewhop andrewhop requested a review from nebeid April 30, 2024 17:22
@pittma
Copy link
Contributor Author

pittma commented Apr 30, 2024

Looks like some of those failures are on old versions of clang/gcc. I'll put the bytecode gen stuff back in :/

@pittma
Copy link
Contributor Author

pittma commented May 7, 2024

Looks like some of those failures are on old versions of clang/gcc. I'll put the bytecode gen stuff back in :/

While this may be the case, the failure at hand was a FIPS/delocate parsing error. I'll wait and see how far this FIPS fix gets us.

@nebeid
Copy link
Contributor

nebeid commented Oct 9, 2024

@pittma, would you mind please rebasing this work so we can see if the errors persist. Sorry for the delay in following up.

@codecov-commenter
Copy link

codecov-commenter commented Oct 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.76%. Comparing base (89889a9) to head (71451b8).
Report is 40 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1550      +/-   ##
==========================================
- Coverage   79.04%   78.76%   -0.28%     
==========================================
  Files         612      620       +8     
  Lines      106631   107864    +1233     
  Branches    15098    15319     +221     
==========================================
+ Hits        84285    84962     +677     
- Misses      21692    22248     +556     
  Partials      654      654              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@@ -3126,125 +3038,6 @@
___
}

# Bits 7 & 4 contain the src1 register's MSB in inverted form
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this perl handling removed? I believe that the replacement of the .byte instructions with vaesenc|dec|enclast|declast| and vpclmulqdq in the Perl output is causing the failures in gcc 7 and below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was hoping this was vestigial! Before I removed the copy code I did a bunch of clean up, for this bit I was really just crossing my fingers that it wasn't needed. I'll add it back in shortly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pittma , you're still planning on adding it in, right?

@pittma
Copy link
Contributor Author

pittma commented Nov 4, 2024

Happy Monday @nebeid! I've got some Monday vibes for you this morning!

@pittma , you're still planning on adding it in, right?

Yes, BUT—

We may need to hold off on this patch for now, but I don't want to close it until I know definitively. I suspect we know now why this implementation (which was ported to perlasm from Intel's ISA-L crypto library before I joined this team) did this copy to the stack. I've been working intermittently to port this patch to OpenSSL as well, and OpenSSL tests 64-byte block sizes which were not included in the speed tests we did with AWS-LC. With the heap-based implementation, we see a throughput regression with those block sizes but none of the others.

What's different:

The vmov's, when we're working with the key rounds on the stack, are aligned while when we DO NOT copy, they're unaligned (vmovqa vs vmovqu), this change is causing a pretty significant difference in cache misses, at least that's what we think is happening at the moment. The original implementation may have been trading space for time.

Next Steps:

I'm still collecting data on this, and it may all be specious anyway because some XTS tests are failing on the OpenSSL patch currently, but as soon as I know more I'll update this PR with the specifics and we can decide together what we want to do.

@pittma
Copy link
Contributor Author

pittma commented Feb 27, 2025

Hi all, it's been a while!

The refactored version of this implementation was merged into OpenSSL this morning. There are a number of fixes that I'd like to bring over—including what I mentioned in this comment[1] as well as enabling AVX-512 support for XTS on Windows if that's something y'all are interested in. I'll be working on bringing those changes over in the next week or two.

  1. This was a red herring. The issue ended up being related to the serial nature of encrypting small blocks, the new implementation adds a smaller parallelization (4 rather than 8) that can be used in small-block cases.

@pittma
Copy link
Contributor Author

pittma commented Mar 14, 2025

I don't think those two failures are related to these changes…

@pittma pittma changed the title Use on-heap keys in AVX-512 XTS implementation rather than copy to stack AVX-512 AES-XTS improvements Apr 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants