VAES support #372

Open
nazar-pc opened this issue Jul 26, 2023 · 8 comments

Comments

@nazar-pc

Vectorized AES (VAES) can process more than one block at a time, greatly improving throughput, but it doesn't appear to be used by the aes crate yet, which is unfortunate.

@tarcieri
Member

Note that AES-NI can already process more than one block at a time by leveraging Instruction Level Parallelism (ILP).

We have separate benchmarks for serial aes*_block vs aes*_blocks where you can see the performance difference.

That said, it would probably be good to add VAES support for microarchitectures where it does provide performance benefits beyond what's possible with ILP.
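To illustrate the ILP point, here is a minimal sketch (not the aes crate's actual code) of the same AES-128 rounds applied block by block versus interleaved across several blocks. The function names, the `N` const parameter, and the byte-array wrapper are hypothetical, and the round keys are assumed to be already expanded; key expansion is out of scope here.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Serial shape: each block runs through all rounds before the next starts,
/// so every AESENC depends on the previous one.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "aes")]
unsafe fn encrypt_serial<const N: usize>(rk: &[__m128i; 11], blocks: &mut [__m128i; N]) {
    for b in blocks.iter_mut() {
        let mut s = _mm_xor_si128(*b, rk[0]);
        for k in &rk[1..10] {
            s = _mm_aesenc_si128(s, *k);
        }
        *b = _mm_aesenclast_si128(s, rk[10]);
    }
}

/// Interleaved shape: one round is applied to all N blocks before moving on,
/// which gives the CPU N independent AESENC instructions to pipeline.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "aes")]
unsafe fn encrypt_interleaved<const N: usize>(rk: &[__m128i; 11], blocks: &mut [__m128i; N]) {
    for b in blocks.iter_mut() {
        *b = _mm_xor_si128(*b, rk[0]);
    }
    for k in &rk[1..10] {
        for b in blocks.iter_mut() {
            *b = _mm_aesenc_si128(*b, *k);
        }
    }
    for b in blocks.iter_mut() {
        *b = _mm_aesenclast_si128(*b, rk[10]);
    }
}

/// Hypothetical safe wrapper. Returns `None` when AES-NI is unavailable.
/// `round_keys` is assumed to be an already expanded AES-128 key schedule.
fn encrypt_blocks(
    round_keys: &[[u8; 16]; 11],
    blocks: &[[u8; 16]; 8],
    interleaved: bool,
) -> Option<[[u8; 16]; 8]> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("aes") {
            // SAFETY: [u8; 16] and __m128i are both 16 bytes of plain data,
            // and AES-NI support was checked at runtime just above.
            unsafe {
                let rk = (*round_keys).map(|k| std::mem::transmute::<[u8; 16], __m128i>(k));
                let mut st = (*blocks).map(|b| std::mem::transmute::<[u8; 16], __m128i>(b));
                if interleaved {
                    encrypt_interleaved(&rk, &mut st);
                } else {
                    encrypt_serial(&rk, &mut st);
                }
                return Some(st.map(|b| std::mem::transmute::<__m128i, [u8; 16]>(b)));
            }
        }
    }
    None
}
```

Both shapes produce identical ciphertext; only the instruction ordering differs, and the optimizer may reorder the serial loop on its own. The number of blocks kept in flight (`N`) is the knob a VAES backend would need to retune.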

@nazar-pc
Author

Absolutely, I am aware of that. Already using *_block and there is a difference indeed.

Those methods are the reason I looked inside, wondering whether the crate is also VAES-capable on top of instruction-level parallelism.

@newpavlov
Member

Unfortunately, I currently do not have access to a machine with AVX-512, so I cannot work on it myself.

If someone works on this, it could also be worthwhile to add VPCLMULQDQ support to the polyval crate.

@nazar-pc
Author

It shouldn't be difficult to get a server VM with AVX-512 support; I can help with that if you'd like.

@tarcieri
Member

@newpavlov polyval is/was written in a way that LLVM will already use VPCLMULQDQ when the target supports it: RustCrypto/universal-hashes#44

...though perhaps we could be explicit about it.

@tarcieri
Member

Also, I have several servers with AVX-512 support that I can test on.

@newpavlov
Member

newpavlov commented Jul 26, 2023

@tarcieri
I think it should be a relatively straightforward implementation? You would need to experiment with the number of 512-bit blocks processed in parallel, because VAES instructions may have a different latency/throughput ratio and the compiler can use 32 registers instead of 16. I think the optimal number will be bigger than the 8 blocks used in the AES-NI backend. IIUC VAES does not have instructions for key expansion, so you can adapt the existing AES-NI-based code.

The only annoying part should be plumbing in the autodetection module, but we can leave it for later, since VAES support will be gated either way (the relevant intrinsics are Nightly-only right now).

I may draft a PR this weekend or next week if you don't get to it before then.

> polyval is/was written in a way that LLVM will already use VPCLMULQDQ when the target supports it:

I thought about using parallelism (VPCLMULQDQ can process four 128-bit blocks at once). But I forgot that GHASH/POLYVAL is inherently sequential, so we cannot utilize it while processing a single message...

@tarcieri
Member

@newpavlov I opened a separate issue for VPCLMULQDQ here, I think that should be (potentially) fairly easy: RustCrypto/universal-hashes#184

POLYVAL/GHASH can be broken down into a parallelizable portion and a sequential portion... there's an accumulation of the output that is inherently sequential, but multiplication of the inputs can be performed in parallel.
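The split between the parallel and sequential parts can be sketched with a simplified polynomial hash over GF(2^128). This is an illustration only: `gf_mul` below is a plain shift-and-add multiply in natural bit order with the reduction polynomial x^128 + x^7 + x^2 + x + 1, so it is not bit-compatible with real GHASH or POLYVAL, and all function names are hypothetical. The algebra is the same, though: for four blocks, Y' = (Y ^ X1)*H^4 ^ X2*H^3 ^ X3*H^2 ^ X4*H, where the four multiplications are independent and only the final XOR into the accumulator is sequential.

```rust
/// Multiply two elements of GF(2^128) (simplified, natural bit order).
/// Assumption: reduction polynomial x^128 + x^7 + x^2 + x + 1 (0x87).
fn gf_mul(a: u128, b: u128) -> u128 {
    let (mut a, mut b, mut acc) = (a, b, 0u128);
    while b != 0 {
        if b & 1 == 1 {
            acc ^= a;
        }
        // Multiply `a` by x and reduce if the top bit falls off.
        let carry = a >> 127;
        a <<= 1;
        if carry == 1 {
            a ^= 0x87;
        }
        b >>= 1;
    }
    acc
}

/// Horner form: every multiplication depends on the previous result.
fn hash_sequential(h: u128, blocks: &[u128]) -> u128 {
    blocks.iter().fold(0, |y, &x| gf_mul(y ^ x, h))
}

/// Aggregated form: four independent multiplications per step using
/// precomputed powers of H, followed by a single sequential accumulation.
fn hash_aggregated(h: u128, blocks: &[u128]) -> u128 {
    let h2 = gf_mul(h, h);
    let h3 = gf_mul(h2, h);
    let h4 = gf_mul(h3, h);

    let mut y = 0u128;
    let mut chunks = blocks.chunks_exact(4);
    for c in &mut chunks {
        // These four products do not depend on each other.
        let p0 = gf_mul(y ^ c[0], h4);
        let p1 = gf_mul(c[1], h3);
        let p2 = gf_mul(c[2], h2);
        let p3 = gf_mul(c[3], h);
        // Only this accumulation is inherently sequential.
        y = p0 ^ p1 ^ p2 ^ p3;
    }
    // Leftover blocks fall back to the sequential form.
    for &x in chunks.remainder() {
        y = gf_mul(y ^ x, h);
    }
    y
}
```

Both functions return the same value for any input; the aggregated form simply exposes four multiplications per iteration that a wide carry-less multiply instruction could, in principle, handle together.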

3 participants