OpenCL: Default to table based AES, now backed in local memory #5613
Conversation
ROR by 8, 16 or 24 bits can be done with the byte_perm instruction. I saw no gain, so it's left disabled, but it might be nice to have it sitting there for reference / future testing.
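A rough sketch of the idea, for reference (not code from this PR): rotations by a whole number of bytes are pure byte permutations, so on NVIDIA they can map to a single prmt.b32 instruction, which CUDA exposes as __byte_perm(x, x, sel). The helper below is plain OpenCL C, the macro name is made up for illustration, and the selector constants are the ones these rotations would correspond to.

/* OpenCL's rotate() builtin is a left-rotate, so ror32(x, n) == rotate(x, 32 - n) */
#define ror32(x, n) rotate((uint)(x), 32u - (n))

/* Byte-permutation selectors for __byte_perm(x, x, sel) / prmt.b32:
 *   ror32(x,  8)  ->  sel 0x0321
 *   ror32(x, 16)  ->  sel 0x1032
 *   ror32(x, 24)  ->  sel 0x2103
 */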
Force-pushed from 6154359 to dffd323.
Up to 6.6x boost seen on super's AMD, and a mere 3x on the nvidias. I was hoping to achieve 10x but it turns out the bitsliced AES we used was too good as baseline for that to happen :) Anyway we seem to be on par with hashcat now.
Cool stuff, but impossible to review for real without diving into it.
As a minor suggestion, maybe the moving of tables to a separate file can be a separate commit?
Also use same file for bitlocker format, which had another copy of them. The whole commit is effectively a no-op.
This revised version pushes 1.4 Tbps of AES-128 decryption (axcrypt) or 914 Gbps of AES-256 encryption (keepass) on a 4070ti. The bitsliced code we defaulted to before is really good but it's register hungry. It has some merits when two or more blocks are encrypted/decrypted at once (does two in parallel) but still is slower than table based now. We can still opt in to use it. Note: This commit switches all formats to table-based AES without actually enabling the copying to local until next commit where all formats are adapted to use it. This very commit thus makes for a performance regression. See openwall#5594
Enable local memory for table-based AES. Closes openwall#5594 Bitlocker format is not affected as it has its own implementation, but AES performance is insignificant for it anyway.
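For anyone skimming the PR, the gist of the local-memory backing is that each work-group copies the shared tables from __constant into __local once at kernel start, and all lookups then hit local memory. A minimal sketch with illustrative names, showing only one table (the real code in opencl_aes_plain.h covers all the tables behind macros):

typedef struct {
	uint Te0[256];
	/* Te1..Te3, Td0..Td3 and the S-box tables would follow in the real thing */
} aes_local_t;

void aes_tables_to_local(__local aes_local_t *lt, __constant uint *Te0_c)
{
	/* cooperative strided copy: the work-group fills local memory once */
	for (uint i = get_local_id(0); i < 256; i += get_local_size(0))
		lt->Te0[i] = Te0_c[i];

	barrier(CLK_LOCAL_MEM_FENCE);
}

A kernel then declares a __local aes_local_t lt; calls something like the above once, and points its key schedule at it, which is what the akey.lt = &lt; change further down does.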
Force-pushed from dffd323 to 493a6a5.
The important changes are to opencl_aes_plain.h and are surprisingly few due to macros.
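Roughly why the diff stays small (purely illustrative, the actual macro names differ): when every table read already goes through a macro, retargeting the backing store is mostly a matter of redefining those macros rather than touching the round code.

#ifdef AES_LOCAL_TABLES
#define TE0(akey, i) ((akey)->lt->Te0[(i)])
#else
#define TE0(akey, i) (Te0_c[(i)])
#endif

/* A round step then reads identically either way, e.g.
 *   t0 = TE0(akey, s0 >> 24) ^ TE1(akey, (s1 >> 16) & 0xff) ^ ...;
 */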
I fail to see the point of that, but did so now. BTW the bitlocker format was also changed to use that table file but was otherwise not changed - it has its own copy of a more or less identical (afaics) table-based AES, but did not gain anything from using local memory as the AES part of it is insignificant. So I did not commit any such changes to it.
Add dmg-opencl, rar-opencl to the list of problematic formats. Side effect of openwall/john#5613. Document all formats that fail and therefore need to be disabled during testing. Signed-off-by: Claudio André <[email protected]>
@@ -53,10 +53,13 @@ typedef struct {
	uint32_t cracked;
} result;

#define AES_MAXNR 14
FWIW, this addition of AES_MAXNR to opencl_keepass_fmt_plug.c looks unused.
Good catch, that's a remnant from the older state struct.
Thank you for doing it. Looks cleaner to me that way, and makes the actual changes (in other commits) stand out.
@@ -162,7 +163,7 @@ __kernel void sevenzip_aes(__constant sevenzip_salt *salt,
	/* Early rejection if possible (only decrypt last 16 bytes) */
	if (pad > 0 && salt->length >= 32) {
		uint8_t buf[16];
		AES_KEY akey;
		AES_KEY akey; akey.lt = &lt;
Oh BTW I first had this as AES_KEY akey = { .lt = &lt }; everywhere, but some device did not like that. Unfortunately I forget which. Good to know, perhaps I should start a wiki page listing knowledge like that. Another example would be the static vs inline vs static inline that we had to add a workaround for in opencl_misc.h.
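For reference, the two variants in question, as a sketch assuming a kernel-scope __local aes_local_t lt; as in the diff above:

#if 0
	AES_KEY akey = { .lt = &lt };  /* designated initializer: rejected by at least one OpenCL runtime */
#else
	AES_KEY akey;                  /* portable form used instead */
	akey.lt = &lt;
#endif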
Add raw-SHA512-free-opencl to the list of problematic formats. Side effect of openwall/john#5613 and openwall/john#5615. Document all formats that fail and therefore need to be disabled during testing. Signed-off-by: Claudio André <[email protected]>
Hi, can you use the inverse sbox to speed up the inverse-key MixColumns? I got a 40% speedup doing it in hashcat's code on AES-256.
Are we not doing that already?
I don't think so? Especially if your speed is only on par with hashcat.
I'm trying to context switch into this. My current guess is you mean we don't have the four T-tables for decryption, only a plain reverse sbox table - right? Is your code in hashcat already or are you planning a PR? Feel free to add a PR here as well!
Above is roughly what I'm doing; I have left out some unrelated things like hashcat's decrypt function. This was mostly ChatGPT's idea so take it with a grain of salt; regardless, it got me a 40% increase in speed on a certain hashcat mode which also has additional MD5 steps (and it works perfectly). The only thing that has actually changed is the invert-key step and the tables. And no, I haven't PR'd it there; from what I understand they don't care much about improving AES.
You are very welcome to make a PR for us, for fun and fame 😉 If not, I will look into it! And anyway thanks a lot for this suggestion!
I do not care for fame on things that were not my idea. I just randomly came across this issue and suggested it as it helped me. I won't be PRing it as I don't have the time to figure out your code as well. GL.
These are used for set_decrypt_key() only, so would mainly affect formats that decrypt a small amount per candidate. While a decent boost was reported for hashcat, we only got a regression (as tested on nvidia) so this is left disabled for now. Closes openwall#5800, see openwall#5613 (comment)
This would mostly affect formats that decrypt a small amount per key (several formats only decrypt one or two blocks). This should theoretically boost AES_set_decrypt_key() by halving the number of table lookups but the results were disappointing so it's left disabled for now. Closes openwall#5800, see openwall#5613 (comment)
This would mostly affect formats that decrypt a small amount per key (several formats only decrypt one or two blocks). This should theoretically boost AES_set_decrypt_key() by halving the number of table lookups but the results on nvidia are disappointing as of now. Others get more or less boost. Closes openwall#5800, see openwall#5613 (comment)
This boosts AES_set_decrypt_key() by halving the number of table lookups. It mostly affects formats that decrypt a small amount per key (several formats only decrypt one or two blocks). Closes openwall#5800, see openwall#5613 (comment)
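To make "halving the number of table lookups" concrete: the classic OpenSSL-style AES_set_decrypt_key() applies InvMixColumns to each round-key word by composing the forward S-box (the low byte of Te1) with the Td tables, which already fold in the inverse S-box, costing eight lookups per word. A table that holds the InvMixColumns multiplication directly, indexed by the raw key byte, needs only four. A sketch of both forms follows; Dk0..Dk3 are hypothetical names (such a table could be built as Dk0[x] == Td0[SBOX[x]]), not identifiers from the tree.

/* classic composed form: 8 lookups per round-key word */
rk[i] = Td0[Te1[(rk[i] >> 24)       ] & 0xff] ^
        Td1[Te1[(rk[i] >> 16) & 0xff] & 0xff] ^
        Td2[Te1[(rk[i] >>  8) & 0xff] & 0xff] ^
        Td3[Te1[(rk[i]      ) & 0xff] & 0xff];

/* halved form: 4 lookups per word via a dedicated InvMixColumns table */
uint w = rk[i];
rk[i] = Dk0[(w >> 24)       ] ^
        Dk1[(w >> 16) & 0xff] ^
        Dk2[(w >>  8) & 0xff] ^
        Dk3[(w      ) & 0xff];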
This revised version pushes 1.4 Tbps of AES-128 decryption (axcrypt) or 914 Gbps of AES-256 encryption (keepass) on a 4070ti.
The bitsliced code we defaulted to before is really good but it's register hungry. It has some merits when two or more blocks are encrypted/decrypted at once (does two in parallel) but still is slower than table based now. We can still opt in to use it.
Closes #5594