Bundle multiple allocations into workspaces #1379
Conversation
Force-pushed from f103112 to 93d235d
@davidchisnall Since this will be relevant to your use case, you may want to have a look. It's understood that you'll need a context parameter as well, but that can be a follow-up PR.
Force-pushed from b2df5cf to 652aa43
I don't like the …

There is also a tradeoff in consolidating the allocations. Fewer calls to the allocator will be better for performance, but larger allocations are more likely to fail. I put the performance / stack-usage numbers in this post for the Verilator simulator of the slowest CHERIoT core. The signing and verification steps are 4-5M cycles, so a few hundred extra cycles for heap allocation is in the noise, but a failing heap allocation is a problem.
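To make that tradeoff concrete, here is a minimal, hypothetical C sketch; the function names and buffer layout are illustrative only and not part of mlkem-native's API. A single consolidated allocation costs one allocator call but stands or falls as a whole, whereas per-buffer allocations cost more calls and more error handling but make each individual request smaller.

```c
#include <stddef.h>
#include <stdlib.h>

/* Variant 1: one consolidated allocation. One allocator call, but the
 * single large request is more likely to fail on a small or fragmented
 * heap. */
static int with_consolidated_buffers(size_t a_len, size_t b_len)
{
    unsigned char *ws = malloc(a_len + b_len);
    if (ws == NULL) {
        return -1;               /* the whole operation fails at once */
    }
    unsigned char *a = ws;
    unsigned char *b = ws + a_len;
    /* ... use a and b ... */
    (void)a;
    (void)b;
    free(ws);
    return 0;
}

/* Variant 2: separate allocations. More allocator calls and more
 * error-handling boilerplate, but each request is smaller and the
 * buffers could even be placed in different memories. */
static int with_split_buffers(size_t a_len, size_t b_len)
{
    unsigned char *a = malloc(a_len);
    unsigned char *b = malloc(b_len);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }
    /* ... use a and b ... */
    free(a);
    free(b);
    return 0;
}
```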
Why would the same mechanism not work for entropy failure? I was hoping that we could similarly set an error code (which is currently an unrefined …). What would your suggestion be?
Yes, that would work. Currently the get-entropy function has a …
Does this need any co-operation from the application/library? If not, it seems the current …
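For illustration only, a minimal sketch of the kind of deferred error-code mechanism being discussed, under the assumption that entropy failure is recorded the same way as allocation failure; all identifiers here (op_status, get_entropy_into, keypair_example) are hypothetical and not taken from the actual codebase.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    int status;   /* 0 = ok, nonzero = first error that occurred */
} op_status;

/* Low-level step: instead of returning an error from every call site,
 * record the failure in a shared status and turn later calls into
 * no-ops. */
static void get_entropy_into(op_status *st, uint8_t *out, size_t len)
{
    if (st->status != 0) {
        return;                 /* an earlier step already failed */
    }
    int entropy_ok = 1;         /* placeholder; real code queries the entropy source */
    /* ... fill out[0..len-1] and set entropy_ok accordingly ... */
    (void)out;
    (void)len;
    if (!entropy_ok) {
        st->status = -1;        /* defer the error to a single check in the caller */
    }
}

/* High-level step: one explicit check at the end instead of per-call
 * error handling. */
static int keypair_example(void)
{
    op_status st = { 0 };
    uint8_t seed[32];
    get_entropy_into(&st, seed, sizeof seed);
    /* ... remaining steps also check/update st ... */
    return st.status;
}
```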
oqs-bot left a comment
Graviton3 (no-opt)
| Benchmark suite | Current: 7d16cd7 | Previous: 71773d9 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 38968 cycles | 38914 cycles | 1.00 |
| ML-KEM-512 encaps | 44887 cycles | 44730 cycles | 1.00 |
| ML-KEM-512 decaps | 56701 cycles | 56823 cycles | 1.00 |
| ML-KEM-768 keypair | 64330 cycles | 65610 cycles | 0.98 |
| ML-KEM-768 encaps | 72031 cycles | 72813 cycles | 0.99 |
| ML-KEM-768 decaps | 87999 cycles | 87757 cycles | 1.00 |
| ML-KEM-1024 keypair | 96164 cycles | 96052 cycles | 1.00 |
| ML-KEM-1024 encaps | 106319 cycles | 106405 cycles | 1.00 |
| ML-KEM-1024 decaps | 126938 cycles | 126654 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot left a comment
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 7d16cd7 | Previous: 71773d9 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 36515 cycles | 36187 cycles | 1.01 |
| ML-KEM-512 encaps | 43046 cycles | 42619 cycles | 1.01 |
| ML-KEM-512 decaps | 56148 cycles | 55403 cycles | 1.01 |
| ML-KEM-768 keypair | 59998 cycles | 59471 cycles | 1.01 |
| ML-KEM-768 encaps | 68176 cycles | 67728 cycles | 1.01 |
| ML-KEM-768 decaps | 85747 cycles | 84838 cycles | 1.01 |
| ML-KEM-1024 keypair | 88749 cycles | 88148 cycles | 1.01 |
| ML-KEM-1024 encaps | 99071 cycles | 98356 cycles | 1.01 |
| ML-KEM-1024 decaps | 121586 cycles | 120110 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot left a comment
Graviton2 (no-opt)
| Benchmark suite | Current: 7d16cd7 | Previous: 71773d9 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 59383 cycles | 59377 cycles | 1.00 |
| ML-KEM-512 encaps | 68529 cycles | 68765 cycles | 1.00 |
| ML-KEM-512 decaps | 88650 cycles | 87486 cycles | 1.01 |
| ML-KEM-768 keypair | 99659 cycles | 99614 cycles | 1.00 |
| ML-KEM-768 encaps | 111486 cycles | 111493 cycles | 1.00 |
| ML-KEM-768 decaps | 136633 cycles | 136258 cycles | 1.00 |
| ML-KEM-1024 keypair | 149176 cycles | 149294 cycles | 1.00 |
| ML-KEM-1024 encaps | 164998 cycles | 165110 cycles | 1.00 |
| ML-KEM-1024 decaps | 196336 cycles | 196272 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot left a comment
Arm Cortex-A72 (Raspberry Pi 4) benchmarks
| Benchmark suite | Current: 7d16cd7 | Previous: 71773d9 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 50706 cycles | 50856 cycles | 1.00 |
| ML-KEM-512 encaps | 58961 cycles | 58653 cycles | 1.01 |
| ML-KEM-512 decaps | 74965 cycles | 75093 cycles | 1.00 |
| ML-KEM-768 keypair | 88105 cycles | 87534 cycles | 1.01 |
| ML-KEM-768 encaps | 95599 cycles | 95539 cycles | 1.00 |
| ML-KEM-768 decaps | 118572 cycles | 119501 cycles | 0.99 |
| ML-KEM-1024 keypair | 130239 cycles | 130561 cycles | 1.00 |
| ML-KEM-1024 encaps | 142448 cycles | 142169 cycles | 1.00 |
| ML-KEM-1024 decaps | 173815 cycles | 173715 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
oqs-bot left a comment
SpacemiT K1 8 (Banana Pi F3) benchmarks
| Benchmark suite | Current: 7d16cd7 | Previous: 71773d9 | Ratio |
|---|---|---|---|
| ML-KEM-512 keypair | 155119 cycles | 155354 cycles | 1.00 |
| ML-KEM-512 encaps | 163151 cycles | 163232 cycles | 1.00 |
| ML-KEM-512 decaps | 206149 cycles | 206517 cycles | 1.00 |
| ML-KEM-768 keypair | 261039 cycles | 261154 cycles | 1.00 |
| ML-KEM-768 encaps | 275418 cycles | 275821 cycles | 1.00 |
| ML-KEM-768 decaps | 338056 cycles | 337878 cycles | 1.00 |
| ML-KEM-1024 keypair | 395001 cycles | 395312 cycles | 1.00 |
| ML-KEM-1024 encaps | 421934 cycles | 423303 cycles | 1.00 |
| ML-KEM-1024 decaps | 505554 cycles | 507323 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
I changed the approach to use an explicit error check in the code rather than expecting ….

Also, this is now based on #1389, which doesn't introduce workspaces.
Force-pushed from a876fb7 to 7bbba13
mkannwischer left a comment
Thanks @hanno-becker, one small nit on the code, otherwise this looks good to me.
Two points remain to be discussed:
- The previous approach would allow a user to allocate different buffers in different memories; e.g., one could allocate the large matrix A on the heap while all other buffers stay on the stack (see the sketch below). This is no longer possible with the workspace approach, which may be particularly relevant for embedded systems.
- As @davidchisnall noted here, allocating everything at once is more likely to fail, which is undesirable.

Any thoughts, @davidchisnall, @gilles-peskine-arm, @hanno-becker?
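As a rough illustration of the first point, a hypothetical sketch of the kind of mixed placement the previous approach permitted; the sizes and identifiers (A_BYTES, mixed_placement, ...) are made up for illustration and are not mlkem-native's actual names or dimensions.

```c
#include <stdint.h>
#include <stdlib.h>

#define A_BYTES     (3 * 3 * 512)   /* illustrative size of matrix A */
#define SMALL_BYTES 512             /* illustrative size of the other buffers */

static int mixed_placement(void)
{
    /* Only the large object goes to the heap ... */
    uint8_t *matrix_a = malloc(A_BYTES);
    if (matrix_a == NULL) {
        return -1;
    }

    /* ... while the smaller, short-lived buffers stay on the stack,
     * which can matter on embedded targets with a tiny heap. */
    uint8_t noise[SMALL_BYTES];
    uint8_t msg[SMALL_BYTES];

    /* ... use matrix_a, noise, msg ... */
    (void)noise;
    (void)msg;

    free(matrix_a);
    return 0;
}
```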
Force-pushed from 2c78165 to 793b43d
Unless @davidchisnall or @gilles-peskine-arm have concrete use cases in mind where we know that the monolithic allocation will cause problems, I would suggest adding workspaces for now and adjusting things on demand. This is internal and can be changed at any time. Especially for mldsa-native, I am hoping that this will also help us reduce the CBMC overhead from dynamic allocation, which is currently blocking proofs.
I very much prefer the monolithic allocation approach, unless there is an identified use case for fragmented allocation or monolithic allocation is too hard to implement. Let's consider the three typical types of memory that might be used.

There may be a reason to use different types of memory for some data. If a system has a small amount of memory that is better protected against snooping, that memory should be prioritized for key-equivalent secret data over message-specific secret data during signing. But I think if you want that level of control, you should be prepared to maintain your own platform-specific cryptography code, or at least your own patches. It's not a level of control that we intend to provide in TF-PSA-Crypto (whereas we would like to give a choice between the stack and a user-provided buffer).

In addition, I would prefer to avoid having low-level functions that can fail: ideally, some high-level function would reserve the necessary amount of memory (and generate the random data), and then none of the low-level functions can fail. One benefit of that is that it reduces the risk of getting error handling wrong, although having a CBMC proof mostly mitigates that risk. Another benefit is the code size: the cost of all …
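A rough sketch of the "only the top-level function can fail" structure described above, with all names invented for illustration (sign_ws, rng_fill, sign_inner are not real API): the public entry point performs the one allocation and the one RNG call, and everything below it operates on pre-reserved memory and therefore has no error path.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t randomness[64];     /* placeholder sizes */
    uint8_t scratch[2048];
} sign_ws;

/* Stub standing in for a platform RNG hook; assume the real one can fail. */
static int rng_fill(uint8_t *out, unsigned len)
{
    unsigned i;
    for (i = 0; i < len; i++) {
        out[i] = 0;             /* not random; illustration only */
    }
    return 0;
}

/* Low-level helper: no error path, because everything it needs has
 * already been reserved and generated by the caller. */
static void sign_inner(uint8_t *sig, sign_ws *ws)
{
    /* ... deterministic computation using ws->randomness and ws->scratch ... */
    (void)sig;
    (void)ws;
}

/* High-level entry point: the only place where failure is possible. */
static int sign_example(uint8_t *sig)
{
    sign_ws *ws = malloc(sizeof *ws);
    if (ws == NULL) {
        return -1;              /* allocation failure */
    }
    if (rng_fill(ws->randomness, sizeof ws->randomness) != 0) {
        free(ws);
        return -2;              /* entropy failure */
    }
    sign_inner(sig, ws);
    free(ws);
    return 0;
}
```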
Thanks @gilles-peskine-arm for this extensive comment. Makes sense to me.
Force-pushed from 793b43d to cce203f
mkannwischer left a comment
Thanks @hanno-becker. LGTM.
Force-pushed from cce203f to 4cbb5f7
Force-pushed from b0b7a86 to c37d1ea
This commit introduces 'workspace structures', which are function-local structures wrapping all objects and buffers used by the function into a single struct. The benefit of the workspace struct is simpler allocation boilerplate: instead of allocating multiple structures and separately handling their success/failure, we now only need to allocate a single object per function.

Signed-off-by: Hanno Becker <[email protected]>
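A hedged sketch of the workspace pattern the commit message describes; the type, sizes, and function names (keygen_ws, keygen_example) are illustrative and do not correspond to the actual mlkem-native identifiers.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All temporaries of the (hypothetical) function live in one struct. */
typedef struct {
    uint8_t seed[64];
    uint8_t matrix_a[3 * 3 * 512];   /* placeholder sizes */
    uint8_t noise[3 * 512];
} keygen_ws;

static int keygen_example(uint8_t *pk, uint8_t *sk)
{
    /* One allocation for everything the function needs, hence exactly
     * one success/failure path instead of one per buffer. */
    keygen_ws *ws = calloc(1, sizeof *ws);
    if (ws == NULL) {
        return -1;
    }

    /* ... key generation using ws->seed, ws->matrix_a, ws->noise ... */
    (void)pk;
    (void)sk;

    /* Scrub and release in one place (real code would use a secure
     * zeroization primitive that the compiler cannot optimize away). */
    memset(ws, 0, sizeof *ws);
    free(ws);
    return 0;
}
```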
Force-pushed from 7840809 to 76a56c6
#1389 plus consolidation of multiple allocations into workspaces.