On Windows, the go-to cryptographic random number generator on .NET is implemented via a call to the BCryptGenRandom function from BCrypt.dll, which is a well-known approach, used e.g. in SecurityDriven.Inferno's CryptoRandom and in multiple other places:

Interop.BCrypt.BCryptGenRandom(IntPtr.Zero, pbBuffer, count, Interop.BCrypt.BCRYPT_USE_SYSTEM_PREFERRED_RNG)

I've known about this approach for about ten years, so recently I started wondering whether there is a faster and more modern API on Windows.

Interop.BCrypt.BCryptGenRandom(Interop.BCrypt.BCRYPT_RNG_ALG_HANDLE, pbBuffer, count, 0)

I didn't find an actual source for this recommendation, just direct references to documentation, so I guess it was generated by some LLM. Indeed, Golang, Chromium, BoringSSL and Rust have moved away from

I've cooked up a quick and dirty benchmark to check this:

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;

public unsafe class Benchmarks
{
    [DllImport("BCrypt.dll")]
    internal static extern int BCryptGenRandom(IntPtr hAlgorithm, byte* pbBuffer, int cbBuffer, int dwFlags);

    [DllImport("BCryptPrimitives.dll")]
    internal static extern int ProcessPrng(byte* pdData, int cbData);

    public delegate void Rng(byte* ptr);

    public static long Cycle(Rng action)
    {
        byte* buffer = stackalloc byte[8];
        long acc = 0;
        for (int i = 0; i < 10000; i++)
        {
            action(buffer);
            acc ^= Unsafe.ReadUnaligned<long>(buffer);
        }
        return acc;
    }

    [Benchmark]
    public long ProcessPrng() => Cycle(ptr => ProcessPrng(pdData: ptr, cbData: 8));

    [Benchmark]
    public long BCryptGenRandomPseudoHandle() => Cycle(ptr => BCryptGenRandom(hAlgorithm: 0x81 /*BCRYPT_RNG_ALG_HANDLE*/, pbBuffer: ptr, cbBuffer: 8, dwFlags: 0));

    [Benchmark(Baseline = true)]
    public long BCryptGenRandomFlags() => Cycle(ptr => BCryptGenRandom(hAlgorithm: 0, pbBuffer: ptr, cbBuffer: 8, dwFlags: 2 /*BCRYPT_USE_SYSTEM_PREFERRED_RNG*/));
}

Not the best benchmark possible, just small enough to paste here and to check whether there is a difference. Judging from the results:

BenchmarkDotNet v0.15.6, Windows 11 (10.0.26200.7462)

the difference between Flags and PseudoHandle is negligible (and not very stable), but ProcessPrng is indeed noticeably faster. I've tried juggling it a bit, e.g. manually expanding the Cycle delegate into the benchmark methods and filling larger structs, and the difference is still there.

Considering all of the above, I have two questions:
Replies: 1 comment 1 reply
I had a chat with a compatriot on the OS crypto team.

Given that the API set is not super obvious, the recommendation I got was to use the pseudo-handle (if we want to make a change).

"But it could be faster if we just jump to ProcessPrng anyway!" Sure... but the team that owns it asked us not to (at least not "yet").

And is the CSPRNG really the limiting factor on your hot path? I would imagine most of that perf delta is function call overhead, so for better bytes/second you'd want to bulk-fetch, not fetch 8 bytes at a time.
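
For illustration, the bulk-fetch idea could look roughly like the sketch below. `BufferedRandom` and its 4096-byte default are hypothetical names and values, not part of .NET or of the benchmark above; the only framework calls assumed are `RandomNumberGenerator.Fill` and `BinaryPrimitives.ReadInt64LittleEndian`. It is not thread-safe, and keeping unread random bytes in process memory is a trade-off that may not suit every scenario.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;

// Hypothetical helper (not a .NET type): refills a private buffer in bulk
// and serves 8-byte requests from it, so the OS RNG is called once per
// buffer instead of once per value. Not thread-safe.
public sealed class BufferedRandom
{
    private readonly byte[] _buffer;
    private int _position;

    public BufferedRandom(int bufferSize = 4096) // buffer size is an arbitrary assumption
    {
        if (bufferSize < sizeof(long))
            throw new ArgumentOutOfRangeException(nameof(bufferSize));

        _buffer = new byte[bufferSize];
        _position = bufferSize; // treat the buffer as empty to force a refill on first use
    }

    public long NextInt64()
    {
        // Refill when fewer than 8 unread bytes remain.
        if (_position > _buffer.Length - sizeof(long))
        {
            RandomNumberGenerator.Fill(_buffer); // one call into the OS CSPRNG per refill
            _position = 0;
        }

        long value = BinaryPrimitives.ReadInt64LittleEndian(_buffer.AsSpan(_position, sizeof(long)));

        // Zero the consumed bytes so values already handed out do not linger in the buffer.
        Array.Clear(_buffer, _position, sizeof(long));
        _position += sizeof(long);
        return value;
    }
}
```

With the default buffer, one refill serves 512 calls to `NextInt64()`, so the per-call overhead measured in the benchmark would be paid far less often. Whether that helps in practice depends on how many values the hot path actually consumes.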
ext-ms-win-cng-rng-l1) on my computer, so it's probably one of those "if you referen…