Add Flops force SSE for direct comparison, add flops AVX512 option #17
Open
FCLC wants to merge 31 commits into TheRainDoodle:master from
Performance metrics on AVX-512 enabled Alder Lake:

8 P-cores/16 threads of AVX-512:
Run 1/10 Time: 0.08902
Run 2/10 Time: 0.08652
Run 3/10 Time: 0.08655
Run 4/10 Time: 0.0866
Run 5/10 Time: 0.08652
Run 6/10 Time: 0.08653
Run 7/10 Time: 0.08604
Run 8/10 Time: 0.08657
Run 9/10 Time: 0.08641
Run 10/10 Time: 0.08656
Executed 1248 billion instructions/second
Score: 30.17 Phenom's II's worth's

Single P-core/single thread:
Run 1/10 Time: 0.6904
Run 2/10 Time: 0.6853
Run 3/10 Time: 0.6853
Run 4/10 Time: 0.6853
Run 5/10 Time: 0.6853
Run 6/10 Time: 0.6853
Run 7/10 Time: 0.6852
Run 8/10 Time: 0.6853
Run 9/10 Time: 0.6853
Run 10/10 Time: 0.6852
Executed 156.7 billion instructions/second
Score: 15.11 Phenom's II's worth's
Update RM.md, add data and add reference to AVX512
… more documentation to asm and main
…henom II 810 4 core.
…teaching use of GPR, AVX, AVX2 and AVX512
Author
Now added an AVX-512 version of test 3, SHR REG, CL; see 40dbd26. Performance using AVX-512 on a 12700K (8 cores with SMT) was 1259 billion instructions/sec, roughly 42 Phenom IIs.
…e to 50 instead of 10
Author
You'll want to avoid merging 0870ce3 if the intention is to continue supporting Windows. The Windows tests were removed because they skew the results when multithreading is involved. In one case, Windows slowed AVX2 performance on a 5800X3D from 0.69T instructions/second to ~0.5T. On Windows, multi-core support should be removed and users directed to WSL2 in a pinch.
- Removed timing displays between each run;
- Added computation of the mean time per run (with standard deviation to validate the measurements);
- Added `printHelp` function to avoid re-printing the available 'commands' each time.
Adding measure of average time per run, simplified output between runs and added clang-format
(ReadME) Words are hard, let's fix them
Author
Development of these tests for AArch64 is ongoing in a fork of my repo.
I've also added the option for the repo to potentially be renamed to a "Phenominal" Benchmark, as a fun tongue-in-cheek reference to the baseline CPU.
Next would be to add a toggle that restricts a run to 4 physical cores (we'd have to account for SMT to avoid L1i and L1d contention between sibling threads, among other issues, if possible).
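One way the 4-physical-core toggle might be approached on Linux is to pick a single logical CPU per physical core, so that no two benchmark threads share a core's L1 caches. The sketch below is only an illustration under stated assumptions: `read_core_id` and `pick_one_cpu_per_core` are hypothetical helpers, the sysfs `core_id` file is assumed to be readable, and core ids are assumed unique across the machine (true for a single-socket desktop part, not guaranteed on multi-socket systems). It does not distinguish P-cores from E-cores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read the physical core id of one logical CPU from Linux sysfs.
 * Returns -1 if the file cannot be read (non-Linux system, offline CPU). */
int read_core_id(int cpu)
{
    char path[96];
    int core_id = -1;

    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/topology/core_id", cpu);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &core_id) != 1)
        core_id = -1;
    fclose(f);
    return core_id;
}

/* Hypothetical helper: keep the first logical CPU seen for each distinct
 * core id, stopping once max_cores cores have been chosen.
 * core_ids[i] is the core id of logical CPU i; negative entries are skipped.
 * Chosen logical CPU numbers are written to out (capacity >= max_cores).
 * Returns how many CPUs were chosen. */
size_t pick_one_cpu_per_core(const int *core_ids, size_t n_cpus,
                             size_t max_cores, int *out)
{
    assert(core_ids != NULL && out != NULL);

    size_t chosen = 0;
    for (size_t cpu = 0; cpu < n_cpus && chosen < max_cores; cpu++) {
        if (core_ids[cpu] < 0)
            continue;
        int seen = 0;
        for (size_t j = 0; j < chosen; j++) {
            if (core_ids[out[j]] == core_ids[cpu]) {
                seen = 1; /* an SMT sibling of this core is already chosen */
                break;
            }
        }
        if (!seen)
            out[chosen++] = (int)cpu;
    }
    return chosen;
}
```

The resulting CPU list could then be handed to an affinity call such as Linux's `sched_setaffinity(2)` before the worker threads start; that step is left out here to keep the sketch portable.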
Some of the README changes need not be merged, as they're specific to my environment (Alder Lake and the like).
Beyond that, feel free to incorporate as you see fit.
Depending on time I may add other CPUs/GPUs for laughs, but specifically for FLOPS I've added the single- and multi-core results to the README.