Add Flops force SSE for direct comparison, add flops AVX512 option #17

Open
FCLC wants to merge 31 commits into TheRainDoodle:master from FCLC:master

FCLC commented Feb 11, 2022

I've also added the option for the repo to potentially be renamed to a "Phenominal" Benchmark, as a fun tongue-in-cheek reference to the baseline CPU.

Next would be to add a toggle that restricts the run to 4 physical cores (we'd have to account for SMT to avoid L1i and L1d contention between sibling threads, among other issues, if possible).
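As a rough sketch of how the 4-physical-core toggle could choose its CPUs: on Linux, each logical CPU exposes a sibling list (the `thread_siblings_list` text format, e.g. `0-1` or `0,8`), and taking only the first sibling of each list gives one thread per physical core. The helper name `pickOnePerCore` is hypothetical and not part of this PR, and it assumes the lists are sorted ascending, as sysfs writes them. Reading sysfs and setting the affinity mask are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): given one sibling list per logical
// CPU, in the text format Linux uses for thread_siblings_list ("0-1", "0,8"),
// return up to maxCores logical CPU ids, one per physical core.
std::vector<int> pickOnePerCore(const std::vector<std::string>& siblingLists,
                                std::size_t maxCores) {
    std::vector<int> chosen;
    std::set<int> seenCores;
    for (const std::string& list : siblingLists) {
        if (chosen.size() >= maxCores) break;
        assert(!list.empty());
        // std::stoi stops at the first non-digit, so this reads the first
        // (assumed lowest-numbered) sibling, used here to represent the core.
        const int firstSibling = std::stoi(list);
        // SMT siblings share the same list, so only the first one is kept.
        if (seenCores.insert(firstSibling).second) {
            chosen.push_back(firstSibling);
        }
    }
    return chosen;
}
```

The returned ids could then be handed to whatever affinity mechanism the benchmark ends up using, so that no two worker threads share a core's L1 caches.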

Some of the README need not be merged, as it's specific to my environment (Alder Lake and the like).

Beyond that, feel free to incorporate as you see fit.

Depending on time I may add other CPUs/GPUs for laughs, but specifically for FLOPS I've added the single- and multi-core results to the README.

FCLC and others added 11 commits February 10, 2022 12:42
Performance metrics on AVX-512-enabled Alder Lake:

8 P-cores/16 threads of AVX-512:

Run 1/10 Time: 0.08902
Run 2/10 Time: 0.08652
Run 3/10 Time: 0.08655
Run 4/10 Time: 0.0866
Run 5/10 Time: 0.08652
Run 6/10 Time: 0.08653
Run 7/10 Time: 0.08604
Run 8/10 Time: 0.08657
Run 9/10 Time: 0.08641
Run 10/10 Time: 0.08656
Executed 1248 billion instructions/second
Score: 30.17 Phenom's II's worth's

Single P-core/Single thread:

Run 1/10 Time: 0.6904
Run 2/10 Time: 0.6853
Run 3/10 Time: 0.6853
Run 4/10 Time: 0.6853
Run 5/10 Time: 0.6853
Run 6/10 Time: 0.6853
Run 7/10 Time: 0.6852
Run 8/10 Time: 0.6853
Run 9/10 Time: 0.6853
Run 10/10 Time: 0.6852
Executed 156.7 billion instructions/second
Score: 15.11 Phenom's II's worth's
Update RM.md, add data and add reference to AVX512
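To sanity-check the two logs above against each other, here is a small sketch of the arithmetic. The function names are illustrative, not taken from the benchmark source, and the per-run instruction count is not stated anywhere in this PR, so any value passed for it is an estimate.

```cpp
#include <cassert>

// Throughput in instructions per second for a run that retires
// `instructions` instructions in `seconds` seconds.
double throughput(double instructions, double seconds) {
    assert(seconds > 0.0);
    return instructions / seconds;
}

// Speedup of the multi-threaded run over the single-threaded run,
// assuming both runs execute the same total amount of work.
double speedup(double singleSeconds, double multiSeconds) {
    assert(multiSeconds > 0.0);
    return singleSeconds / multiSeconds;
}
```

Plugging in typical run times from the logs, `speedup(0.6853, 0.08652)` comes out at roughly 7.9, which lines up with the ratio of the reported rates (1248 / 156.7, roughly 8.0) for 8 P-cores. Working backwards, 156.7 billion instructions/second over 0.6853 s suggests about 107 billion instructions per run, but that figure is inferred, not reported.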

FCLC commented Feb 17, 2022

Now added an AVX-512 version of test 3 (SHR REG, CL). See 40dbd26.

Performance using AVX-512 on a 12700K, 8 cores with SMT, was 1259 billion instructions/sec.

~= 42 Phenom IIs

FCLC commented Jun 22, 2022

You'll want to avoid merging 0870ce3 if the intention is to continue to support Windows.

The Windows tests were removed, as they skew the results when dealing with multithreading.

In one case, Windows slowed down AVX2 performance on a 5800X3D from 0.69T instructions/second to ~0.5T.

On Windows, multi-core support should be removed, and users directed to use WSL2 in a pinch.

- Removed timing displays between each run;
- Added compute of mean time per run (with standard deviation to validate the measures);
- Added `printHelp` function to avoid re-printing the available 'commands' each time.
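For reference, the mean and standard deviation mentioned in the list above could be computed along these lines. This is a minimal sketch, not the code from the commit: `RunStats` and `summarize` are made-up names, and it uses the sample standard deviation (dividing by n - 1), which may differ from what the commit actually does.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical summary of a set of timed runs (names are illustrative).
struct RunStats {
    double mean;    // average time per run, in seconds
    double stddev;  // sample standard deviation, in seconds
};

RunStats summarize(const std::vector<double>& times) {
    // At least two runs are needed for a sample standard deviation.
    assert(times.size() >= 2);

    double sum = 0.0;
    for (double t : times) sum += t;
    const double mean = sum / static_cast<double>(times.size());

    double squaredDiffs = 0.0;
    for (double t : times) squaredDiffs += (t - mean) * (t - mean);

    // Divide by n - 1 (Bessel's correction) since the runs are a sample.
    const double variance =
        squaredDiffs / static_cast<double>(times.size() - 1);
    return {mean, std::sqrt(variance)};
}
```

A small standard deviation relative to the mean suggests the runs were stable; a large one hints at interference such as frequency changes or scheduler noise.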
FCLC and others added 3 commits June 23, 2022 16:13
Adding measure of average time per run, simplified output between runs and added clang-format
(ReadME) Words are hard, let's fix them

FCLC commented Jul 2, 2022

Development of these tests for AArch64 is ongoing in a fork of my repo.

2 participants