Release Notes

Here we draft the release notes for the next release.

Note: format is [summary] [commit hash or PR#] [author(s)]

Use the release notes helper script to generate the preliminary list. Then group the changes, review the descriptions, and look out for any remaining ???? placeholders.

The first line of the commit message is usually a good summary, but please think through each entry and (re)write a summary that helps users quickly determine whether the change is interesting or useful to them. For example, include the name of the intrinsic/function in the summary so that users don't have to click through each commit themselves.
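
As a rough aid for the review step, the entry format described above can be checked mechanically. The sketch below is not the project's helper script. `parse_entry` and its regular expression are hypothetical, and it assumes an abbreviated hex commit hash or a `#`-prefixed PR number followed by one or more `@` handles.

```python
import re

# Hypothetical pattern for "[summary] [commit hash or PR#] [author(s)]".
# Assumes abbreviated hex hashes (7 to 40 chars) or "#123"-style PR numbers.
ENTRY_RE = re.compile(
    r"^(?P<summary>.+?)\s+"
    r"(?P<ref>#\d+|[0-9a-f]{7,40})\s+"
    r"(?P<authors>@[\w-]+(?:\s+@[\w-]+)*)$"
)


def parse_entry(line):
    """Split one release-note line into its parts, or return None if malformed."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    summary = match.group("summary")
    return {
        "summary": summary,
        "ref": match.group("ref"),
        "authors": match.group("authors").split(),
        # "????" marks a description that still needs a human rewrite.
        "needs_review": "????" in summary,
    }


if __name__ == "__main__":
    print(parse_entry("f16c: initial implementation 62c1087 @nemequ"))
```

Lines for which `parse_entry` returns `None`, or where `needs_review` is true, are the ones worth a second look before publishing.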

SIMDe 0.7.4

Summary

Minimum meson version is now 0.54
Initial support for F16C
Initial support for the E2K (Elbrus) architecture

Details

Implementation of NEON intrinsics:

neon/ext: add _mm_alignr_{,e}pi8 implementations 6d28f04 @nemequ
neon/rhadd: optimizations for rhaddq_xxx f730009 @aqrit
neon: test for MMX/SSE instead of x86 when choosing implementation 0366dab @nemequ
neon/abs: add SSE2 integer abs implementations 6396dc8 @aqrit
neon/shr_n: fix variable name in GFNI implementation of vshrq_n_s8 e751352 @nemequ

x86 intrinsics

Fix native aliases for amd64-only functions f0e9755 @nemequ

MMX

SSE*

sse2: workaround missing vcvtnq_s32_f32 on GCC e11258e @jpcima
sse2: fix incompatible argument in A32 impl. of _mm_cvtps_epi32 b5fbe39 @jpcima
ssse3: Add SSE2 integer abs implementation 2de8624 @aqrit
sse4.2: re-enable native _mm_cmpgt_epi64 7117c48 @aqrit

AVX

avx: work around missing _mm256_{load,store}u_m128{,i,d} on LCC a3a39e2 @nemequ

AVX2

AVX512

avx512/abs: add SSE2 implementation of _mm_abs_epi64 5c2f423 @aqrit
avx512/madd: fix arguments for native aliases ae545ce @nemequ
avx512/madd: explicitly promote 16-bit elements to 32-bit e5dd146 @nemequ
avx512: work around several bugs in older versions of clang e64231e @nemequ
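
For readers unsure what the madd entries above refer to: as documented by Intel for `_mm_madd_epi16` and its wider variants, adjacent signed 16-bit lanes are multiplied and each pair of products is summed into one 32-bit lane. The products do not fit in 16 bits, which is presumably why the promotion matters. The Python sketch below only models that lane arithmetic; it is not SIMDe's code, and `madd_epi16` here is an illustrative stand-in operating on plain lists.

```python
def madd_epi16(a, b):
    """Multiply signed 16-bit lanes pairwise, then add adjacent products
    into signed 32-bit lanes (wrapping on overflow)."""
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("inputs must have the same, even number of lanes")
    out = []
    for i in range(0, len(a), 2):
        # Python integers are unbounded, so the products are exact here.
        total = a[i] * b[i] + a[i + 1] * b[i + 1]
        # Wrap the sum into the signed 32-bit range.
        total = (total + 2**31) % 2**32 - 2**31
        out.append(total)
    return out


if __name__ == "__main__":
    print(madd_epi16([1, 2, 3, 4], [5, 6, 7, 8]))
```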

GFNI

gfni: improve ARM NEON implementation a99a3ec @rosbif
gfni: add ARM, PPC and WASM implementations of gf2p8mul intrinsics 61126b3 @rosbif
gfni: add cast to work around -Wimplicit-int-conversion warning d066a1c @nemequ
gfni: remove unintentional dependency on vector extensions bdfa828 @nemequ
gfni: work around clang bug #50932 7d4beba @nemequ
gfni: work around error with vec_bperm on clang-10 on POWER 8620bd0 @nemequ
gfni: replace vec_and and vec_xor with & and ^ on z/Arch f5577dc @nemequ
gfni: add many x86, ARM, z/Arch, PPC and WASM implementations 97eb961 @rosbif
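
For readers unfamiliar with the gf2p8mul intrinsics named above: per Intel's documentation, each byte lane is multiplied in GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x + 1. The scalar Python sketch below only illustrates that per-lane arithmetic. It is not SIMDe's code, and the function names are made up for this example.

```python
def gf2p8mul_byte(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a      # addition in GF(2^8) is XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a & 0x100:
            a ^= 0x11B       # reduce back into 8 bits
    return result


def gf2p8mul_bytes(a, b):
    """Apply the byte-wise multiply to two equal-length byte sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of lanes")
    return bytes(gf2p8mul_byte(x, y) for x, y in zip(a, b))


if __name__ == "__main__":
    print(hex(gf2p8mul_byte(0x57, 0x83)))
```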

XOP

F16C

f16c: initial implementation 62c1087 @nemequ

SVML

svml: trivial indentation fix 2176652 @nemequ

Arch support

various: correct PPC and z/Arch versions plus typo ac8d722 @rosbif

z/Arch

Correctly detect and handle z/Arch and its vector extensions 4a3f466 @nemequ
Fix z/Arch without zvector. b8af226 @nemequ
sse, sse2: add several z/Arch implementations 4f628ac @nemequ
sse2, sse4.1: additional z/Arch implementations for ksw2 ee24439 @milot-mirdita
Many additional z/Architecture implementations of x86 functions 5a2b035 @nemequ
sse4.1, neon/bsl: z/Arch implementations of blendv/bsl functions 80a8484 @nemequ
z/Architecture implementations for remaining min/max functions 694d547 @nemequ
neon/cvt: z/Arch implementations 107fab8 @nemequ
sse, sse4.1: z/Arch implementations of some rounding functions 9fb1509 @nemequ
sse, sse2, neon/dup_n: lots of z/Arch splat-based implementations 874d51f @nemequ
gfni: add z/Arch version c12f111 @rosbif
x86,arm/neon: Correct z/Arch versions 50fba9b @rosbif
docker: add -march=z14 -mzvector to s390x-gcc-10 build. 8f60406 @nemequ
docker: use z13 instead of z14 for s390x architecture a524be2 @nemequ

Altivec

sse, sse2: generate to/from altivec functions for SSE/SSE2 types. dd3ff53 @nemequ
docker: power9-clang ignore deprecated-altivec-src-compat warnings b70f1a2 @mr-c
sse4.1: PPC AltiVec has no vec_splat_s64 debbf73 @rosbif
arch: fix SIMDE_ARCH_POWER_ALTIVEC_CHECK to include AltiVec check 8534e64 @nemequ
simd128: add AltiVec implementations of any/all_true a3b2630 @nemequ

Testing with Docker/Podman & CI

gh-actions: add some bionic-era GCC builds ccdd24b @nemequ
gh-actions: add several clang builds e4b4646 @nemequ
drone: read testlog.txt if tests fail eb71d89 @nemequ

Misc

Improve abs function performance on SSE/SSE2 093f6ee @jpcima
Upgrade Hedley to v15 0d070e1 @nemequ
detect-clang: fix version numbers for clang < 4.0 8a2c645 @nemequ
e2k: Introduce E2K (Elbrus) architecture 093b2c5 @makise-homura
e2k, ppc: Make shifts unsigned 24ddeba @makise-homura
align: add MCST LCC to compilers known to support __alignof__ 38e3840 @nemequ
common: add an MCST LCC check for vector features. e38fe50 @nemequ
complex: fix checks for GCC C complex math support ad8c7e0 @nemequ
Fix SIMDe link in no-tests README 21f7a2a @maxbachmann
common: enable OpenMP by default on LCC ff34d1b @nemequ
README: more thoroughly document OpenMP support 46c65e1 @nemequ

Testing

download-sde: be more tolerant of changes on Intel's web site 87bb927 @nemequ
meson: require meson version 0.54 349da2b @makise-homura
testing: Require exact matches for abs functions 9085d94 @jpcima
test: replace 1e-##precision with to_slop functions 9adcc21 @nemequ
test: allow passing INT_MAX for precision for exact comparisons e903b7f @nemequ
