Michael R. Crusoe edited this page Feb 2, 2023 · 40 revisions

Here we draft the release notes for the next release.

Note: format is [summary] [commit hash or PR#] [author(s)]

Use the release notes helper script to generate the preliminary list. Then group the changes, review the descriptions, and look out for ???? placeholders.

The first line of the commit message is usually a good summary, but please think through each entry and (re)write a summary that helps users quickly determine whether the change would be interesting or useful to them. For example, include the name of the intrinsic/function in the summary so that users don't have to click through each commit themselves.
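
The review step can be partly automated. The sketch below is not the helper script mentioned above; it is a hypothetical checker that assumes each bullet ends with its commit hashes and @authors, as in the note format, and it flags ???? placeholders and hashes listed more than once. The names parse_entry and review are made up for illustration, and PR numbers would need extra handling.

```python
import re
from collections import Counter

# Assumption: trailing tokens are abbreviated commit hashes or @handles.
HASH_RE = re.compile(r"^[0-9a-f]{7,40}$")
AUTHOR_RE = re.compile(r"^@\S+$")


def parse_entry(line):
    """Split one bullet into (summary, hashes, authors)."""
    tokens = line.strip().lstrip("•*- ").split()
    hashes, authors = [], []
    # Walk backwards over the trailing hash/author tokens.
    while tokens and (HASH_RE.match(tokens[-1]) or AUTHOR_RE.match(tokens[-1])):
        token = tokens.pop()
        (authors if token.startswith("@") else hashes).append(token)
    return " ".join(tokens), hashes[::-1], authors[::-1]


def review(lines):
    """Return (summaries with a ???? placeholder, hashes listed more than once)."""
    parsed = [parse_entry(line) for line in lines if line.strip()]
    unresolved = [summary for summary, _, authors in parsed
                  if "????" in summary or "@????" in authors]
    counts = Counter(h for _, hashes, _ in parsed for h in hashes)
    duplicates = sorted(h for h, n in counts.items() if n > 1)
    return unresolved, duplicates


sample = [
    "  • neon/abd: Wasm SIMD implementation 220db33 @ngzhian",
    "  • avx512/setzero: fix native aliases c900d5e @????",
    "  • neon/cle: add some x86 implementations 5906cc9 d81c7e7 @nemequ 7894c7d @Glitch18",
]
unresolved, duplicates = review(sample)
print(unresolved)  # ['avx512/setzero: fix native aliases']
print(duplicates)  # []
```

Parsing from the end of the line keeps entries with several hashes and authors intact, at the cost of being a heuristic: a summary whose last word happens to be seven or more hex digits would be misread as a hash.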

SIMDe 0.7.4

Summary

  • Minimum meson version is now 0.54
  • Initial support for F16C
  • Initial support for the E2K (Elbrus) architecture

Details

Implementation of Arm intrinsics

NEON

  • aarch64 + clang-1[345] fix for "implicit conversion changes signedness" a22c3cc @mr-c
  • neon: Implement f16 types 21496f6 @Glitch18
  • neon: port additional code to new style 1c744fd @nemequ
  • neon: replace some more abs/labs/llabs usage with simde_math_* versions c59853a @nemequ
  • neon: refactor to use different types on all targets c17957a @nemequ
  • neon/abd: add much better implementations c3ddbbe @nemequ
  • neon/abd: Wasm SIMD implementation 220db33 @ngzhian
  • neon/abs: add SSE2 integer abs implementations 6396dc8 @aqrit
  • neon/addhn: initial implementation e9ee066 @nemequ
  • neon/add: Implement f16 functions e69239c @Glitch18
  • neon/{add,sub}w_high: use vmovl_high instead of vmovl + get_high b897331 @nemequ
  • neon/bcax: initial implementation 96ce481 0ed3dea @Glitch18
  • neon/bsl: Implement f16 functions edb75b5 @Glitch18
  • neon/cage: Initial f16 implementations 20df81d @Glitch18
  • neon/cagt: Implement f16 functions 452a6d3 @Glitch18
  • neon/ceq: Implement f16 functions f24ab3d @Glitch18
  • neon/ceqz: Implement f16 functions dd2ebf2 de301cd @Glitch18
  • neon/cge: Implement f16 functions a512986 f3ad0d4 647dc12 @Glitch18
  • neon/cgez: complete implementation of CGEZ family 6d86a20 @Glitch18
  • neon/cgt: Add implementation of remaining functions 9930c43 @Glitch18
  • neon/cgt, simd128: improve some unsigned comparisons on x86 ae6702a @nemequ
  • neon/cgtz: Add implementations of remaining functions 4d749b5 @Glitch18
  • neon/cle: add some x86 implementations 5906cc9 d81c7e7 @nemequ 7894c7d @Glitch18
  • neon/clez: Add implementations of scalar functions bc72880 @Glitch18
  • neon/clt: Add implementations of scalar functions & SSE/AVX512 fallbacks bc636e1 6a19637 @Glitch18
  • neon/cltz: Add scalar functions and natural vector fallbacks 2960ef0 @Glitch18
  • neon/cmla, neon/cmla_rot{90,180,270}: check compiler versions e98152f @nemequ
  • neon/cmla, neon/cmla_rot{90,180,270}: CMLA requires armv8.3+ 280faae @nemequ
  • neon/cmla, neon/cmla_rot{90,180,270}, neon/fma: initial implementation 2aff4f9 @Glitch18
  • neon/cnt: add x86 implementations of vcntq_s8 a558d6d @nemequ
  • neon/cvt: add __builtin_convertvector implementations d06ea5b @nemequ
  • neon/cvt: add out-of-range and NaN tests 7d0e2ac @nemequ
  • neon/cvt: add some faster x86 float->int/uint conversions ceaaf13 @nemequ
  • neon/cvt: Add vcvt_f32_f64 and vcvt_f64_f32 implementations 8398f73 @Glitch18
  • neon/cvt: cast result of float/double comparison dc215cd @ngzhian
  • neon/cvt: disable some code on 32-bit x86 which uses _mm_cvttsd_si64 48edfa9 @nemequ
  • neon/cvt: don't use vec_ctsl on POWER 8f9582a @nemequ
  • neon/cvt: fix a couple of s390x implementations' NaN handling a8bd33d @nemequ
  • neon/cvt: fix compilation with -ffast-math d1d070d @nemequ
  • neon/cvt: Implement f16 functions b6a9882 @Glitch18
  • neon/cvt, relaxed-simd: add work-around for GCC bug #101614 11aa006 @nemequ
  • neon/cvt, simd128: fix compiler errors on PPC 965e68e @nemequ
  • neon/dot_lane: add remaining implementation 3f1c1fa @Glitch18
  • neon/dot_lane: correct implementations of dot_lane functions 4a9ca8a @Glitch18
  • neon/dup_lane: add shuffle-based implementations 014ee00 @nemequ
  • neon/dup_lane: Complete implementation of function family 12fb731 @Glitch18
  • neon/dup_lane: fix macro for simde_vdup_laneq_u16 9461557 @nemequ
  • neon/dup_lane: implement vdupq_lane_f64 df320d1 @Glitch18
  • neon/dup_lane: use dup_n 2b4a009 @ngzhian
  • neon/dup_n: Implement f16 functions 14fdf88 @Glitch18
  • neon/dup_n: replace remaining functions with dup_n implementations 27a13b0 @nemequ
  • neon/dupq_lane: native and portable 893db57 @ngzhian
  • neon/ext: add __builtin_shufflevector implementation de8fe89 @ngzhian
  • neon/ext: add _mm_alignr_{,e}pi8 implementations 6d28f04 @nemequ
  • neon/ext: clean up shuffle-based implementation f1de709 @nemequ
  • neon/fma: add a couple x86 and PPC implementations 7a2860b @nemequ
  • neon/fma: add more extensive feature checking e541dd1 @nemequ
  • neon/fma_lane: Implement fmaq_lane functions a77e6ad @Glitch18
  • neon/fma_lane: portable and native implementations 555ef3e @Glitch18
  • neon/fma_n: initial implementation 06d5a62 @nemequ
  • neon/fma_n: the 32-bit functions are missing on GCC on arm dab4342 @nemequ
  • neon/get_high: add __builtin_shufflevector optimizations 4003afa @ngzhian
  • neon/get_low: use __builtin_shufflevector if available ea3f75e @ngzhian
  • neon/hadd,hsub: optimization for Wasm ebe09d8 @ngzhian
  • neon/ld1: add Wasm SIMD implementation a79bc15 @ngzhian
  • neon/ld1_dup: Add f64 function implementations 6c71aac @Glitch18
  • neon/ld1_dup: native and portable (64-bit vectors) debb3c8 @ngzhian
  • neon/ld1_dup: split from ld1, dup_n fallbacks, WASM implementations 4c586e0 @nemequ
  • neon/ld1: Fix macros in order to workaround bugs f26f775 @Glitch18
  • neon/ld1: Implement f16 functions 6e89a9c @Glitch18
  • neon/ld1_lane: Implement remaining functions de2de8d @Glitch18
  • neon/ld1_lane: portable and native implementations 9051a51 @ngzhian
  • neon/ld1q: u8_x2, u8_x3, u8_x4 341006c @ngzhian
  • neon/ld1[q]_*_x2: initial implementation cd14634 @dgazzoni
  • neon/ld{2,3,4}: disable -Wmaybe-uninitialized on all recent GCC e142a59 @nemequ
  • neon/ld{2,3,4}: silence false positive diagnostic on GCC 7 3f737a3 @nemequ
  • neon/ld2: apply optimizations from previous commit to other extensions 078bb00 @nemequ
  • neon/ld2: gcc-12 fixes 041b1bd @mr-c
  • neon/ld2: Implement remaining functions e68f728 @Glitch18
  • neon/ld2: Wasm optimizations 3b3014f @ngzhian
  • neon/ld4_lane: Implement remaining functions 179fb79 @Glitch18
  • neon/ld4_lane: move private type usage to inside loop 0d1ab79 @nemequ
  • neon/ld4_lane: native and portable implementations a973cab @ngzhian
  • neon/ld4: use conformant array parameters 723a8a8 @nemequ
  • neon/ld4: work around spurious warning on clang < 10 64e9db0 @nemequ
  • neon/min: add SSE2 vminq_u32 implementation 2cf165e @nemequ
  • neon/min: add SSE2 vqsubq_u32 implementation 117de35 @nemequ
  • neon/{min,max}nm: add some headers for -ffast-math ebe5c7d @nemequ
  • neon/{min,max}nm: use simde_math_* prefixed min/max functions c1607d2 @nemequ
  • neon/mlal_high_n: initial implementation d6f75fa @dgazzoni
  • neon/mlal_lane: initial implementation 82e36ed 2168ca0 @nemequ
  • neon/mls: add _mm_fnmadd_* implementations of vmls*_f* 70e0c20 @nemequ
  • neon/mlsl_high_n: initial implementation ca1a4c3 @dgazzoni
  • neon/mlsl_lane: initial implementation de78ae9 @nemequ
  • neon/mls_n: initial implementation 042c6eb @nemequ
  • neon/movl: improve WASM implementation ccffc23 @nemequ
  • neon/mul: add improved SSE2 vmulq_s8 implementation c6c6361 @nemequ
  • neon/mul: implement unsigned multiplication using signed functions 979552a @nemequ
  • neon/mul_lane: Add mul_laneq functions 86b039c 5d2e4bc @Glitch18
  • neon/mull_lane: initial implementation 4dd488d @nemequ
  • neon/neg: Complete implementation of function family 6423a26 @Glitch18
  • neon/padd: Add scalar function implementations fe21dc1 @Glitch18
  • neon/pmax: Add scalar function implementations a287eaa @Glitch18
  • neon/pmin: Add scalar function implementations 38f7499 @Glitch18
  • neon/qabs: add some faster implementations 6cd925e @nemequ
  • neon/qadd: add several improved x86 and vector extension versions 4e48e5c @nemequ
  • neon/qadd: fix warning in ternarylogic call in vaddq_u32 fad2470 @nemequ
  • neon/qadd: improve SSE implementation 8fbe7cd @nemequ
  • neon/qdmulh: Add scalar function implementations 68e7a0e @Glitch18
  • neon/qdmulh: add shuffle-based implementations 8cf3afc @nemequ
  • neon/qdmulh_lane: Add remaining function implementations 1c64794 @Glitch18
  • neon/qdmulh_lane: native and portable 79dc1ee @ngzhian
  • neon/qdmulh_n: native and portable implementations 55a9c07 @ngzhian
  • neon/qdmull: add WASM implementations 7d7a43b @nemequ
  • neon/qrdmulh_lane: Add scalar function implementations 9ab1446 @Glitch18
  • neon/qrdmulh_lane: fix typo in undefs 3794620 @ngzhian
  • neon/qrdmulh_lane: initial implementation dc2ea75 @nemequ
  • neon/qrdmulh: native aliases for scalar functions should be A64 f7820fc @nemequ
  • neon/qrdmulh: steal WASM q15mulr_sat implementation for qrdmulhq_s16 ccacf94 @nemequ
  • neon/qrshrn_n: Add scalar function implementations ffa09ca @Glitch18
  • neon/qrshrn_n: native and portable implementations 2595b3e @ngzhian
  • neon/qrshrun_n: Add scalar function implementations 49300fa @Glitch18
  • neon/qrshrun_n: native and portable implementations d5e805b @ngzhian
  • neon/qshlu_n: Add scalar function implementations f7b59a5 @Glitch18
  • neon/qshlu_n: initial implementation 77af9f1 @Glitch18
  • neon/qshrn_n: Add scalar function implementations b4eed3e @Glitch18
  • neon/qshrn_n: initial implementation d9260dc @nemequ
  • neon/qshrun_n: Add scalar function implementations eeaad75 @Glitch18
  • neon/qshrun_n: native and portable implementations c29f9fb @ngzhian
  • neon/qsub: add some SSE and vector extension implementations 1cb520a @nemequ
  • neon/recpe: Add remaining function implementations 9d8e77f @Glitch18
  • neon/recpe: add some additional implementations stolen from SSE eb18b7c @nemequ
  • neon/recpe: recpe_f32 and recpe_f64, native and portable 629d129 @ngzhian
  • neon/recpe: Remove duplicate code and fix copyright year 5a27732 @ngzhian
  • neon/recps: Add scalar function implementations 9c67d34 @Glitch18
  • neon/recps: recps/recpsq for native and portable e8a8a09 @ngzhian
  • neon/recps: Use vector ops instead of relying on autovec 7e420a1 @ngzhian
  • neon/reinterpret: change defines to work with templated callers 7f9794a @ngzhian
  • neon/reinterpret: f16_u16 and u16_f16 implementations 9aedd5d @Glitch18
  • neon/rhadd: optimizations for rhaddq_xxx f730009 @aqrit
  • neon/rndi, sse2: work around several functions missing in GCC 0b6a9c1 @nemequ
  • neon/rndn: Add macro corrections d01618a @Glitch18
  • neon/rndn: Add scalar function implementation d5d6509 @Glitch18
  • neon/rndn: Fix macros to workaround bugs 90c910b @Glitch18
  • neon/rndn: work around some missing functions in GCC on armv8 050f935 @nemequ
  • neon/rshl: Add scalar function implementations c641cbd @Glitch18
  • neon/rshr_n: Add custom scalar function for utility 3a0ef81 @Glitch18
  • neon/rshr_n: Add scalar function implementations 465c1ec @Glitch18
  • neon/rshrn_n: native and portable implementations a703711 @ngzhian
  • neon/rsqrte: Implement remaining functions 75c1495 @Glitch18
  • neon/rsqrte: use vmls for fallbacks. 990b458 @nemequ
  • neon/rsqrte: vrsqrte_f32 and vrsqrteq_f32 on native and portable 8781eb6 @ngzhian
  • neon/rsqrts: Add remaining function implementations ed5e971 @Glitch18
  • neon/rsqrts: vrsqrts_f32 and vrsqrtsq_f32 native and portable de8c592 @ngzhian
  • neon/rsra_n: Add scalar function implementations 4944075 @Glitch18
  • neon/shl: Add scalar implementations 89fdad8 @Glitch18
  • neon/shll_n: native and portable implementations 98ac861 @ngzhian
  • neon/shl_n: Add scalar function implementations 267ab66 @Glitch18
  • neon/qshlu_n: faster WASM implementations 5576d8a @nemequ
  • neon/shr_n: Add scalar function implementations e3e4b8e @Glitch18
  • neon/shr_n: fix variable name in GFNI implementation of vshrq_n_s8 e751352 @nemequ
  • neon/shrn_n: s16 s32 s64 u16 u32 u64 portable and native 8810cdd @ngzhian
  • neon/shrn_n: Wasm SIMD optimizations 40b4549 @ngzhian
  • neon/sqadd: initial implementation eab9d99 @Glitch18
  • neon/sqadd: work around bug in older clang < 9 1c0dabf @nemequ
  • neon/sra_n: Add scalar function implementations 272c2cf @Glitch18
  • neon/sri_n: add 128-bit implementations aa832e1 @nemequ
  • neon/sri_n: Add scalar function implementations dcbcab5 @Glitch18
  • neon/sri_n: native and portable f6cf839 @ngzhian
  • neon/st1: Add f16 functions f58bd3c @Glitch18
  • neon/st2: Implement remaining functions 43c4b52 @Glitch18
  • neon/st2_lane: Implement remaining functions 4cbed4a @Glitch18
  • neon/st2_lane: portable and native for _{u,s}{8,16,32} 8ee1eb4 @ngzhian
  • neon/st2,st1: use zip + st1 to implement st2 7929406 @ngzhian
  • neon/st2: vst2(q) f32 s8 s16 s32 u8 u16 u32 1e38dcb @ngzhian
  • neon/st3: Add shuffle vector implementations 52da8d4 @Glitch18
  • neon/st3_lane: Implement remaining functions 982d2a9 @Glitch18
  • neon/st3_lane: portable and native *_{s,u}{8,16,32} ae308b2 @ngzhian
  • neon/st3q_u8: Wasm optimization 687460c @ngzhian
  • neon/st4_lane: Implement remaining functions 5be1b07 @Glitch18
  • neon/st4_lane: portable and native *_{s,u}{8,16,32} b231820 @ngzhian
  • neon/subhn: initial implementation ca62754 @nemequ
  • neon/sub: Implements the two remaining scalar functions 74e5b82 @Glitch18
  • neon/subl_high: initial implementation 36d6d11 @dgazzoni
  • neon/tbl: add WASM implementation of vtbl1_u8 d05fa59 @nemequ
  • neon: test for MMX/SSE instead of x86 when choosing implementation 0366dab @nemequ
  • neon/tst: implement scalar functions 41c2f8a @Glitch18
  • neon/types: remove duplicate NEON float16_t definitions 7f40f35 @dgazzoni
  • neon/types: reverse logic for SIMDE_ARM_NEON_FORCE_NATIVE_TYPES 7776a8c @nemequ
  • neon/types: use vector extensions for public types when available 790e263 @nemequ
  • neon/vdup: vdupq_lane_f32 native and portable e2ae5dc @ngzhian
  • neon/vld1q_dup: native and portable implementations 650d531 @ngzhian
  • neon/vld2_u8: native and portable implementation 85d2ed2 @ngzhian
  • neon/vld2: vld2_{u16,u32} and vld2q_{u8,u16,u32,f32} b43d434 @ngzhian
  • neon/vld4: Wasm optimization of vld4q_u8 07387bf @ngzhian
  • neon/vmovq: define vmovq_n as aliases for vdup_n ff7472b @ngzhian
  • neon/xar: initial implementation 50cd8af @Glitch18
  • neon/zip1: add armv7 implementations d4ded0a @nemequ
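
The neon/qrdmulh entry above reuses the WASM q15mulr_sat code path for qrdmulhq_s16. The scalar model below is a sketch of why that sharing is sound, assuming the usual per-lane definitions of the two operations (a rounding, doubling, saturating high-half multiply versus a rounding Q15 multiply). The function names are illustrative and are not SIMDe's.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate_i16(value):
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def qrdmulh_s16(a, b):
    """One lane of vqrdmulh_s16: doubling multiply, round, keep the high half."""
    return saturate_i16((2 * a * b + (1 << 15)) >> 16)


def q15mulr_sat(a, b):
    """One lane of i16x8.q15mulr_sat: Q15 multiply with rounding."""
    return saturate_i16((a * b + (1 << 14)) >> 15)


# Both formulas agree on these lane values, including the saturating case.
samples = [INT16_MIN, -12345, -1, 0, 1, 2, 16384, INT16_MAX]
assert all(qrdmulh_s16(a, b) == q15mulr_sat(a, b) for a in samples for b in samples)
print(qrdmulh_s16(INT16_MIN, INT16_MIN))  # 32767 (the one overflowing input pair)
```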

SVE intrinsics

  • Initial import of a portable SVE implementation. f8f8382 @nemequ
  • sve/ptest: simplify svptest_first c7e4699 @nemequ
  • sve/whilelt: small optimizations for all whilelt functions 2b29fef @nemequ
  • sve: add native aliases for overloads 9fd7d68 @nemequ
  • sve/add: add svadd_n_* functions 747e076 @nemequ
  • sve/add: switch to using svsel for implementations of _z/_m variants 971aefb @nemequ
  • sve/whilelt: add svwhilelt_*_{u32,s64,u64} implementations 36927be @nemequ
  • sve/and: switch implementations to use svsel 3382f4e @nemequ
  • sve/sel: initial implementation 113ec2b @nemequ
  • sve/types: add mmask4 functions for 256-bit vectors 33fbaa2 @nemequ
  • sve: some tweaks to get s390x working 7311dd3 @nemequ
  • sve/and: initial implementation 5c56617 @nemequ
  • sve/dup: add *_m variants b90ae4d @nemequ
  • sve/dup: switch implementations to use svsel 1da79a2 @nemequ
  • sve/dup: rename from dup_n bad00e9 @nemequ
  • sve/qadd: initial implementation 8aaa62b @nemequ
  • sve/sel: add cast to make GCC on s/390x happy a1e423e @nemequ
  • sve/add: switch some _x implementations to use _x instead of _z dd42b49 @nemequ
  • sve/add: clean up some minor codegen issues in the tests 21b39aa @nemequ
  • sve/add: initial implementation 70d5b0a @nemequ
  • sve/cmplt: replace vec_and with & for s390 implementations 7c599ea @nemequ
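
Several sve/add, sve/and and sve/dup entries above mention switching the _z and _m variants to svsel. As a rough illustration of that idea (a list-based model, not SIMDe's code, and ignoring lane-width wraparound), the predicated forms can be expressed as an unpredicated operation followed by a select:

```python
def svsel(pg, a, b):
    """Per lane: take a where the predicate is true, b elsewhere."""
    return [x if p else y for p, x, y in zip(pg, a, b)]


def svadd_z(pg, a, b):
    """Zeroing form: inactive lanes become 0."""
    total = [x + y for x, y in zip(a, b)]
    return svsel(pg, total, [0] * len(a))


def svadd_m(pg, a, b):
    """Merging form: inactive lanes keep the value from the first operand."""
    total = [x + y for x, y in zip(a, b)]
    return svsel(pg, total, a)


pg = [True, False, True, False]
a, b = [1, 2, 3, 4], [10, 20, 30, 40]
print(svadd_z(pg, a, b))  # [11, 0, 33, 0]
print(svadd_m(pg, a, b))  # [11, 2, 33, 4]
```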

WASM intrinsics

  • Add WebAssembly SIMD128 implementation. db758eb @nemequ
  • README: include WASM SIMD128 in list of completed extensions 20664a6 @nemequ
  • features: don't define wasm_unimplemented_simd128 57efb02 @nemequ
  • Update WASM SIMD intrinsics to match new names. 20682c1 @nemequ
  • simd128: add clang implementation of wasm_f64x2_promote_low_f32x4 804b833 @nemequ
  • simd128: cast to int ptrs instead of void* in wasm_v128_load*_lane 65db4cf @nemequ
  • simd128: add some implementations of convert functions bdc8698 @nemequ
  • simd128: implement remaining functions 271d1e4 @nemequ
  • simd128: don't call 64-bit only functions on 32-bit targets 631cf53 @nemequ
  • simd128: fix native aliases 7078ab4 @nemequ
  • simd128: clean up some -Wvector-conversion warnings 5c8d7b3 @nemequ
  • simd128: lots of NEON implementations 0e43903 @nemequ
  • simd128: add more implementations of splat and bitselect functions c734535 @nemequ
  • simd128: add additional cast in wasm_i32x4_abs 34b775d @nemequ
  • simd128: add movemask-based implementations of any/all_true functions 22609d4 @nemequ
  • simd128: add missing WASM SIMD128 functions f4ee32a @nemequ
  • simd128: add simde_wasm_i64x2_ne 2380aa4 @coderzh
  • Add NEON, SSE3, and AltiVec implementations of wasm_i8x16_swizzle 516eb02 @nemequ
  • simd128: add vec_abs implementation of wasm_i8x16_abs 1d4075c @nemequ
  • simd128: work around clang bugs 50893 and 50901. f73db2d @nemequ
  • simd128: remove stray && 0 c66df66 @nemequ
  • simd128: add optimized f32x4.floor implementations c2fda16 @nemequ
  • simd128: add some Arm implementations of all_true 06b3462 @nemequ
  • simd128: any_true implementations for Arm d45f735 @nemequ
  • simd128: add improved add_sat implementations b7b69fb @nemequ
  • wasm128, sse2: disable -Wvector-conversion when calling vgetq_lane_s64 679b970 @nemequ
  • simd128: add x86/Arm/POWER implementations 8a748d7 @nemequ
  • simd128: fix portable fallback for wasm_i8x16_swizzle 6c57794 @nemequ
  • simd128: work around bad diagnostic from clang < 7 e60f1e0 @nemequ
  • simd128: move tests from wasm/ to wasm/simd128/ c37dfd3 @nemequ
  • simd128: add several AArch64 and AltiVec trunc_sat implementations fdfa16a @nemequ
  • Fix several places where we assumed NEON used vector extensions. c4aa8b4 @nemequ
  • simd128: add more pmin/pmax implementations 96226ff @nemequ
  • simd128: add SSE2 q15mulr_sat implementation 732f519 @nemequ
  • simd128: add improved min implementations on several architectures 2890ad4 @nemequ
  • simd128: add fast max/pmax implementations 706de03 @nemequ
  • simd128: add NEON, Altivec, & vector extension sub_sat implementations fca719e @nemequ
  • simd128, sse2: more cvtpd_ps/f32x4_demote_f64x2_zero implementations 5638afa @nemequ
  • simd128, sse2: add more madd_epi16 / i32x4_dot_i16x8 implementations d013847 @nemequ
  • simd128: vector extension implementation of floating-point abs 3d4b2ff @nemequ
  • simd128, neon/neg: add VSX implementations of abs and neg functions 783c752 @nemequ
  • simd128: use vec_cmpgt instead of vec_cmplt in pmin 3378ab3 @nemequ
  • simd128: add fast ceil implementations 42f0a0b @nemequ
  • simd128: improve many lt and gt implementations e8da237 @nemequ
  • simd128: add fast sqrt implementations 22c0dee @nemequ
  • simd128: add fast extmul_low/high implementations d9e3615 @nemequ
  • simd128: add NEON and POWER shift implementations 9848a4c @nemequ
  • simd128: add fast promote/demote implementations 8a21137 @nemequ
  • simd128: add dedicated functions for unsigned extract_lane 5b1a330 @nemequ
  • simd128: add fast narrow implementations dbd2e5c @nemequ
  • simd128: add fast implementations of extend_low/extend_high 09d8f79 @nemequ
  • wasm: load lane memcpy instead of cast to address UBSAN issues 7631312 @wrv
  • wasm: f32x4 and f64x2 nearest roundeven dc75f4c @wrv
  • deal with WASM SIMD128 API changes. e1bc968 @nemequ
  • relaxed-simd: initial support for the WASM relaxed SIMD proposal 083bd2f @nemequ
  • relaxed-simd: add trunc functions 3e5515a @nemequ
  • relaxed-simd: add blend functions bf136e7 @nemequ
  • relaxed-simd: add fms functions 48954b6 @nemequ
  • relaxed-simd: add fma functions 9715924 @nemequ
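
For readers unfamiliar with the movemask-based any/all_true entry above, here is a small model of the idea, assuming byte lanes given as integers in 0..255. It is a sketch of the general technique and may differ from what SIMDe actually emits; all helper names are hypothetical.

```python
def movemask_epi8(lanes):
    """Model of _mm_movemask_epi8: gather the top bit of 16 byte lanes."""
    mask = 0
    for i, lane in enumerate(lanes):
        mask |= ((lane >> 7) & 1) << i
    return mask


def cmpeq_zero(lanes):
    """Model of comparing each byte lane with zero: 0xFF where equal."""
    return [0xFF if lane == 0 else 0x00 for lane in lanes]


def i8x16_all_true(lanes):
    """All lanes are non-zero exactly when no lane compares equal to zero."""
    return movemask_epi8(cmpeq_zero(lanes)) == 0


def v128_any_true(lanes):
    """Some bit is set exactly when not every byte compares equal to zero."""
    return movemask_epi8(cmpeq_zero(lanes)) != 0xFFFF


print(i8x16_all_true([1] * 15 + [0]))    # False
print(v128_any_true([0] * 15 + [0x80]))  # True
```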

x86 intrinsics

  • Fix native aliases for amd64-only functions f0e9755 @nemequ
  • Add @aqrit's SSE2 min/max implementations d90e835 @nemequ
  • x86: fix AVX native → SSE4.2 native f6fc25a @mr-c
  • x86: ignore warnings about inefficient functions on lcc 416c243 @makise-homura
  • The fix for GCC bug #95483 wasn't in a release until 11.2 11d95f8 @nemequ
  • fix wrong array size (caught by GCC 12) c6179cb @Lithrein

MMX

SSE*

  • sse2: workaround missing vcvtnq_s32_f32 on GCC e11258e @jpcima
  • sse2: fix incompatible argument in A32 impl. of _mm_cvtps_epi32 b5fbe39 @jpcima
  • ssse3: Add SSE2 integer abs implementation 2de8624 @aqrit
  • sse4.2: re-enable native _mm_cmpgt_epi64 7117c48 @aqrit
  • sse2: remove AArch64 implementation of _mm_movemask_epi8 c595f6b @nemequ
  • sse4.1: fix AArch64 implementation of simde_x_mm_blendv_epi64 978d1f7 @milot-mirdita
  • sse2: ignore broken _mm_loadu_si{16,32} on GCC 4b7394f @nemequ
  • sse2: use simde_math_{add,sub}s_* for _mm_{add,sub}s_* functions 09d725d @nemequ
  • sse4.1: _mm_blendv_epi8: add sse2 and update wasm_simd128 implementations 2dbc124 @aqrit
  • sse4.1: add some casts to make clang -Weverything happy 5f000af @nemequ
  • sse2: vcvtnq_s32_f32 is armv8-specific 98075d0 @nemequ
  • sse2: don't require constants for _mm_srai_epi{16,32} 8bee92a @????
  • sse2: add fast-math WASM implementation of _mm_cvtps_epi32 24c503f @nemequ
  • sse2: prefer shuffle implementation of _mm_shuffle_epi32 to NEON d2ce706 @nemequ
  • sse2: correct typos in simde_x_mm_broadcastlow_pd f8ce9bb @rosbif
  • sse: prefer SIMDE_SHUFFLE_VECTOR implementation of _mm_shuffle_ps 377e350 @nemequ
  • sse: don't use armv7 impl of _MM_TRANSPOSE4_PS on armv8 b5fb757 @nemequ
  • sse, sse2: work around GCC bug #100927 80472b7 @nemequ
  • sse2: fix set but not used variable in _mm_cvtps_epi32 f460666 @nemequ
  • sse, sse2: sync clang-12 changes for vec_cpsgn 1ba1596 @simba611
  • sse, sse2: fix vec_cpsgn order test 1465c48 @nemequ
  • sse, sse2: clean up several shuffle macros cc6dc18 @nemequ
  • sse2: add parenthesis around macro arguments b394520 @nemequ
  • sse2: remove statement expr requirement for NEON srli/srai macros da4d24f @nemequ
  • sse4.1: replace NEON implementations with shuffle-based implementations 29a3cb4 @nemequ
  • sse4.1: remove statement expr dependency in blend functions 01fb894 @nemequ
  • sse4.1: use NEON types instead of vector in insert implementations 489e36c @nemequ
  • sse2, sse4.1: pull in improved packs/packus implementations from WASM 7b1df61 @nemequ
  • sse: replace _mm_prefetch implementation 26d515f @nemequ
  • sse: use portable implementation to work around llvm bug #344589 79738de @nemequ
  • sse4.2: work around more warnings on old clang 3f186a0 @nemequ
  • sse: avoid including windows.h when possible 750f20d @boris-kuz
  • _mm_insert_ps: incorrect handling of the control 94e7569 @????
  • fix A32V7 version of _mm_test{nz,}c_si128 e7c70a2 @mr-c
  • sse, mmx: fix clang-11 on POWER a0e9f9f @nemequ

AVX

  • avx: work around missing _mm256_{load,store}u_m128{,i,d} on LCC a3a39e2 @nemequ
  • avx: try to detect prior inclusion of AVX header and handle it e8b7a2e @nemequ
  • avx, avx512/cmp: properly handle NaN in _mm{,256,512}_cmp_{ps,pd,ss,sd} 491d3fa @nemequ
  • avx: use internal symbols in clang fallbacks for cmp_ps/pd functions 35b86b7 @nemequ
  • avx: work around incorrect maskload/store definitions on clang < 3.8 a9313de @nemequ
  • avx: add native calls for mm256_insertf128_{pd,ps,si256} bab30bb @LaurentThomas
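
The NaN-handling fix in the cmp functions matters because the negated predicates are not the same as the opposite comparison. The scalar model below sketches predicates 0 to 7 of the x86 cmp intrinsics; it ignores the signaling/quiet distinction and is not SIMDe's implementation.

```python
import math

# Predicate values 0-7 as defined for the x86 cmp intrinsics.
CMP_EQ_OQ, CMP_LT_OS, CMP_LE_OS, CMP_UNORD_Q = 0, 1, 2, 3
CMP_NEQ_UQ, CMP_NLT_US, CMP_NLE_US, CMP_ORD_Q = 4, 5, 6, 7


def cmp_lane(a, b, predicate):
    """Scalar model of one lane of a cmp intrinsic for predicates 0-7."""
    unordered = math.isnan(a) or math.isnan(b)
    if predicate == CMP_UNORD_Q:
        return unordered
    if predicate == CMP_ORD_Q:
        return not unordered
    ordered_tests = {CMP_EQ_OQ: a == b, CMP_LT_OS: a < b, CMP_LE_OS: a <= b}
    if predicate in ordered_tests:
        return ordered_tests[predicate]  # false whenever a NaN is involved
    negated = {CMP_NEQ_UQ: a == b, CMP_NLT_US: a < b, CMP_NLE_US: a <= b}
    return not negated[predicate]        # true whenever a NaN is involved


nan = float("nan")
print(cmp_lane(nan, 1.0, CMP_NLT_US))  # True: "not less than" holds for NaN
print(nan >= 1.0)                      # False: a naive >= gets this lane wrong
```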

AVX2

  • avx2: add vector/shuffle implementation of _mm256_madd_epi16 2c2dd73 @nemequ
  • avx2: fix undefs for many native aliases 2ca5480 @anrodrig
  • avx2: added vector size conditional for unpack 287bda9 @simba611
  • avx2: separate natural vector length for float, int, and double types 6d1896d @nemequ

AVX512

  • avx512/abs: add SSE2 implementation of _mm_abs_epi64 5c2f423 @aqrit
  • avx512/madd: fix arguments for native aliases ae545ce @nemequ
  • avx512/madd: explicitly promote 16-bit elements to 32-bit e5dd146 @nemequ
  • avx512: work around several bugs in older versions of clang e64231e @nemequ
  • avx512/or: implement _mm512_mask_or_pd function b7933e6 @????
  • avx512/insert: implement inserti{,_mask,maskz}{32x8,64x2} 8e306d1 @simba611
  • avx512/or, avx512/xor: regenerate tests using 32-bit ints instead of 64 e1de51d @nemequ
  • avx512/insert: implement mm512{_mask,_maskz}_insert{f32x8,64x2} 2c8b052 @????
  • avx512/xor: implement mm512_mask(z)_xor_pd/s functions 854f913 @????
  • avx512/or: implement mm512_mask(z)_or_ps/d functions 6cda738 @????
  • avx512/mullo: implement mm512_mullo_epi64 with mask(z) 8545d26 @????
  • avx512/compress: implement mm256_mask(z)_compress(storeu)_p* a7386b5 @simba611
  • avx512/insert: convert macros to functions, regenerate old-style tests 0ba2085 @nemequ
  • avx512/fmsub: implement fmsub functions for AVX512VL b7df811 @simba611
  • avx512/compress: implement _mm256_mask_compress_pd d1223d4 @simba611
  • avx512/cmpeq: implement _mm512_mask_cmpeq_epi8_mask 88d2faf @nemequ
  • avx512/cmpneq: initial implementation of 128-bit and 256-bit functions 34194f2 @nemequ
  • avx512/abs: work around buggy pd functions in GCC 7 - 8.2 605c92a @anrodrig
  • avx512: implement mm*_mask(z)_compress(storeu)_* dab908e @simba611
  • avx512: add tests for previous commit (104a99bc) b3535c3 @nemequ
  • avx512: add several new functions ccc0757 @anrodrig
  • avx512/cvtt: add simde_mm{_mask,_maskz}_cvttpd_epi64 d2f518a @nemequ
  • avx512/cvt: add simde_mm{_mask,_maskz}_cvtepi64_pd 292e1e2 @nemequ
  • avx512/unpacklo: added vector size conditional 3924339 @simba611
  • avx512/unpacklo: implement mm512_unpacklo_* functions 8582277 @simba611
  • avx512/unpacklo: implement mask variants of unpacklo 0c4775e @simba611
  • avx512/unpack{hi,lo}: implement mm256_mask(z)_unpack* functions ca8c102 @simba611
  • avx512/unpack{hi,lo}: implement mask variants of unpacklo b2c176f @simba611
  • avx512/range: remove CONSTIFY macro usage 8bc81ca @nemequ
  • avx512/range: implement mm512_range_ps/d functions d59e3f5 @simba611
  • avx512: implement mm_mask(z)_unpack* funcs 7aa3155 @simba611
  • avx512/range: fix variable names in macro implementations 8ccb363 @nemequ
  • avx512/range: implement mm(256, 512)_mask(z)_range_p* 8bf0305 @simba611
  • avx512/roundscale: initial implementation e47e703 @simba611
  • avx512/round, avx512/roundscale: add shorter vector fallbacks b542b01 @simba611
  • avx512/roundscale: implement simde_mm{256,512}_roundscale_ps 6ddf1a2 @simba611
  • avx512/range: don't use masked comparisons for 128/256-bit versions b8e63b4 @nemequ
  • avx512/range: fix fallback macros 6b8d8b8 @nemequ
  • avx512/roundscale_round: implement remaining functions db7a52a @simba611
  • avx512/range_round,round: move range_round functions out of round d382488 @simba611
  • avx512/cmp{g,l}e: AVX-512 implementations of non-mask functions ca1812d @nemequ
  • avx512/cmple: finish implementations of all cmple functions 06aa828 @nemequ
  • avx512/cmpge: fix bad _mm512_cmpge_epi64_mask implementation 0b5de15 @nemequ
  • avx512/cmpge: finish implementing all functions 9a4d0de @nemequ
  • avx512/range: implement mm{,512}{,_mask,_maskz}_range_round* 37ab069 @simba611
  • avx512/rorv: initial implementation of _mm_rorv_epi32 1fa7764 @simba611
  • avx512/scalef: implement remaining functions 482bf32 @simba611
  • avx512/conflict: implements mm_conflict_epi32 c8f2755 @simba611
  • avx512/scalef: initial implementation 581bf31 @simba611
  • avx512/rol: implement remaining functions 9a52011 @simba611
  • avx512/rolv: initial implementation a2e7632 @simba611
  • avx512: initial implementation f35090a @simba611
  • avx512/ternarylogic: initial implementation 30eb81e @nemequ
  • avx512/conflict: implement missing functions b6887ce @simba611
  • avx512/multishift: initial implementation 6b125ec @simba611
  • avx512/dbsad: initial implementation d659f42 @simba611
  • avx512/dpbusd: initial implementation 913a0a4 @simba611
  • avx512/ternarylogic: implement remaining functions 7faedd6 @simba611
  • avx512/shldv: initial implementation cddc500 @simba611
  • avx512/popcnt: initial implementation d5ec32a @simba611
  • avx512/dbsad: add vector extension impl. and improve scalar version 0c76c5e @simba611
  • avx512/cvtt: _mm_cvttpd_epi64 is only available on x86_64 e842f29 @nemequ
  • avx512/shldv: limit shuffle-based version to little endian 9b08cfc @nemequ
  • avx512/popcnt: implement remaining functions b17b646 @simba611
  • avx512/dpbf16: initial implementation 18b4e74 @simba611
  • avx512/4dpwssd: implement complete function family 5bbf50f @simba611
  • avx512/dpwssd: initial implementation 973df0e @simba611
  • avx512/bitshuffle: initial implementation c92a13b @simba611
  • avx512/dpbusd: implement remaining functions ff0d35a @simba611
  • avx512/set, avx512/popcnt: use _mm512_set_epi8 only when available aa5746f @nemequ
  • avx512/roundscale: don't assume IEEE 754 storage 98e6a60 @simba611
  • avx512/4dpwssds: initial implementation 22b8b97 @simba611
  • avx512/dpbf16: implement remaining functions 0ec8d72 @simba611
  • avx512/dpwssds: initial implementation fe93582 @simba611
  • avx512/cvt: add _mm512_cvtepu32_ps 93d3619 @nemequ
  • avx512/dpbusds: complete function family 34f2488 @simba611
  • avx512/fixupimm: initial implementation 441339e @simba611
  • avx512/permutex2var: work around incorrect definition on old clang 647279d @nemequ
  • avx512/scalef: _mm_mask_scalef_round_ss is still missing in GCC 22be4e8 @nemequ
  • avx512/scalef: work around for GCC bug #101614 f60c159 @nemequ
  • avx512/permutex2var: hard-code types in casts instead of using typeof 8893116 @nemequ
  • avx512/load_pd: initial implementation 8445684 @operasfantom
  • avx512/load_ps: initial implementation d588049 @operasfantom
  • avx512/setzero: fix native aliases c900d5e @????
  • avx512/rorv: implement _mm{256,512}{,_mask,_maskz}_rorv_epi{32,64} b1745c5 @simba611
  • simde/scalef: add scalef_ss/sd d9898e5 @simba611
  • Properly map __mm functions to __simde_mm 96c963f @psaab
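
The avx512/ternarylogic entries above cover an intrinsic family whose behaviour is driven entirely by an 8-bit truth table. As a minimal bit-by-bit model (a sketch on a single lane, assuming unsigned integer inputs; not SIMDe's code), each result bit is looked up in imm8 using the three input bits as an index:

```python
def ternarylogic(a, b, c, imm8, width=32):
    """Bitwise model of a ternarylogic operation on one lane."""
    result = 0
    for bit in range(width):
        # The three input bits form a 3-bit index into the truth table.
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> index) & 1) << bit
    return result


a, b, c = 0xF0F0F0F0, 0xCCCCCCCC, 0xAAAAAAAA
print(hex(ternarylogic(a, b, c, 0x96)))  # 0x96969696, same as a ^ b ^ c
print(ternarylogic(a, b, c, 0xCA) == ((a & b) | (~a & c)) & 0xFFFFFFFF)  # True
```

With these particular inputs every result byte equals imm8, which makes them handy for checking a truth-table constant by eye.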

GFNI

  • gfni: improve ARM NEON implementation a99a3ec @rosbif
  • gfni: add ARM, PPC and WASM implementations of gf2p8mul intrinsics 61126b3 @rosbif
  • gfni: add cast to work around -Wimplicit-int-conversion warning d066a1c @nemequ
  • gfni: remove unintentional dependency on vector extensions bdfa828 @nemequ
  • gfni: work around clang bug #50932 7d4beba @nemequ
  • gfni: work around error with vec_bperm on clang-10 on POWER 8620bd0 @nemequ
  • gfni: replace vec_and and vec_xor with & and ^ on z/arch f5577dc @nemequ
  • gfni: add many x86, ARM, z/Arch, PPC and WASM implementations 97eb961 @rosbif
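
For context on the gf2p8mul entries above: the operation multiplies bytes in GF(2^8) modulo the polynomial x^8 + x^4 + x^3 + x + 1. The shift-and-xor reference below is a plain scalar sketch of that arithmetic, not any of the vectorised implementations listed here.

```python
def gf2p8mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a   # add (xor) the current multiple of a
        b >>= 1
        a <<= 1            # multiply a by x
        if a & 0x100:
            a ^= 0x11B     # reduce when the degree reaches 8
    return product


print(hex(gf2p8mul(0x57, 0x83)))  # 0xc1
```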

XOP

  • xop: fix NEON implementation of maccs functions to use NEON types 6ecc0e3 @nemequ

F16C

  • f16c: initial implementation 62c1087 @nemequ
  • f16c: use __ARM_FEATURE_FP16_VECTOR_ARITHMETIC to detect Arm support eaeac09 @nemequ
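
F16C converts between half-precision and single-precision floats. The sketch below models one lane of each direction using Python's struct module, assuming round-to-nearest-even. It is only an illustration of the data format: unlike the hardware, struct raises OverflowError for values too large for binary16 instead of returning infinity, and the function names simply mirror the intrinsics.

```python
import struct


def cvtph_ps(half_bits):
    """Model of one lane of _mm_cvtph_ps: binary16 bit pattern -> float."""
    return struct.unpack("<e", struct.pack("<H", half_bits))[0]


def cvtps_ph(value):
    """Model of one lane of _mm_cvtps_ph with round-to-nearest-even."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


print(cvtph_ps(0x3C00))       # 1.0
print(hex(cvtps_ph(1.5)))     # 0x3e00
```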

FMA

  • fma: work around broken implementations of some functions on MCST LCC 269db2a @makise-homura
  • fma: add mls-based NEON implementations of fnmadd functions 55416aa @nemequ
  • fma: drop weird high-priority implementation in _mm_fmadd_ps 20922ff @nemequ
  • fma: use fma/fms instead of mla/mls on NEON 2fe84e5 @nemequ
  • fma: use NEON types in simde_mm_fnmadd_ps NEON implementation 44d38bd @nemequ
  • fma: fix return value of simde_mm_fnmadd_ps on NEON 87198d9 @nemequ
  • Fixed FMA detection macro on msvc 286ba3d @dhbloo

SVML

  • svml: trivial indentation fix 2176652 @nemequ
  • svml: remove some dead stores from cdfnorminv 11d97ba @nemequ

MIPS MSA intrinsics

  • Begin working on implementing MIPS MSA. e9c002a @nemequ
  • msa/add_a: initial implementation 6b37bb3 @nemequ
  • msa/addvi: initial implementation 8711327 @nemequ
  • msa/subv: initial implementation 75b3b73 @nemequ
  • msa/andi: initial implementation 31b7ce7 @nemequ
  • msa/and: initial implementation 6635520 @nemequ
  • msa/adds: initial implementation c37559c @nemequ
  • msa/adds_a: initial implementation bb84c44 @nemequ
  • msa/madd: initial implementation 1b89ab3 @nemequ
  • Many work-arounds for GCC with MSA, and support in the docker image. e5dbb93 @nemequ

Arch support

  • various: correct PPC and z/Arch versions plus typo ac8d722 @rosbif
  • arch: __ARM_ARCH now (v8.1+) encodes the minor version b0b22d1 @nemequ
  • arch: set SIMDE_ARCH_ARM for AArch64 on MSVC 1d8befc @nemequ

z/Arch

  • Correctly detect and handle z/Arch and its vector extensions 4a3f466 @nemequ
  • Fix z/Arch without zvector. b8af226 @nemequ
  • sse, sse2: add several z/Arch implementations 4f628ac @nemequ
  • sse2, sse4.1: additional z/Arch implementations for ksw2 ee24439 @milot-mirdita
  • Many additional z/Architecture implementations of x86 functions 5a2b035 @nemequ
  • sse4.1, neon/bsl: z/Arch implementations of blendv/bsl functions 80a8484 @nemequ
  • z/Architecture implementations for remaining min/max functions 694d547 @nemequ
  • neon/cvt: z/Arch implementations 107fab8 @nemequ
  • sse, sse4.1: z/Arch implementations of some rounding functions 9fb1509 @nemequ
  • sse, sse2, neon/dup_n: lots of z/Arch splat-based implementations 874d51f @nemequ
  • gfni: add z/Arch version c12f111 @rosbif
  • x86,arm/neon: Correct z/Arch versions 50fba9b @rosbif
  • features: add z/Arch to SIMDE_NATURAL_VECTOR_SIZE d41999b @nemequ

Altivec

  • sse, sse2: generate to/from altivec functions for SSE/SSE2 types. dd3ff53 @nemequ
  • docker: power9-clang ignore deprecated-altivec-src-compat warnings b70f1a2 @mr-c
  • sse4.1: PPC AltiVec has no vec_splat_s64 debbf73 @rosbif
  • arch: fix SIMDE_ARCH_POWER_ALTIVEC_CHECK to include AltiVec check 8534e64 @nemequ
  • simd128: add AltiVec implementations of any/all_true a3b2630 @nemequ

e2k (Elbrus)

  • e2k: Introduce E2K (Elbrus) architecture 093b2c5 @makise-homura
  • e2k, ppc: Make shifts unsigned 24ddeba @makise-homura

Testing with Docker/Podman & CI

  • gh-actions: add some bionic-era GCC builds ccdd24b @nemequ
  • gh-actions: add several clang builds e4b4646 @nemequ
  • drone: read testlog.txt if tests fail eb71d89 @nemequ
  • docker: add -march=z14 -mzvector to s390x-gcc-10 build. 8f60406 @nemequ
  • docker: use z13 instead of z14 for s390x architecture a524be2 @nemequ
  • docker: install meson from pip df63f88 @nemequ
  • docker: use meson 0.55.0 instead of 0.54.0. 5112bf2 @nemequ
  • docker: add platform dependent fixes for docker 3dd58b9 @Glitch18
  • docker: fix script exiting bug 6770ec0 @Glitch18
  • Remove Travis CI. 17a27e7 @nemequ
  • gh-actions: temporarily disable emscripten build 71ea291 @nemequ
  • codeql: analyze the merge commit d3a40e1 @mr-c
  • gh-actions: automatically detect whether to use SDE bb69b54 @nemequ
  • download-sde: be more tolerant of changes on Intel's web site 87bb927 @nemequ
  • meson: require meson version 0.54 349da2b @makise-homura
  • testing: Require exact matches for abs functions 9085d94 @jpcima
  • test: replace 1e-##precision with to_slop functions 9adcc21 @nemequ
  • test: allow passing INT_MAX for precision for exact comparisons e903b7f @nemequ
  • docker: only rebuild image if older than a week d9b1322 @nemequ
  • docker: fix build when the image doesn't exist yet ab3b509 @nemequ
  • drone: configure apt to retry failed downloads 1c442b4 @nemequ
  • gh-actions: disable clang-3.9 build 7fcb64d @nemequ
  • docker: skip date check when building image for the first time a1c4728 @Glitch18
  • docker: allow overriding the BUILD_IMAGE setting ca6f690 @nemequ
  • gh-actions: use ctest to run CMake tests so we can output on failure 03f6ebe @nemequ
  • cirrus: add -Db_lundef=false to sanitizer build 5a0fc02 @nemequ
  • gh-actions: try commit message without quotes on implementation-status 3f81cac @nemequ
  • gh-actions: add action to update the implementation-status repo 333f077 @nemequ
  • codecov: ignore test/ directory 65e7903 @nemequ
  • docker: Add a prompt before rebuilding image c2cff9f @Glitch18
  • docker: Fix BUILD_IMAGE always being set to 'y' 368a777 @Glitch18
  • travis: use -march=native and GCC on s390x 5b9b2af @nemequ
  • gh-actions: use -O2 instead of -O3 on emscripten 636f145 @nemequ
  • cmake: generate most declare-suites.h files 5d62f0d @nemequ
  • Add Windows ARM64 CI f12fd00 @tommyvct
  • gh-actions: only run MSVC Arm checks on msvc-arm branch 3d8a516 @nemequ
  • docker: use -O2 instead of -O3 on emscripten 3173499 @nemequ
  • gh-actions: switch emscripten build to Meson bde2cb1 @nemequ
  • ga: ubuntu-16.04 has been retired, migrate to ubuntu-18.04 6d0c65c @mr-c
  • ga: pin to macos-10.15 instead of -latest d64de8c @mr-c
  • docker: fix quoting error 830981b @mr-c
  • Azure: publish test results 51c24d8 @mr-c
  • tests: update download-iig.sh to account for Intel changes 2fdc9a5 @nemequ
  • test: fix download script for SDE b3b4975 @nemequ
  • Travis CI power9: try using all the cores to speed up b91516f @mr-c
  • CI: trim flags for icx/icpc 201dcdb @mr-c
  • CI: debian testing gcc: -Wno-error=stringop-overread af24d0c @mr-c
  • emscripten: turn off clang's -Wunsafe-buffer-usage for the tests 3caf71d @mr-c
  • update SDE download link 24338a2 @mr-c
  • CI: test using Intel® oneAPI DPC++/C++ Compiler instead of ICC df144ff @mr-c
  • update deps/images for CI 1cf39df @mr-c
  • GitHub Actions: Ubuntu 22.04 + system meson dd0b662 @mr-c
  • docker: aarch64-clang ; match drone.io flags bbe4416 @mr-c
  • docker: skip mips64el from cross-building d3f5fae @mr-c
  • Docker: tighten libstdc++NN-dev package selection c44539c @mr-c
  • docker: pass -future flag to sde for i686-all-gcc-9 d8658ea @mr-c
  • docker: icc, disable deprecation notice 505f24a @mr-c
  • docker: add Intel ICX testing 4a4eeb6 @mr-c
  • docker: add more cross building profiles for modern compilers 89e2c5b @mr-c
  • docker: qemu package doesn't exist & is unneeded 9ec8375 @mr-c
  • CI: fix loongson build on CircleCI 3db6d7a @mr-c
  • meson docs: don't use deprecated syntax 1a1a6eb @mr-c
  • CI: Update codecov to v3 for Node 16 support bd7f8df @wrv
  • CI: Update macos build to 11 c30a29b @wrv
  • CI: Comment out Ubuntu 18.04 build as will be unsupported in April 2023 6cefe47 @wrv
  • CI: Update to actions/checkout@v3 to avoid Node 12 warning 511b5b7 @wrv
  • SDE: add -future flag to support all x86 features caa3c6d @wrv
  • CI: add -fp-model precise for icx/icpx 7ec32ff @wrv
  • CI: update OSSAR action versions a1a63ac @wrv
  • CI: cancel GitHub Actions if there is a newer commit 8c56459 @mr-c
  • CI: GitHub Actions: test with gcc-12 f6db95d @mr-c
  • docker: enable use of ccache 4d42b90 @mr-c
  • docker: icx ignore no-tautological-constant-compare warning 97315b8 @mr-c
  • docker: add test with Debian default flags, also for armel 0a44b50 @mr-c
  • docker: use SDE tigerlake emulation to allow advanced AVX512 testing 54b5d4e @mr-c
  • netlify: build amalgamated SVE header 41898ab @nemequ
  • travis: bring back some Travis builds 0ec9926 @nemequ
  • gh-actions: remove GCC 4.7 build 3997b8f @nemequ
  • docker: apt-get update before each other apt command 5560ca0 @nemequ
  • github-actions: add action to push to the simde-no-tests repository 1b4647f @milot-mirdita
  • gh-actions: move push-to-no-tests.yml into the right directory. 7fbb9c9 @nemequ
  • check-flags.sh: add lock around installing SDE 373e1e3 @nemequ
  • docker: add a bunch of cross files b718597 @nemequ
  • gh-actions: give up on getting commit ID in message for status repo 05ecb5d @nemequ
  • netlify: deploy wasm/simd128.h aa29a8b @nemequ
  • docker/Dockerfile: Use netselect-apt to speed up image build e98cf70 @Glitch18
  • gh-actions: add missing jobs property ddd453a @nemequ
  • download-iig: tweak script to fix download location 082a875 @nemequ
  • gh-actions, docker: add -fno-lax-vector-conversions to clang flags ccdfca9 @nemequ
  • sde: don't print URL in download-sde script. 55fc0e2 @nemequ
  • gh-actions: add -ffast-math builds for GCC and clang de616e7 @nemequ
  • Default to -DSIMDE_CONSTRAINED_COMPILATION when building tests 3d14f8e @nemequ
  • docker emscripten: remove experimental wasm flag for v8 496d88d @wrv

Misc

  • Improve abs function performance on SSE/SSE2 093f6ee @jpcima
  • Upgrade Hedley to v15 0d070e1 @nemequ
  • detect-clang: fix version numbers for clang < 4.0 8a2c645 @nemequ
  • align: add MCST LCC to compilers known to support __alignof__ 38e3840 @nemequ
  • common: add an MCST LCC check for vector features. e38fe50 @nemequ
  • complex: fix checks for GCC C complex math support ad8c7e0 @nemequ
  • Fix SIMDe link in no-tests README 21f7a2a @maxbachmann
  • common: enable OpenMP by default on LCC ff34d1b @nemequ
  • README: more thoroughly document OpenMP support 46c65e1 @nemequ
  • Add some files to .gitignore 8381a57 @nemequ
  • check-flags.sh: move download location from ~ to /opt/intel a361527 @nemequ
  • simde-features: fix C&P error 00fd88d @rosbif
  • {neon,simd128,avx512/abs}: provide vector versions of i64 abs d3976e0 @nemequ
  • common: improve check for C11 generic selections 11d2a6d @nemequ
  • common: don't use aligned OpenMP clause on MCST LCC a9a5a0d @nemequ
  • math: use simde_math_-prefixed abs/labs/llabs 813f4f0 @nemequ
  • diagnostic: silence -Wreserved-identifier warning from LLVM 0b6f5b2 @nemequ
  • Fix compilation with clang on POWER 5c43ac0 @nemequ
  • Work around issues preventing compilation on NVCC 3815c04 @nemequ
  • Don't set SIMDE_NO_CHECK_IMMEDIATE_CONSTANT in tests. 0c9fe4c @nemequ
  • common: move conversion functions for u32 <-> f32 into common 37e187c @nemequ
  • Add SIMDE_FAST_EXCEPTIONS option d01d58e @nemequ
  • Use SIMDE_HUGE_FUNCTION_ATTRIBUTES on several functions. 552c202 @nemequ
  • Add -s ENVIRONMENT=shell to emscripten flags 69d7655 @nemequ
  • Fix an assortment of small bugs 8b5d68c @simba611
  • Remove all && 0s in preprocessor macros. b6f21a9 @nemequ
  • Add constrained compilation mode a992f5b @simba611
  • Fix gcc-10 compilation on s/390x a10f12e @nemequ
  • simde-diagnostic: Include simde-arch 61cd8aa @Glitch18
  • Add many fast floating point to integer conversion functions 1fbe712 @nemequ
  • common: Use AArch64 intrinsics if _M_ARM64EC is defined 2a9e7b7 @tommyvct
  • Add -Wdeclaration-after-statement to the list of ignored warnings. bba815d @nemequ
  • Work around compound literal warning with clang 90523a2 @dgazzoni
  • Various fixes for -fno-lax-vector-conversions 39d902e @nemequ
  • Fix warnings with -fno-lax-vector-conversions e5ff228 @ngzhian
  • Improve widening pairwise addition implementations 3b950bb @nemequ
  • Wrap static assertions in code to disable -Wreserved-identifier d1fc7b5 @nemequ
  • Add missing static const in simde-math.h. NFC 6bd6562 @sbc100

Template for next time

# Summary

# Details

## Implementation of NEON intrinsics:

## SVML

## x86 intrinsics

### MMX

### SSE*

### AVX

### AVX2

### AVX512

### GFNI 

### XOP

### F16C

## Testing with Docker/Podman & CI

## Misc