@jdemel
Contributor

jdemel commented Jul 9, 2024

This commit adds code to support SVE+SVE2. However, since I don't have any real hardware available, it is mostly guesswork.

@marcusmueller
Member

Is it really sensible to merge this then?
No support would seem to be better than broken support; or am I wrong about that?

jdemel marked this pull request as draft July 11, 2024 07:26
@jdemel
Contributor Author

jdemel commented Jul 11, 2024

I wanted to look into support for these extensions because I expect them to be available in all ARMv9 CPUs.
I converted the PR to draft. Maybe someone wants to pick up the draft and is able to test it. This might be a good start. It's also the reason why I shared the code in this state.

This might be simple, this might be tough. Let's go step-by-step.

Signed-off-by: Johannes Demel <[email protected]>
@wjsota

wjsota commented Sep 20, 2025

I have an SVE-capable ARM server available. I just ran a simple test on @jdemel's branch as of commit 63ca709.

This is the build output:

-- Available machines: generic;neon;neonv8;sve;sve2
-- BUILD TYPE = RELEASE
-- Base cflags = -O3 -DNDEBUG  -fcx-limited-range -Wall -Werror=incompatible-pointer-types -Werror=pointer-sign
-- BUILD INFO ::: generic ::: GNU ::: -O3 -DNDEBUG  -fcx-limited-range -Wall -Werror=incompatible-pointer-types -Werror=pointer-sign
-- BUILD INFO ::: neon ::: GNU ::: -O3 -DNDEBUG  -fcx-limited-range -Wall -Werror=incompatible-pointer-types -Werror=pointer-sign -funsafe-math-optimizations
-- BUILD INFO ::: neonv8 ::: GNU ::: -O3 -DNDEBUG  -fcx-limited-range -Wall -Werror=incompatible-pointer-types -Werror=pointer-sign -funsafe-math-optimizations -funsafe-math-optimizations
-- BUILD INFO ::: sve ::: GNU ::: -O3 -DNDEBUG  -fcx-limited-range -Wall -Werror=incompatible-pointer-types -Werror=pointer-sign -funsafe-math-optimizations -funsafe-math-optimizations -march=armv8-a+sve
-- BUILD INFO ::: sve2 ::: GNU ::: -O3 -DNDEBUG  -fcx-limited-range -Wall -Werror=incompatible-pointer-types -Werror=pointer-sign -funsafe-math-optimizations -funsafe-math-optimizations -march=armv8-a+sve -march=armv8-a+sve2

The compiler did some autovectorization: I found SVE instructions in /build/lib/libvolk.so.3.1.2, including inside the generic kernels. Some snippets:

   e2280:  d37ff862  lsl      x2, x3, #1
   e2284:  a422c1c0  ld2b     {z0.b-z1.b}, p0/z, [x14, x2]
   e2288:  e40341e0  st1b     {z0.b}, p0, [x15, x3]
   e228c:  0430e3e3  incb     x3
   e2290:  25260c60  whilelo  p0.b, w3, w6
   e2294:  54ffff61  b.ne     e2280 <volk_32f_8u_polarbutterflypuppet_32f_generic+0x1a90>

   b9b40:  6594a800  scvtf    z0.s, p2/m, z0.s
   b9b44:  65b50482  fmla     z2.s, p1/m, z4.s, z21.s
   b9b48:  65b20001  fmla     z1.s, p0/m, z0.s, z18.s
   b9b4c:  65b30002  fmla     z2.s, p0/m, z0.s, z19.s
   b9b50:  8b070042  add      x2, x2, x7
   b9b54:  25631ca0  whilelo  p0.h, x5, x3
   b9b58:  54fffe01  b.ne     b9b18 <volk_16i_32fc_dot_prod_32fc_generic+0x318>
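
For readers unfamiliar with SVE, the first loop above can be read as follows: ld2b loads interleaved byte pairs into two registers, st1b stores only the first register, incb advances the index by one vector length, and whilelo builds the predicate for the remaining elements. A plain C loop of roughly the following shape is the kind of code a compiler may turn into such a sequence when SVE is enabled. This is only a sketch: the function name is hypothetical, and the actual VOLK source loop behind the disassembly is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT a VOLK function: copy every second byte of `in`
 * (in[0], in[2], in[4], ...) into `out`. `in` must hold at least 2 * n bytes.
 * There are no intrinsics here; any SVE code would come from the compiler's
 * autovectorizer, depending on the -march flags used. */
static void take_even_bytes(uint8_t *out, const uint8_t *in, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = in[2 * i];
    }
}
```

Because the vectorization is done by the compiler, passing tests for such a loop says something about the toolchain and the generic kernels, not about hand-written SVE kernels.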

make test reports that 100% of tests passed, 0 tests failed out of 148.

@wjsota

wjsota commented Sep 21, 2025

I have access to an ARM server with SVE support (AWS Graviton3) and have been looking for a suitable project for this ARM Developer lab project.

I would like to take a stab at adding support for SVE. I'm looking to use this to learn about both ARM SVE and performance engineering.

Since SVE is SIMD just like ARM NEON, I can use commit 789fb4d and this paper by Nathan West as references for how to add SVE support. There's also a learning path by ARM on porting ARM NEON code to SVE.

May I have your support to try? As I'm somewhat inexperienced, I would appreciate feedback and guidance from you along the way, but I believe I can work mostly independently.

@jdemel
Contributor Author

jdemel commented Dec 24, 2025

@wjsota your comments slipped through. Thank you very much.
If you're still interested, I'd suggest you use the code here, add an implementation for something simple, like a multiplication, and open a new PR. That'd be something we can discuss.
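
As a starting point for that discussion, a simple element-wise float multiplication with SVE intrinsics might look like the sketch below. The function name multiply_32f_sketch is hypothetical and does not come from this PR; the intrinsics are the standard ACLE ones from arm_sve.h. The SVE path is only compiled when the compiler defines __ARM_FEATURE_SVE (for example with -march=armv8-a+sve); otherwise a scalar loop is used, so the snippet builds on any machine. How such a kernel would be wired into VOLK's dispatcher is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical kernel, NOT part of this PR: out[i] = a[i] * b[i]. */
static void multiply_32f_sketch(float *out,
                                const float *a,
                                const float *b,
                                unsigned int num_points)
{
    assert(out != NULL && a != NULL && b != NULL);
#if defined(__ARM_FEATURE_SVE)
    const uint64_t n = num_points;
    /* svcntw() is the number of 32-bit lanes per vector, known only at run time. */
    for (uint64_t i = 0; i < n; i += svcntw()) {
        /* Predicate is true for lanes with i + lane < n, so no scalar tail loop is needed. */
        svbool_t pg = svwhilelt_b32_u64(i, n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, out + i, svmul_f32_x(pg, va, vb));
    }
#else
    /* Scalar fallback when SVE is not enabled at compile time. */
    for (unsigned int i = 0; i < num_points; i++) {
        out[i] = a[i] * b[i];
    }
#endif
}
```

The predicated loop is the same whilelo pattern visible in the autovectorized disassembly earlier in this thread, and it is vector-length agnostic: the same binary should work on hardware with different SVE vector lengths.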
