[AArch64][GlobalISel] Overall GISel operation status #115133

davemgreen · 2024-11-06T08:18:50Z

This is a copy of an internal page that @chuongg3 and I kept while going through each of the operations for AArch64 GISel, making sure they don't fall back. Not all of it is complete yet (and the internal version had a few more details), but it is better to have this upstream. Some of it might now be out of date.

A few high level comments

This does not include SVE, we should probably do the same elsewhere.
BF16 still needs to be added, but requires a new way to specify the types / operations.
BigEndian isn't handled yet.
Currently some operations widen, some promote. We should stick to one (probably widen).
Blank spaces usually mean not checked / not supported. We will get to the point where random-testing will start to be more useful.
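
As a rough illustration of how individual table cells could be checked, here is a minimal Python sketch that emits one small LLVM IR function per operation/type pair. The helper names `ir_type` and `binop_test` are hypothetical and not part of LLVM; it only covers simple two-operand integer operations.

```python
from typing import Optional


def ir_type(elt_bits: int, num_elts: Optional[int] = None) -> str:
    """Return the LLVM IR spelling of an integer type.

    num_elts=None means a scalar; otherwise a fixed-width vector.
    """
    scalar = f"i{elt_bits}"
    return scalar if num_elts is None else f"<{num_elts} x {scalar}>"


def binop_test(op: str, elt_bits: int, num_elts: Optional[int] = None) -> str:
    """Build one IR function applying a two-operand integer op (add, sub, mul, ...)."""
    ty = ir_type(elt_bits, num_elts)
    # Function name suffix in the style used by the tables, e.g. i32 or v4i32.
    suffix = f"i{elt_bits}" if num_elts is None else f"v{num_elts}i{elt_bits}"
    return (
        f"define {ty} @{op}_{suffix}({ty} %a, {ty} %b) {{\n"
        f"  %r = {op} {ty} %a, %b\n"
        f"  ret {ty} %r\n"
        f"}}\n"
    )


if __name__ == "__main__":
    # One scalar, one legal vector, one odd-width vector, one i128 scalar.
    for bits, elts in [(32, None), (32, 4), (8, 3), (128, None)]:
        print(binop_test("add", bits, elts))
```

The output could then be run through something like `llc -mtriple=aarch64 -global-isel -global-isel-abort=2`, which to my understanding reports a fallback as a diagnostic rather than aborting.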

Legend:

Scalar normal = i8/i16/i32/i64
Vector legal = v8i8/v4i16/v2i32 + v16i8/v8i16/v4i32/v2i64
Vector larger/smaller = i8/i16/i32/i64 types with non-legal sizes
i128 = scalar/vector
i1 = scalar/vector
Scalar ext = non-power2 sizes, including larger sizes
Vector odd widths = i8/i16/i32/i64 with non-power-2 widths.
Vector odd eltsize = non-power2 elt sizes (or i128, etc).
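
To make the legend concrete, here is a small Python sketch that maps a type onto one of the columns above. The function `classify` is hypothetical, and a few boundary cases are my own assumptions rather than something the legend states: small power-of-2 scalars such as i4 are put under "Scalar ext", and vectors of i128 are put under "i128" rather than "Vector odd eltsize".

```python
from typing import Optional

# (num_elts, elt_bits) pairs listed as "Vector legal" in the legend.
LEGAL_VECTORS = {(8, 8), (4, 16), (2, 32), (16, 8), (8, 16), (4, 32), (2, 64)}
NORMAL_ELT_BITS = {8, 16, 32, 64}


def is_pow2(n: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def classify(elt_bits: int, num_elts: Optional[int] = None) -> str:
    """Return the legend column for a scalar (num_elts=None) or vector type."""
    if elt_bits == 1:
        return "i1"      # column covers scalar and vector
    if elt_bits == 128:
        return "i128"    # assumption: vectors of i128 are counted here too
    if num_elts is None:
        # Assumption: anything that is not i8/i16/i32/i64 counts as "Scalar ext".
        return "Scalar normal" if elt_bits in NORMAL_ELT_BITS else "Scalar ext"
    if elt_bits not in NORMAL_ELT_BITS:
        return "Vector odd eltsize"
    if (num_elts, elt_bits) in LEGAL_VECTORS:
        return "Vector legal"
    if not is_pow2(num_elts):
        return "Vector odd widths"
    return "Vector larger/smaller"


if __name__ == "__main__":
    for bits, elts in [(32, None), (32, 4), (8, 2), (32, 3), (24, 4), (65, None)]:
        print(bits, elts, classify(bits, elts))
```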

Operation	Scalar normal	Vector legal	i128	i1	Vector larger / smaller	Scalar ext	Vector odd widths	Vector odd eltsizes	Additional Notes
load	y	y
store	y	y
bitcast? ptrtoint? inttoptr?	y	y
memcpy? memmove? memset? bzero?
Int Operation	Scalar normal	Vector legal	i128 s/v	i1 s/v	Vector larger / smaller	Scalar ext	Vector odd widths	Vector odd eltsizes	Additional Notes
add	y	y	y/y		y	y	x	x	https://godbolt.org/z/6c1rfWTK8
sub	y	y	y/y		y	y	x	x
mul	y	y	y/y inefficient		y				Scalar i128 could be better. https://godbolt.org/z/8Wd8zhezc
sdiv, udiv	y	y	y/y		y				Scalar i1 could be simpler. https://godbolt.org/z/45qMq6cvh.
srem, urem	y	y	y/y		y				Scalar i1:
zext, sext, anyext	y	y							ZEXT: Global ISel could be improved to match SDAG by using BIC for
trunc	y	y	y				x Non-pow2 larger than 8
and	y	y	y/y		y				https://godbolt.org/z/6Y98TnYv8
or	y	y	y/y		y
xor	y	y	y/y		y
not?	y	y	y		y				https://godbolt.org/z/rh4ob1be7
shl	y	y	y		y (v2i8)			x	Scalar i8/i16 unnecessarily clears the shift amount. i1 could simplify.
ashr	y	y	y		y (v2i8)			x	Scalar i8/i16 unnecessarily clears the shift amount. i1 could simplify.
lshr	y	y	y		y (v2i8)			x	Scalar i8/i16 unnecessarily clears the shift amount. i1 could simplify.
icmp	y	y	y (i128 could be better)	x	y (v2i8)				i128 could do a lot better.
select	y	y	y		y (v2i8)				Scalar: Unnecessary AND to clear upper lanes of the condition register
abs	y	y	y		x	y			https://godbolt.org/z/Tobs7YeoT
smin/smax/umin/umax	y	y	y		y	x > i128			i1/i128 could do better. https://godbolt.org/z/j7nx789oz.
uaddsat/usubsat/saddsat/ssubsat	y	y	y			y			https://godbolt.org/z/4MT14bfsv
bitreverse	y	x	y			y			https://godbolt.org/z/3sd988Mhd
bswap	y	x	y		x	y
ctlz	y	y	y		y	x > i128
cttz	y	y	y		x	x > i128
ctpop	y	y	y		x	x
fshr/fshl	y	y	y		x	x NonPow2 > 128			Scalar Normal:
rotr/rotl?	y	y	y		y	y
uaddo, usubo, uadde, usube?
umulo, smulo?
umulh, smulh
ushlsat, sshlsat
smulfix, umulfix
smulfixsat, umulfixsat
sdivfix, udivfix
sdivfixsat, udivfixsat
FP Operation	Scalar normal	Vector legal	f128 s/v		Vector smaller / larger	bf16 s/v	Vector widths		Additional Notes
fadd	y	y	y/y		y				https://godbolt.org/z/bYWfo9v16
fsub	y	y	y/y		y
fmul	y	y	y/y		y
fma	y	y	y/y		y				https://godbolt.org/z/1osE3Whaq
fmuladd	y	y	y/y		y
fdiv	y	y	y/y		y
frem	y	y	y/y		y
fneg	y	y	y/y		y				https://godbolt.org/z/rz96eh3PW
fpext	y	y	y/y		y				https://godbolt.org/z/358EG4j7r
fptrunc	y	y	y/y		y				https://godbolt.org/z/7a7hq6j68
fptosi, fptoui	y	y	y/y		y
fptosisat, fptouisat
sitofp, uitofp	y	y	y/y		y				https://godbolt.org/z/j7Prz7qj6
fabs	y	y	y/y		y				https://godbolt.org/z/o95h4a9es
fsqrt	y	y	y/y		y
ceil, floor, trunc, rint, nearbyint	y	y	y/y		y				https://godbolt.org/z/zjMqq5oeo
lrint, llrint, lround, llround
fminnum, fmaxnum	y	y	y/y		y
fminimum, fmaximum	y	y			y
fminimumnum, fmaximumnum
fcopysign	y	y	y/y		y				https://godbolt.org/z/aq5bbc4jG
fpow	y	y	y/y		y				https://godbolt.org/z/WEeWYj1e4
fpowi	y	y	y/y		y
sin, cos, etc	y	y	y/y		y
fexp, fexp2, flog, flog2, flog10	y	y	y/y		y
fldexp, frexp
fcanonicalize
is_fpclass
Vector Operation	Scalar normal		Vector legal	Vector smaller / larger		Scalar ext	Vector odd widths	Vector odd eltsizes	Additional Notes
insert	-	-	y	y		-
extract	-	-	y	y		-
shuffle*	-	-				-
dup	-	-	y			-
ext	-	-	y	y		-
zip1/zip2/uzp1/uzp2/trn1/trn2	-	-	y			-
tbl	-	-	y	y		-			Could do with tbl2/tbl4 combines
reverse	-	-				-			Needs full reverses from https://godbolt.org/z/1chrbKjhs
perfect shuffles	-	-				-
reduce.add	-	-				-			Integer reductions in ISel use i32 return types. They can be i8/i16 in GISel.
reduce.mul	-	-				-
reduce.smin/smax/umin/umax	-	-				-
reduce.and/or/xor	-	-				-
reduce.fadd	-	-				-			Needs sequential
reduce.fmul	-	-				-			Needs sequential, plus #73309
reduce.fmin/fmax/fminimum/fmaximum	-	-	y			-	x

llvmbot · 2024-11-06T08:19:09Z

@llvm/issue-subscribers-backend-aarch64

Author: David Green (davemgreen)

madhur13490 · 2024-11-06T10:51:22Z

+1. This complements some of our understanding so far.

In addition to this we are also tracking SPEC 2017, RAJAPerf and TSVC internally in SVE and nosve mode to track the number of fallbacks. Our CI emits the number of fallbacks each day on these benchmarks. This helps us to make sure we don't introduce new fallbacks.
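
A daily count like this could be produced by scanning the compiler's stderr for fallback diagnostics. The sketch below is a guess at what such a script might look like, not the actual CI: the function `count_fallbacks` is hypothetical, and the message wording it matches ("Instruction selection used fallback path for ...") is what I believe LLVM prints with `-global-isel-abort=2`, so treat the pattern as an assumption.

```python
import re
from collections import Counter

# Assumed diagnostic wording; adjust the pattern if your LLVM build prints
# something different.
FALLBACK_RE = re.compile(r"Instruction selection used fallback path for (\S+)")


def count_fallbacks(log_text: str) -> Counter:
    """Count fallback diagnostics per function name in captured compiler output."""
    return Counter(m.group(1) for m in FALLBACK_RE.finditer(log_text))


if __name__ == "__main__":
    sample = (
        "warning: Instruction selection used fallback path for foo\n"
        "warning: Instruction selection used fallback path for bar\n"
        "warning: Instruction selection used fallback path for foo\n"
    )
    counts = count_fallbacks(sample)
    print(sum(counts.values()), dict(counts))
```

Comparing the total against the previous day's number is then enough to flag a newly introduced fallback.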

We also found that inline asm is not supported in GISel. (Varargs wasn't supported either until last month, when @Him188 landed a patch to support it in the instruction selector.)

I plan to bring this to the agenda in the next AArch64 sync which @sjoerdmeijer hosts. We should coordinate on this and maybe file finer-grained issues so that we don't repeat the work(?)

What do you think @davemgreen?

davemgreen added backend:AArch64 llvm:globalisel labels Nov 6, 2024
