[AArch64][GlobalISel] Overall GISel operation status #115133

Open
davemgreen opened this issue Nov 6, 2024 · 2 comments


@davemgreen
Collaborator

This is a copy of an internal page that @chuongg3 and I kept while going through each of the AArch64 GISel operations to make sure they don't fall back. Not all of it is complete yet (and the internal version had a few more details), but it is better to have this upstream. Some of it might now be out of date.

A few high-level comments:

  • This does not include SVE; we should probably do the same exercise for SVE elsewhere.
  • BF16 still needs to be added, but that requires a new way to specify the types/operations.
  • Big-endian isn't handled yet.
  • Currently some operations widen and some promote. We should stick to one (probably widen).
  • Blank cells usually mean not checked or not supported. We will get to the point where random testing starts to be more useful.

Legend:

  • Scalar normal = i8/i16/i32/i64
  • Vector legal = v8i8/v4i16/v2i32 + v16i8/v8i16/v4i32/v2i64
  • Vector larger/smaller = i8/i16/i32/i64 vectors with non-legal sizes
  • i128 = scalar/vector
  • i1 = scalar/vector
  • Scalar ext = non-power-of-2 sizes, including larger sizes
  • Vector odd widths = i8/i16/i32/i64 vectors with non-power-of-2 widths.
  • Vector odd eltsize = non-power-of-2 element sizes (or i128, etc).
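To make the legend's column boundaries concrete, here is a sketch that assigns an integer type to one of the table columns. It is an interpretation of the legend, not an existing LLVM helper: `IntType`, `Column` and `classify` are hypothetical names, and the precedence (i1 and i128 first, then the scalar and vector checks) is an assumption, because the legend's categories overlap (i128, for example, also counts as an odd element size).

```cpp
#include <cassert>

// Hypothetical description of an integer type: a scalar (IsVector == false,
// NumElts == 1) or a fixed-length vector of NumElts lanes, each EltBits wide.
struct IntType {
  bool IsVector;
  unsigned NumElts;
  unsigned EltBits;
};

// One enumerator per legend column.
enum class Column {
  ScalarNormal,        // i8/i16/i32/i64
  VectorLegal,         // v8i8/v4i16/v2i32 + v16i8/v8i16/v4i32/v2i64
  I128,                // i128 scalar or vector
  I1,                  // i1 scalar or vector
  VectorLargerSmaller, // normal element type, non-legal total size
  ScalarExt,           // non-power-of-2 or larger scalars
  VectorOddWidths,     // normal element type, non-power-of-2 lane count
  VectorOddEltSize,    // non-power-of-2 element sizes
};

static bool isPow2(unsigned X) { return X != 0 && (X & (X - 1)) == 0; }

static bool isNormalElt(unsigned Bits) {
  return Bits == 8 || Bits == 16 || Bits == 32 || Bits == 64;
}

// Assumed precedence: i1 and i128 get their own columns whether scalar or vector.
Column classify(const IntType &T) {
  assert(T.EltBits > 0 && T.NumElts > 0 && "empty types are not meaningful");
  if (T.EltBits == 1)
    return Column::I1;
  if (T.EltBits == 128)
    return Column::I128;
  if (!T.IsVector)
    return isNormalElt(T.EltBits) ? Column::ScalarNormal : Column::ScalarExt;
  if (!isNormalElt(T.EltBits))
    return Column::VectorOddEltSize;
  if (!isPow2(T.NumElts))
    return Column::VectorOddWidths;
  // Legal NEON vectors occupy a full 64-bit or 128-bit register.
  unsigned TotalBits = T.NumElts * T.EltBits;
  if (TotalBits == 64 || TotalBits == 128)
    return Column::VectorLegal;
  return Column::VectorLargerSmaller;
}
```

For example, `classify({true, 4, 32})` (v4i32) yields `Column::VectorLegal`, `classify({true, 2, 8})` (v2i8) yields `Column::VectorLargerSmaller`, and `classify({true, 3, 32})` (v3i32) yields `Column::VectorOddWidths`.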
Operation Scalar normal Vector legal i128 i1 Vector larger / smaller Scalar ext Vector odd widths Vector odd eltsizes Additional Notes
load y y
store y y
bitcast? ptrtoint? inttoptr? y y
memcpy? memmove? memset? bzero?
Int Operation Scalar normal Vector normal i128 s/v i1 s/v Vector larger / smaller Scalar non-power-2 Vector odd widths Vector odd eltsizes Additional Notes
add y y y/y y y x x https://godbolt.org/z/6c1rfWTK8
sub y y y/y y y x x
mul y y y/y inefficient y Scalar i128 could be better. https://godbolt.org/z/8Wd8zhezc
sdiv, udiv y y y/y y Scalar i1 could be simpler. https://godbolt.org/z/45qMq6cvh.
srem, urem y y y/y y Scalar i1:
zext, sext, anyext y y ZEXT: GlobalISel could be improved to match SDAG by using BIC for
trunc y y y x Non-pow2 larger than 8
and y y y/y y https://godbolt.org/z/6Y98TnYv8
or y y y/y y
xor y y y/y y
not? y y y y https://godbolt.org/z/rh4ob1be7
shl y y y y (v2i8) x Scalar i8/i16 unnecessarily clear shift amount. i1 could simplify.
ashr y y y y(v2i8) x Scalar i8/i16 unnecessarily clear shift amount. i1 could simplify.
lshr y y y y(v2i8) x Scalar i8/i16 unnecessarily clear shift amount. i1 could simplify.
icmp y y y (i128 could be better) x y(v2i8) i128 could do a lot better.
select y y y y (v2i8) Scalar: Unnecessary AND to clear upper lanes of the condition register
abs y y y x y https://godbolt.org/z/Tobs7YeoT
smin/smax/umin/umax y y y y x > i128 i1/i128 could do better. https://godbolt.org/z/j7nx789oz.
uaddsat/usubsat/saddsat/ssubsat y y y y https://godbolt.org/z/4MT14bfsv
bitreverse y x y y https://godbolt.org/z/3sd988Mhd
bswap y x y x y
ctlz y y y y x > i128
cttz y y y x x > i128
ctpop y y y x x
fshr/fshl y y y x x NonPow2 > 128 Scalar Normal:
rotr/rotl? y y y y y
uaddo, usubo, uadde, usube?
umulo, smulo?
umulh, smulh
ushlsat, sshlsat
smulfix, umulfix
smulfixsat, umulfixsat
sdivfix, udivfix
sdivfixsat, udivfixsat
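The note that scalar i8/i16 shifts "unnecessarily clear shift amount" can be checked with a small model. This is a sketch under stated assumptions: the i8 shift is assumed to be performed in a 32-bit W register by LSLV, which uses the amount modulo 32, and the upper 24 bits of both registers are treated as arbitrary. The helper names (`lslvW`, `shlI8Reference`, `shlI8ViaW`, `unmaskedI8ShlMatches`) are hypothetical, not LLVM APIs. Because a `shl i8` with an amount of 8 or more is poison in IR, every valid amount already fits in the five bits LSLV reads, so an AND on the amount changes nothing.

```cpp
#include <cassert>
#include <cstdint>

// Model of a 32-bit variable shift (W-register LSLV): only the low five bits
// of the amount register are used.
uint32_t lslvW(uint32_t Value, uint32_t AmountReg) {
  return Value << (AmountReg & 31u);
}

// Reference semantics of IR `shl i8 %x, %amt`, defined only for %amt < 8.
uint8_t shlI8Reference(uint8_t X, uint8_t Amt) {
  assert(Amt < 8 && "shift amounts >= the bit width produce poison in IR");
  return static_cast<uint8_t>(X << Amt);
}

// i8 shl performed in a W register with no AND on the amount. GarbageHi
// models arbitrary upper 24 bits in both the value and amount registers.
uint8_t shlI8ViaW(uint8_t X, uint8_t Amt, uint32_t GarbageHi) {
  uint32_t ValueReg = (GarbageHi << 8) | X;
  uint32_t AmountReg = (GarbageHi << 8) | Amt;
  return static_cast<uint8_t>(lslvW(ValueReg, AmountReg)); // truncate to i8
}

// Exhaustively compares all i8 values and in-range amounts for a few
// garbage patterns in the upper register bits.
bool unmaskedI8ShlMatches() {
  const uint32_t Garbage[] = {0u, 1u, 0xABCDEFu, 0xFFFFFFu};
  for (uint32_t G : Garbage)
    for (unsigned X = 0; X < 256; ++X)
      for (unsigned Amt = 0; Amt < 8; ++Amt)
        if (shlI8ViaW(static_cast<uint8_t>(X), static_cast<uint8_t>(Amt), G) !=
            shlI8Reference(static_cast<uint8_t>(X), static_cast<uint8_t>(Amt)))
          return false;
  return true;
}
```

The same argument covers i16, whose valid amounts (0 to 15) also fit in five bits. For lshr/ashr the value operand still has to be zero- or sign-extended; only the mask on the amount is redundant.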
FP Operation Scalar normal Vector legal f128 s/v Vector smaller / larger bf16 s/v Vector widths Additional Notes
fadd y y y/y y https://godbolt.org/z/bYWfo9v16
fsub y y y/y y
fmul y y y/y y
fma y y y/y y https://godbolt.org/z/1osE3Whaq
fmuladd y y y/y y
fdiv y y y/y y
frem y y y/y y
fneg y y y/y y https://godbolt.org/z/rz96eh3PW
fpext y y y/y y https://godbolt.org/z/358EG4j7r
fptrunc y y y/y y https://godbolt.org/z/7a7hq6j68
fptosi, fptoui y y y/y y
fptosisat, fptouisat
sitofp, uitofp y y y/y y https://godbolt.org/z/j7Prz7qj6
fabs y y y/y y https://godbolt.org/z/o95h4a9es
fsqrt y y y/y y
ceil, floor, trunc, rint, nearbyint y y y/y y https://godbolt.org/z/zjMqq5oeo
lrint, llrint, lround, llround
fminnum, fmaxnum y y y/y y
fminimum, fmaximum y y y
fminimumnum, fmaximumnum
fcopysign y y y/y y https://godbolt.org/z/aq5bbc4jG
fpow y y y/y y https://godbolt.org/z/WEeWYj1e4
fpowi y y y/y y
sin, cos, etc y y y/y y
fexp, fexp2, flog, flog2, flog10 y y y/y y
fldexp, frexp
fcanonicalize
is_fpclass
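The FP table tracks fminnum/fmaxnum separately from fminimum/fmaximum because, per the LLVM LangRef, they treat NaNs and signed zeros differently. The sketch below models the two `min` flavors in plain C++; `fminnumModel` and `fminimumModel` are illustrative names only. The AArch64 instructions mentioned in the comments (FMINNM, FMIN) are the ones whose semantics match each model; this is not a claim about what GISel currently selects.

```cpp
#include <cassert>
#include <cmath>

// llvm.minnum-style semantics (matches AArch64 FMINNM): a single NaN operand
// is ignored and the other operand is returned. Returns NaN only if both are NaN.
double fminnumModel(double A, double B) {
  if (std::isnan(A))
    return B;
  if (std::isnan(B))
    return A;
  return A < B ? A : B; // for equal operands either one is acceptable
}

// llvm.minimum semantics (matches AArch64 FMIN): any NaN operand propagates,
// and -0.0 is treated as less than +0.0.
double fminimumModel(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return std::nan("");
  if (A == 0.0 && B == 0.0)          // both zero, possibly different signs
    return std::signbit(A) ? A : B;  // prefer the negative zero
  return A < B ? A : B;
}
```

The max variants mirror these with the comparisons reversed and +0.0 preferred over -0.0 for fmaximum.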
Vector Operation Scalar normal Vector legal Vector smaller / larger Scalar ext Vector odd widths Vector odd eltsizes Additional Notes
insert - - y y -
extract - - y y -
shuffle* - - -
dup - - y -
ext - - y y -
zip1/zip2/uzp1/uzp2/trn1/trn2 - - y -
tbl - - y y - Could do with tbl2/tbl4 combines
reverse - - - Needs full reverses from https://godbolt.org/z/1chrbKjhs
perfect shuffles - - -
reduce.add - - - Integer reductions in ISel use i32 return types. They can be i8/i16 in GISel.
reduce.mul - - -
reduce.smin/smax/umin/umax - - -
reduce.and/or/xor - - -
reduce.fadd - - - Needs sequential
reduce.fmul - - - Needs sequential, plus #73309
reduce.fmin/fmax/fminimum/fmaximum - - y - x
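The "Needs sequential" notes on reduce.fadd/reduce.fmul reflect that, without the `reassoc` flag, `llvm.vector.reduce.fadd` must be evaluated in strict element order. A minimal sketch with hypothetical helper names shows why a pairwise tree (the kind of shape a FADDP-based lowering would produce, assumed here for illustration) is not an acceptable substitute: floating-point addition is not associative.

```cpp
#include <array>
#include <cassert>

// Ordered reduction: (((Start + V[0]) + V[1]) + V[2]) + V[3], evaluated
// strictly left to right, as required when reassociation is not allowed.
float reduceFAddSequential(float Start, const std::array<float, 4> &V) {
  float Acc = Start;
  for (float X : V)
    Acc += X;
  return Acc;
}

// Pairwise tree: Start + ((V[0] + V[1]) + (V[2] + V[3])). Only a valid
// lowering when the reduction carries the reassoc flag.
float reduceFAddPairwise(float Start, const std::array<float, 4> &V) {
  return Start + ((V[0] + V[1]) + (V[2] + V[3]));
}
```

With `V = {1e8f, 1.0f, -1e8f, 1.0f}` and a start value of `-0.0f`, and assuming IEEE single-precision arithmetic without excess precision, the sequential form yields 1.0f while the pairwise form yields 0.0f, because 1e8f + 1.0f rounds back to 1e8f in single precision.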
@llvmbot
Collaborator

llvmbot commented Nov 6, 2024

@llvm/issue-subscribers-backend-aarch64

Author: David Green (davemgreen)


@madhur13490
Contributor

+1. This complements some of our understanding so far.

In addition, we are tracking SPEC 2017, RAJAPerf and TSVC internally in SVE and non-SVE modes to monitor the number of fallbacks. Our CI reports the number of fallbacks on these benchmarks each day, which helps us make sure we don't introduce new ones.

We also found that inline asm is not supported in GISel. (Varargs weren't supported until last month, when @Him188 landed a patch adding support in the instruction selector.)

I plan to put this on the agenda for the next AArch64 sync, which @sjoerdmeijer hosts. We should coordinate on this and maybe file finer-grained issues so that we don't duplicate work(?)

What do you think @davemgreen?
