202 changes: 202 additions & 0 deletions riscv-cc.adoc
@@ -429,6 +429,208 @@ NOTE: `setjmp`/`longjmp` follow the standard calling convention, which clobbers
all vector registers. Hence, the standard vector calling convention variant
won't disrupt the `jmp_buf` ABI.

NOTE: Functions that use the standard vector calling convention
variant follow an additional name mangling rule for {Cpp}.
For more details, see <<Name Mangling for Standard Calling Convention Variant>>.

=== Standard Fixed-length Vector Calling Convention Variant
Collaborator

The variant itself seems fine, modulo nits, but how are we planning to enable it?

If it's automatically used by -march=rva23 -mabi=ilp32d that will create major compatibility issues for binary distributions that use a fixed ABI and allow mixing packages at different architecture levels (either as an explicit user action, or as an implementation detail when rebuilding the distribution to change the architecture requirement).

If a new -mabi= value is required to enable use of the variant, it will be usable on closed systems where all packages are built at once, but not on binary distributions, since there is no expectation that binary code built with different -mabi= options is interoperable at all. This will include Debian and Alpine and might include Android and Fedora if their ABIs are finalized prior to the acceptance of this PR.

If it's enabled on a per-function basis using an attribute, or automatically for functions not visible across DSO boundaries, then it's effectively part of the definition of the attribute or a compiler implementation detail and may belong in riscv-c-api-doc or gccint, not here.

Collaborator Author

My expectation is that it should be enabled on a per-function basis by an attribute. I think that needs a corresponding riscv-c-api-doc PR, which I will send in the next few days.


This section defines the calling convention variant for fixed-length vectors.
The intention of this variant is to pass fixed-length vectors via the vector
registers. For the definition of a fixed-length vector, see
<<Fixed-Length Vector>>.

This variant is based on the standard vector calling convention variant:
the register convention and the rules for passing arguments and return values
are the same.

NOTE: We define a separate calling convention variant because we want a
flexible convention that makes use of the variable-length nature of the vector
extension and also accommodates embedded vector extensions, such as `Zve32x`.

ABI_VLEN is the width, in bits, of a vector register as seen by this calling
convention variant for fixed-length vectors.

ABI_VLEN must not exceed the ISA's VLEN: the hardware may implement vector
registers wider than ABI_VLEN, but never narrower.

ABI_VLEN can range from 32 bits (as in `Zve32x`) up to the maximum VLEN
supported by the ISA. This flexibility lets the convention serve both low-end
embedded systems and high-performance processors with wider vector registers.

ABI_VLEN is a parameter of this calling convention variant. It can be set by a
compiler command-line option or specified by a function attribute in the
source code.

NOTE: We suggest that toolchain implementations set the default value of
ABI_VLEN to 128, as this is the most common minimum VLEN requirement (the V
extension requires a VLEN of at least 128 bits). ABI_VLEN is not fixed at 128,
however, since the ISA also allows VLEN to be only 32 or 64 bits. A
configurable ABI_VLEN also makes longer VLENs usable: users can build
optimized libraries with a larger ABI_VLEN to make better use of cores with
longer VLEN.
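
The constraints on ABI_VLEN stated above can be expressed as a simple check.
This is a minimal C sketch, not part of any toolchain: the function
`abi_vlen_is_valid` is hypothetical, and requiring ABI_VLEN to be a power of
two (as VLEN is in the ISA) is an assumption that this section does not state
explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(unsigned x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical check: is abi_vlen (in bits) usable on a core whose vector
 * registers are vlen bits wide?  Assumes ABI_VLEN must be a power of two. */
bool abi_vlen_is_valid(unsigned abi_vlen, unsigned vlen)
{
    assert(is_pow2(vlen));          /* VLEN is always a power of two */
    return is_pow2(abi_vlen)        /* assumption, see the lead-in */
        && abi_vlen >= 32           /* smallest VLEN, as in Zve32x */
        && abi_vlen <= vlen;        /* must not exceed the ISA's VLEN */
}
```

For instance, ABI_VLEN=128 is valid on a core with VLEN=512, while ABI_VLEN=256
is not valid on a core with VLEN=128.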

A fixed-length vector argument is passed in one vector argument register if the
size of the vector is less than or equal to ABI_VLEN bits.

[NOTE]
====
Even when no vector extension supports certain element types, such as
`__bf16`, `_Float16`, `float`, or `double`, the standard fixed-length vector
calling convention rules still apply. For example, even without extensions
like `Zvfbfmin`, `Zve32f`, or `Zve64d`, vectors of these element types are
passed according to the calling convention rules outlined here.

Data types such as `__int128_t`, which currently have no direct support in any
vector extension, also follow these rules. This keeps the calling convention
forward-compatible: new vector extensions and data types can be adopted
without requiring changes to the calling convention.
====

A fixed-length vector argument is passed in two vector argument registers,
similar to vector data arguments with LMUL=2 and following the same register
constraints, if the size of the vector is greater than ABI_VLEN bits and less
than or equal to 2×ABI_VLEN bits.

A fixed-length vector argument is passed in four vector argument registers,
similar to vector data arguments with LMUL=4 and following the same register
constraints, if the size of the vector is greater than 2×ABI_VLEN bits and less
than or equal to 4×ABI_VLEN bits.

A fixed-length vector argument is passed in eight vector argument registers,
similar to vector data arguments with LMUL=8 and following the same register
constraints, if the size of the vector is greater than 4×ABI_VLEN bits and less
than or equal to 8×ABI_VLEN bits.
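
Taken together, the size thresholds above amount to choosing the smallest LMUL
in {1, 2, 4, 8} whose register group can hold the vector. The following is a
minimal C sketch with a hypothetical function name; it models only the size
rule, not whether enough registers are still free.

```c
#include <assert.h>

/* Hypothetical helper: how many vector argument registers (an LMUL-like
 * group size of 1, 2, 4 or 8) a fixed-length vector of size_bits bits
 * occupies for a given ABI_VLEN.  Returns 0 when the vector is larger than
 * 8 x ABI_VLEN and must be passed by reference instead.
 * Register availability is not modelled here. */
unsigned fixed_vec_arg_regs(unsigned size_bits, unsigned abi_vlen)
{
    assert(size_bits > 0 && abi_vlen >= 32);
    for (unsigned lmul = 1; lmul <= 8; lmul *= 2) {
        if (size_bits <= lmul * abi_vlen)
            return lmul;        /* fits in an LMUL=lmul register group */
    }
    return 0;                   /* > 8 x ABI_VLEN: passed by reference */
}
```

For example, a 512-bit `int64x8_t` needs `fixed_vec_arg_regs(512, 128) == 4`
registers, and a 128-bit `int32x4_t` needs 4 registers when ABI_VLEN=32.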

[NOTE]
====
Fixed-length vectors whose size is not a power of 2 are rounded up to the next
power-of-2 size for the purpose of register allocation and handling. For
instance, a vector type like `int32x3_t` (three 32-bit integers) is treated as
an `int32x4_t` (a 128-bit vector, which occupies one register, i.e. LMUL=1,
when ABI_VLEN=128) and passed accordingly. This keeps vector handling
consistent and simplifies argument passing.

Example: Consider an `int32x3_t` vector (three 32-bit integers):

- The vector's total size is 96 bits, which is not a power of 2.
- The ABI rounds the size up to 128 bits (corresponding to `int32x4_t`),
so the vector is passed in one vector argument register when
ABI_VLEN=128.

This rule applies to all non-power-of-2 fixed-length vectors, ensuring they
are treated consistently across different ABI_VLEN settings.
====
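
The rounding step described in the NOTE above can be sketched in C as
follows; `round_up_pow2` is a hypothetical helper name used only for
illustration.

```c
#include <assert.h>

/* Round a fixed-length vector size (in bits) up to the next power of two,
 * e.g. the 96-bit int32x3_t is handled as a 128-bit int32x4_t. */
unsigned round_up_pow2(unsigned size_bits)
{
    /* Guard against overflow of p below. */
    assert(size_bits > 0 && size_bits <= 0x80000000u);
    unsigned p = 1;
    while (p < size_bits)
        p <<= 1;                /* double until p covers size_bits */
    return p;
}
```

For example, `round_up_pow2(3 * 32)` yields 128, the size of `int32x4_t`.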

A fixed-length vector argument is passed by reference and is replaced in the
argument list with its address if it is larger than 8×ABI_VLEN bits or if
there is a shortage of vector argument registers.
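
A shortage arises once no suitably aligned register group is free. The sketch
below models this with a first-fit search over the vector argument registers
v8-v23, requiring an LMUL=n group to start at a register number that is a
multiple of n. This reflects the standard vector calling convention variant's
register constraints as we understand them, but the type and function names
and the first-fit policy are assumptions for illustration, not normative text.
The mask register v0 is ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define FIRST_ARG_VREG 8   /* v8  */
#define LAST_ARG_VREG  23  /* v23 */

/* Hypothetical state: which vector registers already hold arguments. */
typedef struct {
    bool used[32];
} vreg_state;

/* Allocate nregs (1, 2, 4 or 8) consecutive vector argument registers whose
 * first register number is a multiple of nregs, first-fit.  Returns the
 * first register number, or -1 if no group is free, in which case the
 * argument would be passed by reference. */
int alloc_vreg_group(vreg_state *s, unsigned nregs)
{
    assert(nregs == 1 || nregs == 2 || nregs == 4 || nregs == 8);
    for (unsigned base = FIRST_ARG_VREG; base + nregs - 1 <= LAST_ARG_VREG;
         base += nregs) {
        bool avail = true;
        for (unsigned r = base; r < base + nregs; r++) {
            if (s->used[r]) {
                avail = false;
                break;
            }
        }
        if (avail) {
            for (unsigned r = base; r < base + nregs; r++)
                s->used[r] = true;  /* claim the whole group */
            return (int)base;
        }
    }
    return -1;                      /* shortage: pass by reference */
}
```

With two 512-bit arguments and ABI_VLEN=128 (four registers each), this places
the first in v8-v11 and the second in v12-v15.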

A struct whose members are all fixed-length vectors is passed in vector
argument registers like a vector tuple type if all members have the same
length, that length is less than or equal to 4×ABI_VLEN bits, and the size of
the whole struct is less than or equal to 8×ABI_VLEN bits.
If there are not enough vector argument registers to pass the entire struct,
it is passed by reference and replaced in the argument list with its address.
A struct that does not meet these conditions is passed according to the rule
defined in the hardware floating-point calling convention.
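
The eligibility test above can be sketched in C. This is an illustration
under stated assumptions: the function name is hypothetical, "same length" is
interpreted as the same size in bits, and the struct size is taken as the sum
of member sizes (equal-sized power-of-2 vectors need no padding). Register
shortage is handled separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* members[i] is the size in bits of member i of a struct whose members are
 * all fixed-length vectors.  Returns true if the struct qualifies to be
 * passed like a vector tuple type; false means the hardware floating-point
 * calling convention rule applies instead. */
bool struct_passed_as_tuple(const unsigned *members, size_t n,
                            unsigned abi_vlen)
{
    assert(n > 0);
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (members[i] != members[0])   /* all members the same length */
            return false;
        total += members[i];
    }
    return members[0] <= 4ul * abi_vlen /* each member <= 4 x ABI_VLEN */
        && total <= 8ul * abi_vlen;     /* whole struct <= 8 x ABI_VLEN */
}
```

A struct with a single 1024-bit member fails this test when ABI_VLEN=128, but
it can still qualify under the flattening rule for single-vector structs.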

A struct containing just one fixed-length vector, or a fixed-length vector
array of length one, is flattened into a single fixed-length vector argument
if the size of the vector is less than or equal to 8×ABI_VLEN bits.

Structs with zero-length fixed-length arrays use the rule defined in the
hardware floating-point calling convention, which means they do not consume any
vector argument registers in either C or {Cpp}.

A struct containing just one fixed-length vector array is passed as though it
were a vector tuple type if the size of the array's base element is less than
or equal to 8×ABI_VLEN bits and the size of the array is less than 8×ABI_VLEN
bits.
If there are not enough vector argument registers to pass the entire struct,
it is passed by reference and replaced in the argument list with its address.
A struct that does not meet these conditions is passed according to the rule
defined in the hardware floating-point calling convention.

Unions with fixed-length vectors are always passed according to the integer
calling convention.

The details of vector argument register rules are the same as the standard
vector calling convention variant.

NOTE: Functions that use the standard fixed-length vector calling convention
variant must be marked with STO_RISCV_VARIANT_CC. See <<Dynamic Linking>>
for the meaning of STO_RISCV_VARIANT_CC.

NOTE: Functions that use the standard fixed-length vector calling convention
variant follow an additional name mangling rule for {Cpp}.
For more details, see <<Name Mangling for Standard Calling Convention Variant>>.

[NOTE]
====
When ABI_VLEN is smaller than VLEN, the number of vector argument registers
allocated remains unchanged. However, in such cases, values are placed in only
a portion of these vector argument registers, corresponding to the

I don't understand why you would only use a portion of the vector registers.
This will require rearranging data by the caller and the callee.
Instead you could leave vector registers unused which would be much more efficient.

I don't think that we should keep the current design.


The ABI_VLEN should simply specify the maximum number of bits that can be exchanged between the caller and the callee via registers.

Arguments smaller than or equal to ABI_VLEN should be passed in up to, e.g., 4 registers.
Arguments larger than ABI_VLEN should be passed on the stack.

But data in vector registers should always be placed as compact as possible.

Collaborator Author

@kito-cheng kito-cheng Oct 13, 2025


I am not sure why that would trigger data rearranging. Would you mind giving an example?

Here is a practical example so that we can discuss a concrete case:

```c
typedef signed long long __attribute__((vector_size(64))) int64x8_t;

__attribute__((riscv_vls_cc(128)))
int64x8_t foo (int64x8_t a, int64x8_t b);
// Return value in v8-v11: 512 bits use LMUL=4 and occupy 4 registers
// Pass a in v8-v11: 512 bits use LMUL=4 and occupy 4 registers
// Pass b in v12-v15: 512 bits use LMUL=4 and occupy 4 registers

// Compiled with -march=rv64gcv_zvl512b
void bar()
{
  // Assume a is assigned to v8
  int64x8_t a = {1, 2, 3, 4, 5, 6, 7, 8};
  // Assume b is assigned to v9
  int64x8_t b = {1, 2, 3, 4, 5, 6, 7, 8};
  // a occupies 4 registers according to ABI_VLEN,
  // but it can still be passed without rearranging:
  // v9-v11 are simply left unset.
  // b is moved to v12 as the ABI requires; the register
  // allocator can usually avoid this move.
  // v13-v15 are left unset.
  a = foo (a, b);
}

// Compiled with -march=rv64gcv (VLEN=128)
void bar()
{
  // Assume a is assigned to v8-v11
  int64x8_t a = {1, 2, 3, 4, 5, 6, 7, 8};
  // Assume b is assigned to v12-v15
  int64x8_t b = {1, 2, 3, 4, 5, 6, 7, 8};
  // Pass a to foo in v8-v11
  // Pass b to foo in v12-v15
  a = foo (a, b);
}
```

In foo, the vector operations use VL=8 and LMUL=4 (SEW=64), which ensures the same result on machines with different VLEN.


Looking at the assembler code everything looks fine.

"the number of vector argument registers utilized remains unchanged" is a bit misleading as in the table below you essentially indicate that some registers may remain unused "-,-,-,-," depending on the machine size.

size of the value being passed, packed from the start of the register group.
The rest of the allocated vector argument registers remains idle. So while the
full capacity of the vector argument registers may not be used, the register
allocation does not change, which keeps register usage consistent regardless
of the ratio of ABI_VLEN to VLEN.

Example: With ABI_VLEN at 32 bits and VLEN at 128 bits, consider passing an
`int32x4_t` parameter (four 32-bit integers).

Allocation: Four vector argument registers are allocated for
`int32x4_t`, based on LMUL=4.

Utilization: All four integers are placed in the first vector register,
utilizing its full 128-bit capacity (VLEN), despite ABI_VLEN being 32 bits.

Remaining Registers: The other three allocated registers remain unused and idle.

.int32x4_t layout on different VLEN with ABI_VLEN at 32 bits:
[cols="2,3,3,3,3"]
[width=100%]
|===
| VLEN | v8 | v9 | v10 | v11

| 32 | a | b | c | d
| 64 | a, b | c, d | -, - | -, -
| 128 | a, b, c, d | -, -, -, - | -, -, -, - | -, -, -, -
| 256 | a, b, c, d, -, -, -, - | -, -, -, -, -, -, -, - | -, -, -, -, -, -, -, - | -, -, -, -, -, -, -, -
|===

What would the example look like for 512 bits of values and vlen=256?
Adding an example for this would make it clear if you are only using 128 bits per register or the full vlen.

Collaborator Author

Added 834023c

NOTE: I chose `int64x8_t` instead of `int32x16_t` because `int32x16_t` would need to wrap in the PDF layout, which could be misread.

riscv-abi.pdf


.int64x8_t layout on different VLEN with ABI_VLEN at 128 bits:
[cols="2,3,3,3,3"]
[width=100%]
|===
| VLEN | v8 | v9 | v10 | v11

| 128 | a, b | c, d | e, f | g, h
| 256 | a, b, c, d | e, f, g, h | -, -, -, - | -, -, -, -
| 512 | a, b, c, d, e, f, g, h | -, -, -, -, -, -, -, - | -, -, -, -, -, -, -, - | -, -, -, -, -, -, -, -
|===

`-` means that the part is not used and its value can be anything.

====
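
The placement shown in the two tables follows a simple rule: elements are
packed densely from the first register of the group at the hardware VLEN,
while ABI_VLEN only determines how many registers are reserved. The following
minimal C sketch assumes, as in the tables, that the group starts at v8;
`element_location` and `elem_loc` are hypothetical names.

```c
#include <assert.h>

/* Location of one vector element: register number and slot within it. */
typedef struct {
    unsigned reg;   /* vector register number, e.g. 8 for v8 */
    unsigned slot;  /* element index within that register */
} elem_loc;

/* Where element idx (of width sew_bits) of a fixed-length vector argument
 * lands on a core with the given VLEN, for a group starting at v8. */
elem_loc element_location(unsigned idx, unsigned sew_bits, unsigned vlen)
{
    assert(sew_bits > 0 && vlen >= sew_bits);
    unsigned per_reg = vlen / sew_bits;   /* elements per register */
    elem_loc loc = { 8 + idx / per_reg, idx % per_reg };
    return loc;
}
```

For `int64x8_t` on a VLEN=256 core, element `e` (index 4) lands in slot 0 of
v9, matching the second table.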

NOTE: Different functions in a single compilation unit may use different
ABI_VLEN values, which allows function-specific optimization. It is therefore
the user's responsibility to ensure that the caller and the callee of each
function call agree on ABI_VLEN; otherwise, arguments and return values will
not be handled correctly.

=== ILP32E Calling Convention

IMPORTANT: RV32E is not a ratified base ISA and so we cannot guarantee the
28 changes: 28 additions & 0 deletions riscv-elf.adoc
@@ -204,6 +204,34 @@ See the "Type encodings" section in _Itanium {Cpp} ABI_
for more detail on how to mangle types. Note that `__bf16` is mangled in the
same way as `std::bfloat16_t`.

=== Name Mangling for Standard Calling Convention Variant

Functions that use a standard calling convention variant must append an extra
ABI tag to their mangled names. The encoding follows the "ABI tags" section of
_Itanium {Cpp} ABI_.

.ABI Tag name for calling convention variants
[cols="5,2"]
[width=80%]
|===
| Name | ABI tag name

| Standard fixed-length vector calling convention variant | riscv_vls_cc_<ABI_VLEN>
|===


For example:
[,c]
----
__attribute__((riscv_vls_cc(128))) void foo();
----

is mangled as
[,c]
----
_Z3fooB16riscv_vls_cc_128v
----
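
In the Itanium ABI-tag encoding, the tag is written as `B` followed by a
length-prefixed identifier, so `riscv_vls_cc_128` (16 characters) becomes
`B16riscv_vls_cc_128`. The C sketch below assembles the mangled name for a
parameterless `void` function. The helper name is hypothetical, and the sketch
does not handle namespaces, parameters, or any other mangling feature.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: mangle `void name()` declared with
 * __attribute__((riscv_vls_cc(abi_vlen))).  Returns the snprintf result. */
int mangle_vls_cc_void_fn(char *out, size_t out_size,
                          const char *name, unsigned abi_vlen)
{
    char tag[32];
    int tag_len = snprintf(tag, sizeof tag, "riscv_vls_cc_%u", abi_vlen);
    assert(tag_len > 0 && (size_t)tag_len < sizeof tag);
    /* _Z <len><name> B <len><tag> v   ('v' = empty parameter list) */
    return snprintf(out, out_size, "_Z%zu%sB%d%sv",
                    strlen(name), name, tag_len, tag);
}
```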

=== Name Mangling for Vector Data Types, Vector Mask Types and Vector Tuple Types.

The vector data types and vector mask types, as defined in the section