Add an arch for RISCV with vector extension #625

michael-roe · 2023-05-17T17:45:29Z

No description provided.

Signed-off-by: Michael Roe <[email protected]>

drom · 2023-07-13T01:58:11Z

It would be cool to port VOLK to RVV.
I want to be part of it.

jdemel · 2023-07-13T07:44:58Z

I'd be in favor of adding RiscV support for VOLK. Also, we already run QA tests for RiscV and we already have some hand optimized assembly for this ISA. Though, no vector support for it.
As far as I know it is very difficult to get RiscV systems with vector extensions.

This PR in particular is difficult though because riscv_vector ends up in the list of x86 extensions. Also, I thought this PR was closed in favor of a later one.

drom · 2023-07-13T20:58:29Z

RISC-V Vector v1.0 (RVV1) looks like a perfect fit for VOLK.
We will see ICs with RVV1 in the near future. For now we can use a simulator.
I can help with cycle-accurate profiling on the RVV cores that SiFive provides.

jdemel · 2023-11-04T10:00:26Z

Is riscv_vector a good name for this architecture? The RiscV naming system is getting a bit confusing.

drom · 2023-11-04T17:16:43Z

RISC-V has a naming convention (the ISA string) described in the spec:
https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
Chapter 22. Page 121

I think the baseline Vector ISA for this project should be RV64I2p0M2p0A2p0F2p0D2p0V1p0.
It can be checked here: https://rv.drom.io/?RV64I2p0M2p0A2p0F2p0D2p0V1p0

C++ compilers will take this string as the -march argument.
https://llvm.org/docs/RISCVUsage.html
https://gcc.gnu.org/onlinedocs/gcc/RISC-V-Options.html

jdemel · 2023-11-05T12:58:38Z

The GCC RISC-V architecture options imply that we could optimize for any given CPU. That'd be too many machines to compile, I assume.
The GCC argument would potentially be:

gcc -march=rv64i2p0m2p0a2p0f2p0d2p0v1p0

While I assume the clang argument is:

clang -march=rv64i2p0m2p0a2p0f2p0d2p0v1p0

I was hoping for smth like

gcc -march=rv64imafdv

I read that there are some names for a common set of extensions. This might be interesting as a baseline as well.
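Whichever -march string is chosen, the compiler reports the resulting feature set through predefined macros, which makes it easy to confirm what a given flag actually enabled. The following is a minimal sketch, not VOLK code: `describe_riscv_target` is a hypothetical helper name, and it assumes the `__riscv`, `__riscv_xlen`, `__riscv_flen` and `__riscv_vector` macros that GCC and Clang define for RISC-V targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#if defined(__riscv_flen)
#define TARGET_FLEN __riscv_flen /* 32 with F only, 64 with D */
#else
#define TARGET_FLEN 0 /* no hardware floating point enabled */
#endif

#if defined(__riscv_vector)
#define TARGET_HAS_V 1 /* the V extension was enabled via -march */
#else
#define TARGET_HAS_V 0
#endif

/* Hypothetical helper: writes a one-line summary of the features the
 * compiler enabled for this translation unit into buf and returns the
 * snprintf result (number of characters that were needed). */
int describe_riscv_target(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
#if defined(__riscv)
    return snprintf(buf, len, "riscv xlen=%d flen=%d vector=%s",
                    (int)__riscv_xlen, (int)TARGET_FLEN,
                    TARGET_HAS_V ? "yes" : "no");
#else
    return snprintf(buf, len, "not a RISC-V target");
#endif
}
```

Compiling this once with `-march=rv64imafdv` and once with the fully versioned string should show whether both spellings enable the same features on a given toolchain.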

drom · 2023-11-09T03:20:42Z

You are right. There are some baselines like rv64g.
There are also profiles: https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc
But for this project, I think the important baseline is:

rv64i - 64-bit
f - float32
d - float64
v - vector (due to the complicated history of the Vector extension, I would explicitly say v1p0)
m,a,c - (optional) most cores will come with integer mul/div, atomics, and compressed instructions

drom · 2023-11-09T03:24:37Z

Another related question is: Are we using ASM or Intrinsics ?
Recently I switched to RVV intrinsics https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc
They are very handy and quite stable at this point.

jdemel · 2023-11-09T08:59:18Z

I suggest using intrinsics. So far, we mostly use intrinsics and only rarely use ASM. Also, we tend to replace the ASM code with intrinsics when they're available.

If a platform comes with Vector extensions, is it reasonable to assume it comes with m,a,c too? If we decide to make this optional, we will eventually see situations where we have to work around this assumption. I'd like to keep things simple.

Would you be willing to set up an initial environment? I'd envision 2 options:

a generic machine for riscv64 that includes the sifive74 kernels.
a rv64imafdv1p0 machine

Currently, we use: https://github.com/uraimo/run-on-arch-action#supported-platforms for non-x86 platforms. If we can integrate a CI test for these machines, that'd be worthwhile.

As soon as we set up the initial machines and CI, we can start to add optimized kernels. It'd be interesting to see how well a compiler optimizes code compared to hand-optimized kernels.

drom · 2023-11-10T01:10:15Z

I can set up CI for:

HiFive Unmatched dev board: https://www.sifive.com/boards/hifive-unmatched-revb
u74 core, rv64imafdc.
A good in-order superscalar baseline. I have a physical dev board.
https://www.sifive.com/cores/intelligence-x280
A popular vector + in-order superscalar target.
x280: rv64imafdcv1p0
We can run it on QEMU. (What version of QEMU do you use?)

I can also set up cycle-accurate model benchmarking at ~100 kHz simulation speed.

drmpeg · 2024-06-25T01:15:29Z

Just an update. A board with an RVV 1.0-capable CPU has been released: the Banana Pi BPI-F3.

https://docs.banana-pi.org/en/BPI-F3/BananaPi_BPI-F3

Current boards only have 4GB of RAM, so it may be better to wait for 8 or 16GB models to come out.

michael-roe · 2024-10-06T10:57:13Z

Ok, various other people have expressed an interest in using RISC-V vector extensions with VOLK.

I can amend this PR, but… what do you guys really want it to do?

Getting cpu_features to just check for “v” is easy - I already upstreamed a patch to cpu_features to do that. Checking the parameters of the vector extension (size of registers etc) might be harder.

jdemel · 2024-10-09T07:16:50Z

@michael-roe multiple things.

Create a machine that enables RISC-V compilation with vector extensions (and a version without).
Be able to dynamically load the correct machine, i.e. detect the correct instructions at runtime.
Add kernels.

In most cases, we rely on compiler switches to produce the best optimized version for a specific set of vector extensions. e.g. for SSE4, we compile a machine with -msse4 etc.
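The "detect at runtime, then pick the right machine" step can be sketched roughly as follows. This is an illustration only and not how VOLK implements dispatch: all function names are hypothetical, the RVV kernel is a placeholder that forwards to the generic one, and the detection assumes a Linux kernel that reports single-letter extensions in `AT_HWCAP` (bit `'v' - 'a'` for V). On any other platform the check simply returns 0.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__riscv) && defined(__linux__)
#include <sys/auxv.h>
#endif

typedef void (*add_f32_fn)(float *dst, const float *a, const float *b, size_t n);

/* Portable fallback, always available. */
static void add_f32_generic(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Placeholder for a kernel built with the vector extension enabled.
 * It forwards to the generic version so the sketch runs anywhere. */
static void add_f32_rvv(float *dst, const float *a, const float *b, size_t n)
{
    add_f32_generic(dst, a, b, n);
}

/* Hypothetical detection helper. Assumption: on RISC-V Linux the V
 * extension shows up as bit ('v' - 'a') of AT_HWCAP. */
static int cpu_has_rvv(void)
{
#if defined(__riscv) && defined(__linux__)
    return (getauxval(AT_HWCAP) & (1UL << ('v' - 'a'))) != 0;
#else
    return 0;
#endif
}

/* Pick the best implementation for the CPU we are running on. */
add_f32_fn select_add_f32(void)
{
    add_f32_fn fn = cpu_has_rvv() ? add_f32_rvv : add_f32_generic;
    assert(fn != NULL);
    return fn;
}
```

A caller would resolve the pointer once, e.g. `add_f32_fn add = select_add_f32();`, and reuse it for every call.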

drmpeg · 2024-10-09T07:23:20Z

Here's a good article on RVV intrinsics.

https://fprox.substack.com/p/risc-v-vector-programming-in-c-with

The example code could be the basis for the first kernel.

#include <stddef.h>
#include <riscv_vector.h>

/** vector addition
 *
 * @param dst address of destination array
 * @param lhs address of left hand side operand array
 * @param rhs address of right hand side operand array
 * @param avl application vector length (array size)
 */
void vector_add(float *dst,
                float *lhs,
                float *rhs,
                size_t avl)
{
    for (size_t vl; avl > 0; avl -= vl, lhs += vl, rhs += vl, dst += vl)
    {
        // compute the number of elements which are going to be
        // processed in this iteration of loop body.
        // this number corresponds to the vector length (vl)
        // and is evaluated from avl (application vector length)
        vl = __riscv_vsetvl_e32m1(avl);
        // loading operands
        vfloat32m1_t vec_src_lhs = __riscv_vle32_v_f32m1(lhs, vl);
        vfloat32m1_t vec_src_rhs = __riscv_vle32_v_f32m1(rhs, vl);
        // actual vector addition
        vfloat32m1_t vec_acc = __riscv_vfadd_vv_f32m1(vec_src_lhs,
                                                      vec_src_rhs,
                                                      vl);
        // storing results
        __riscv_vse32_v_f32m1(dst, vec_acc, vl);
    }
}

michael-roe · 2024-10-09T09:04:04Z

For example, what would you like the (volk) machines to be called?

Which features does the vector machine require? 64-bit? We could just check for “v” in the option string to detect the presence of vector instructions, rather than doing something more complex such as finding out what size the vector registers are.

You probably already know this, but typically the RISC-V option string is in the Flattened Device Tree (FDT), which is parsed by the kernel, which is then queried from user space by cpu_features.
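Checking for “v” in the option string is a little more subtle than a plain character search, because multi-letter extensions such as zve32x also contain a "v". Below is a simplified sketch of such a check. The function name is hypothetical, it is not what cpu_features does, and it assumes the usual string layout: an "rv32"/"rv64" prefix, then single-letter extensions with optional versions like "2p0", then multi-letter extensions that start with z, s or x or follow an underscore.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if the single-letter extension `ext` appears in the ISA
 * string `isa` (e.g. "rv64imafdcv_zba"), 0 otherwise. Simplified:
 * scanning stops at the first underscore or multi-letter extension. */
int isa_has_single_letter_ext(const char *isa, char ext)
{
    assert(isa != NULL);
    const char *p = isa;

    /* Expect the "rv" prefix followed by the XLEN digits. */
    if (tolower((unsigned char)p[0]) != 'r' || tolower((unsigned char)p[1]) != 'v')
        return 0;
    p += 2;
    while (isdigit((unsigned char)*p))
        p++;

    ext = (char)tolower((unsigned char)ext);
    while (*p != '\0' && *p != '_') {
        char c = (char)tolower((unsigned char)*p);
        if (c == 'z' || c == 's' || c == 'x')
            break; /* start of a multi-letter extension name */
        if (c == ext)
            return 1;
        p++;
        /* Skip an optional version suffix such as "2" or "2p0". */
        if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p))
                p++;
            if (tolower((unsigned char)*p) == 'p' && isdigit((unsigned char)p[1])) {
                p++;
                while (isdigit((unsigned char)*p))
                    p++;
            }
        }
    }
    return 0;
}
```

With this, `isa_has_single_letter_ext("rv64imafdcv", 'v')` reports the vector extension, while `"rv64imafdc_zve32x"` does not count as having the full V extension.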

drmpeg · 2024-10-09T13:04:13Z

Seems like everyone is calling it "rvv".

You'll probably want to check the compiler version (for gcc, 13 or later).

The hwprobe syscall was added in Linux 6.4. Here's the example code.

#include <asm/hwprobe.h>
#include <asm/unistd.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	struct riscv_hwprobe requests[] = {{RISCV_HWPROBE_KEY_MVENDORID},
					   {RISCV_HWPROBE_KEY_MARCHID},
					   {RISCV_HWPROBE_KEY_MIMPID},
					   {RISCV_HWPROBE_KEY_CPUPERF_0},
					   {RISCV_HWPROBE_KEY_IMA_EXT_0}};

	int ret = syscall(__NR_riscv_hwprobe, &requests,
			  sizeof(requests) / sizeof(struct riscv_hwprobe), 0,
			  NULL, 0);
	if (ret) {
		fprintf(stderr, "Syscall failed with %d: %s\n", ret,
			strerror(errno));
		return 1;
	}

	bool has_misaligned_fast =
	    (requests[3].value & RISCV_HWPROBE_MISALIGNED_FAST) ==
	    RISCV_HWPROBE_MISALIGNED_FAST;
	printf("Vendor ID: %llx\n", requests[0].value);
	printf("MARCH ID: %llx\n", requests[1].value);
	printf("MIMPL ID: %llx\n", requests[2].value);
	printf("HasMisalignedFast: %s\n", has_misaligned_fast ? "yes" : "no");
	__u64 extensions = requests[4].value;
	printf("Extensions:\n");
	if (extensions & RISCV_HWPROBE_IMA_FD)
		printf("\tFD\n");
	if (extensions & RISCV_HWPROBE_IMA_C)
		printf("\tC\n");
	if (extensions & RISCV_HWPROBE_IMA_V)
		printf("\tV\n");
	if (extensions & RISCV_HWPROBE_EXT_ZBA)
		printf("\tZBA\n");
	if (extensions & RISCV_HWPROBE_EXT_ZBB)
		printf("\tZBB\n");
	if (extensions & RISCV_HWPROBE_EXT_ZBS)
		printf("\tZBS\n");

	return 0;
}

On HiFive Unmatched:

ubuntu@riscv64:~/xfer$ ./hwprobe
Vendor ID: 489
MARCH ID: 8000000000000007
MIMPL ID: 20181004
HasMisalignedFast: no
Extensions:
	FD
	C
ubuntu@riscv64:~/xfer$

On VisionFive 2:

ubuntu@visionfive:~/xfer$ ./hwprobe
Vendor ID: 489
MARCH ID: 8000000000000007
MIMPL ID: 4210427
HasMisalignedFast: no
Extensions:
	FD
	C
	ZBA
	ZBB
ubuntu@visionfive:~/xfer$

I'd think kernels could be vector size agnostic.
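To illustrate what "vector size agnostic" could look like in practice, here is a hedged sketch of an element-wise multiply. The function name is made up for this example. The RVV branch assumes a toolchain that ships `<riscv_vector.h>` with the `__riscv_`-prefixed v1.0 intrinsics, and it is only compiled when `__riscv_vector` is defined; otherwise a scalar loop is used so the same source builds everywhere. The loop never hard-codes a register width: `vsetvl` tells it how many elements the hardware will process per iteration.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* dst[i] = a[i] * b[i] for i in [0, n). */
void multiply_f32(float *dst, const float *a, const float *b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
#if defined(__riscv_vector)
    /* Strip-mined loop: vl is whatever the hardware grants for the
     * remaining n elements, so any VLEN works unchanged. LMUL=8 groups
     * eight vector registers to process more elements per iteration. */
    for (size_t vl; n > 0; n -= vl, a += vl, b += vl, dst += vl) {
        vl = __riscv_vsetvl_e32m8(n);
        vfloat32m8_t va = __riscv_vle32_v_f32m8(a, vl);
        vfloat32m8_t vb = __riscv_vle32_v_f32m8(b, vl);
        __riscv_vse32_v_f32m8(dst, __riscv_vfmul_vv_f32m8(va, vb, vl), vl);
    }
#else
    /* Scalar fallback for targets without the vector extension. */
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] * b[i];
#endif
}
```

Whether m8 or a smaller LMUL is the better choice depends on the core and on register pressure in the kernel, so that would need benchmarking.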

michael-roe · 2024-10-09T15:26:51Z

We could call the machine something like rva22u64v (i.e. the rva22u64 profile, with the optional vector extension).

Semantics would be as per the RISC-V profile, so it would also include the bit manipulation instructions.
I might have to add support for detecting these to cpu_features, if no one else has already done it.

michael-roe · 2024-10-09T15:44:38Z

Also, because the rva22u64 profile says to implement all of V rather than some subset if you’re going to implement it at all, that gives us some justification for just checking for “v” rather than for some subset of the vector ISA.

michael-roe · 2024-10-09T16:00:01Z

It looks like if I turn my back for a minute those wacky riscv guys will have added a bunch of new instructions…

In particular, we now have the Zfa extension, which on a 32-bit platform includes fmvh.x.d, which lets you get at the upper 32 bits of a double-precision floating-point register. I’ve got to admit, I really wanted this instruction. (MIPS had the equivalent instruction.)

jdemel · 2024-10-10T18:24:54Z

If there's a certain set of instructions that are required to be implemented as a whole, I'd summarize this with 1 abbreviation. Whether this is rvv0, rvv1, rva22u64v, or smth else is up to the people who implement it and explain why. I'd like to adopt an easy-to-recognize scheme that's used everywhere in this space though. It makes it easier to point people to the code and say: "yes, we have this as well" (or for anyone who searches for it). If cpu_features has adopted a certain name already, this might be good for us too.

A side note: I doubt that we need support for RiscV for anything but Linux at the moment (and maybe, maybe *BSD at some point).

michael-roe · 2024-10-27T15:16:33Z

There’s a recent PR (#774 I think) that will probably make this one obsolete.

I’ll close this one when/if we’ve merged a change that supersedes it.

michael-roe · 2024-11-07T13:17:02Z

Obsoleted by #774

Add an arch for RISCV with vector extension

effeab5

Signed-off-by: Michael Roe <[email protected]>

drmpeg mentioned this pull request Oct 10, 2024

RISC-V Vector 1.0 Support: If and where to start? #772

Closed

michael-roe closed this Nov 7, 2024


4 participants
