Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an arch for RISCV with vector extension #625

Closed
wants to merge 1 commit into from

Conversation

michael-roe
Copy link
Contributor

No description provided.

@drom
Copy link

drom commented Jul 13, 2023

It would be cool porting Volk on RVV
I want to be part of it.

@jdemel
Copy link
Contributor

jdemel commented Jul 13, 2023

I'd be in favor of adding RiscV support for VOLK. Also, we already run QA tests for RiscV and we already have some hand optimized assembly for this ISA. Though, no vector support for it.
As far as I know it is very difficult to get RiscV systems with vector extensions.

This PR in particular is difficult though because riscv_vector ends up in the list of x86 extensions. Also, I thought this PR was closed in favor of a later one.

@drom
Copy link

drom commented Jul 13, 2023

RISC-V Vector v1.0 (RVV1) looks like perfect fit for Volk.
We will see ICs with RVV1 in the near future. For now we can use simulator.
I can help with cycle accurate profiling on RVV cores that SiFive provides.

@jdemel
Copy link
Contributor

jdemel commented Nov 4, 2023

Is riscv_vector a good name for this architecture? The RiscV naming system is getting a bit confusing.

@drom
Copy link

drom commented Nov 4, 2023

RISC-V has naming convention (ISA-string) described in the spec:
https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
Chapter 22. Page 121

I think the baseline Vector ISA for this project should be RV64I2p0M2p0A2p0F2p0D2p0V1p0
can be checked here: https://rv.drom.io/?RV64I2p0M2p0A2p0F2p0D2p0V1p0

C++ compilers will take this string as -march argument.
https://llvm.org/docs/RISCVUsage.html
https://gcc.gnu.org/onlinedocs/gcc/RISC-V-Options.html

@jdemel
Copy link
Contributor

jdemel commented Nov 5, 2023

The GCC RISC-V architecture options imply that we could optimize for any given CPU. That'd be to many machines to compile, I assume.
The GCC argument would potentially be:

gcc -march=rv64i2p0m2p0a2p0f2p0d2p0v1p0

While I assume the clang argument is:

clang -march=rv64i2p0m2p0a2p0f2p0d2p0v1p0

I was hoping for smth like

gcc -march=rv64imafdv

I read that there are some names for a common set of extensions. This might be interesting as a baseline as well.

@drom
Copy link

drom commented Nov 9, 2023

You are right. The are some baselines like rv64g
also there are profiles https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc
but for this project, I think important baseline is:

  • rv64i - 64-bit
  • f - float32
  • d - float64
  • v - vector (due to complications of Vector extension story I would explicitly say v1p0)
  • m,a,c - (optional) most cores with come with Integer mul/div; atomics; compact instructions

@drom
Copy link

drom commented Nov 9, 2023

Another related question is: Are we using ASM or Intrinsics ?
Recently I switched to RVV intrinsics https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc
They are very handy and quite stable at this point.

@jdemel
Copy link
Contributor

jdemel commented Nov 9, 2023

I suggest to use intrinsics. So far, we mostly use intrinsics and only rarely use ASM. Also, we tend to replace the ASM code with intrinsics when they're available.

If a platform comes with Vector extensions, is it reasonable to assume it comes with m,a,c too? If we decide to make this optional, we will eventually see situations where we have to work around this assumption. I'd like to keep things simple.

Would you be willing to set up an initial environment? I'd envision 2 options:

  1. a generic machine for riscv64 that includes the sifive74 kernels.
  2. a rv64imafdv1p0 machine

Currently, we use: https://github.com/uraimo/run-on-arch-action#supported-platforms for non-x86 platforms. If we can integrate a CI test for these machines, that'd be worthwhile.

As soon as we set up the initial machines and CI, we can start to add optimized kernels. It'd be interesting to see how well a compiler optimizes code compared to hand-optimized kernels.

@drom
Copy link

drom commented Nov 10, 2023

I can setup CI for:

  1. HiFive Unmatched dev-board: https://www.sifive.com/boards/hifive-unmatched-revb
    u74 core. rv64imafdc
    Good in-order superscalar baseline. I have physical dev. board

  2. https://www.sifive.com/cores/intelligence-x280
    Popular vector + in-order superscalar target.
    x280: rv64imafdcv1p0
    We can run on QEMU (What version of QEMU you use?)

I can setup cycle-accurate model benchmarking ~100KHz simulation speed.

@drmpeg
Copy link
Member

drmpeg commented Jun 25, 2024

Just an update. A board with a RVV 1.0 capable CPU has been released. The Banana Pi BPI-F3.

https://docs.banana-pi.org/en/BPI-F3/BananaPi_BPI-F3

Current boards only have 4GB of RAM, so it may be better to wait for 8 or 16GB models to come out.

@michael-roe
Copy link
Contributor Author

Ok, various other peo0ple have expressed an interest in using RISC-V vector extensions with volk.

I can amend this PR, but …. What doyou guys really want it to do?

Getting cpu_features to just check for “v” is easy - I already upstreamed a patch to cpu_features to do that. Checking the parameters of the vector extension (size of registers etc) might be harder.

@jdemel
Copy link
Contributor

jdemel commented Oct 9, 2024

@michael-roe multiple things.

  1. create a machine that enables Risc-V compilation with vector extensions. (and a version without)
  2. Be able to dynamically load the correct machine. i.e. detect the correct instructions at runtime
  3. Adding kernels

In most cases, we rely on compiler switches to produce the best optimized version for a specific set of vector extensions. e.g. for SSE4, we compile a machine with -msse4 etc.

@drmpeg
Copy link
Member

drmpeg commented Oct 9, 2024

Here's a good article on RVV intrinsics.

https://fprox.substack.com/p/risc-v-vector-programming-in-c-with

The example code could be the basis for the first kernel.

/** vector addition
 *
 * @param dst address of destination array
 * @param lhs address of left hand side operand array
 * @param rhs address of right hand side operand array
 * @param avl application vector length (array size)
 */
void vector_add(float *dst,
                float *lhs,
                float *rhs,
                size_t avl)
{
    for (size_t vl; avl > 0; avl -= vl, lhs += vl, rhs += vl, dst += vl)
    {
        // compute the number of elements which are going to be
        // processed in this iteration of loop body.
        // this number corresponds to the vector length (vl)
        // and is evaluated from avl (application vector length)
        vl = __riscv_vsetvl_e32m1(avl);
        // loading operands
        vfloat32m1_t vec_src_lhs = __riscv_vle32_v_f32m1(lhs, vl);
        vfloat32m1_t vec_src_rhs = __riscv_vle32_v_f32m1(rhs, vl);
        // actual vector addition
        vfloat32m1_t vec_acc = __riscv_vfadd_vv_f32m1(vec_src_lhs,
                                                      vec_src_rhs,
                                                      vl);
        // storing results
        __riscv_vse32_v_f32m1(dst, vec_acc, vl);
    }
}

@michael-roe
Copy link
Contributor Author

For example, what would you like the (volk) machines to be called?

Which features does the vector machine require? 64 bit? Check for “v” in the option string to detect the presence of vector instructions. (Rather than something more complex involving e.g. finding out what size the vector registers are).

you probably already know that, but typically the Riscv option string is in the Flattened Device Tree (FDT), which gets parsed by the kernel, which then gets queried from user space by cpu_features.

@drmpeg
Copy link
Member

drmpeg commented Oct 9, 2024

Seems like everyone is calling it "rvv".

You'll probably want to check the compiler version (for gcc, 13 or later).

hwprobe has been added for Linux 6.4. Here's the example code.

#include <asm/hwprobe.h>
#include <asm/unistd.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	struct riscv_hwprobe requests[] = {{RISCV_HWPROBE_KEY_MVENDORID},
					   {RISCV_HWPROBE_KEY_MARCHID},
					   {RISCV_HWPROBE_KEY_MIMPID},
					   {RISCV_HWPROBE_KEY_CPUPERF_0},
					   {RISCV_HWPROBE_KEY_IMA_EXT_0}};

	int ret = syscall(__NR_riscv_hwprobe, &requests,
			  sizeof(requests) / sizeof(struct riscv_hwprobe), 0,
			  NULL, 0);
	if (ret) {
		fprintf(stderr, "Syscall failed with %d: %s\n", ret,
 			strerror(errno));
		return 1;
	}

	bool has_misaligned_fast =
	    (requests[3].value & RISCV_HWPROBE_MISALIGNED_FAST) ==
	    RISCV_HWPROBE_MISALIGNED_FAST;
	printf("Vendor ID: %llx\n", requests[0].value);
	printf("MARCH ID: %llx\n", requests[1].value);
	printf("MIMPL ID: %llx\n", requests[2].value);
	printf("HasMisalignedFast: %s\n", has_misaligned_fast ? "yes" : "no");
	__u64 extensions = requests[4].value;
	printf("Extensions:\n");
	if (extensions & RISCV_HWPROBE_IMA_FD)
		printf("\tFD\n");
	if (extensions & RISCV_HWPROBE_IMA_C)
		printf("\tC\n");
	if (extensions & RISCV_HWPROBE_IMA_V)
		printf("\tV\n");
	if (extensions & RISCV_HWPROBE_EXT_ZBA)
		printf("\tZBA\n");
	if (extensions & RISCV_HWPROBE_EXT_ZBB)
		printf("\tZBB\n");
	if (extensions & RISCV_HWPROBE_EXT_ZBS)
		printf("\tZBS\n");

	return 0;
}

On HiFive Unmatched:

ubuntu@riscv64:~/xfer$ ./hwprobe
Vendor ID: 489
MARCH ID: 8000000000000007
MIMPL ID: 20181004
HasMisalignedFast: no
Extensions:
	FD
	C
ubuntu@riscv64:~/xfer$

On VisionFive 2:

ubuntu@visionfive:~/xfer$ ./hwprobe
Vendor ID: 489
MARCH ID: 8000000000000007
MIMPL ID: 4210427
HasMisalignedFast: no
Extensions:
	FD
	C
	ZBA
	ZBB
ubuntu@visionfive:~/xfer$

I'd think kernels could be vector size agnostic.

@michael-roe
Copy link
Contributor Author

We could call the machine something like rva22u64v (ie rva2264u profile, with the optional vector extension).

semantics would be as per the riscv profile, so it would also include the bit manipulation instructions.
I might have to add support for detecting these to cpu_features, if no-one else has already done it.

@michael-roe
Copy link
Contributor Author

Also, because the rva22u64 profile says to implement all of V rather than some subset of you’re going to implement it at all, that gives us some justification for just checking for “v” rather than some subset of the vector Isa.

@michael-roe
Copy link
Contributor Author

It looks like if I turn my back for a minute those wacky riscv guys will have added a bunch of new instructions…

in particular, we now have the Zfa extension, which on a 32 bit platform includes fmvh.x.d, which lets you get at the upper 32 bits of a double precision floating point register. I’ve got to admit, I really wanted this instruction. (MIPS had the equivalent instruction).

@jdemel
Copy link
Contributor

jdemel commented Oct 10, 2024

If there's a certain set of instructions that are required to be implemented as a whole, I'd summarize this with 1 abbreviation. If this is rvv0, rvv1, rva22u64v, or smth else is up to people who implement it and explain why. I'd like to adopt an easy to recognize scheme that's used everywhere in this space though. It makes it easier to point people to the code and say: "yes, we have this as well" (or if anyone would search for it). If cpu_features adopted a certain name already, this might be good for us too.

A side note: I doubt that we need support for RiscV for anything but Linux at the moment (and maybe, maybe *BSD at some point).

@michael-roe
Copy link
Contributor Author

There’s a recent PR (#774 I think) that will probably make this one obsolete.

I’ll close this one when/it we’ve merged a change that supercedes it.

@michael-roe michael-roe closed this Nov 7, 2024
@michael-roe
Copy link
Contributor Author

Obsoleted by #774

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants