Any plans for RISC-V Vector Extension (RVV) optimization? #11063
Comments
Hi, thanks for your interest. May I ask what your interest in GDAL and/or RISC-V is? Perhaps you're affiliated with a RISC-V founder or some group that promotes its adoption?
I would be much more supportive of RISC-V optimizations going through the use of an abstraction software layer. I see that libjxl uses https://github.com/google/highway and that it has RISC-V support. That would also enable us to cover other platforms like NEON / ARM64. Currently we have a few specific SSE/SSE2/AVX2 code paths using Intel intrinsics, either directly or through a thin abstraction layer such as gcore/gdalsse_priv.h. I'm undecided whether adopting highway would totally deprecate those code paths, or whether we would keep them. It all depends on whether we can reach the same level of performance, and also on how we deal with the external dependency. The main candidates for accelerated code paths are alg/gdalwarpkernel.cpp, gcore/overview.cpp and the CopyWord-related code of gcore/rasterio.cpp |
Hi @rouault, Thank you for the detailed response! Let me introduce myself first—I’m Yin Zhang (张尹), from the Programming Language and Compilation Technology Lab (PLCT Lab) at the Intelligent Software Research Center, Institute of Software, Chinese Academy of Sciences. We are members of the RISC-V Foundation and actively involved in promoting its development. Additionally, we have some non-public projects that would benefit from using GDAL on the RISC-V platform, where performance is a key concern. I personally have experience in various SIMD and vector-related optimizations, including RISC-V vector optimizations for OpenCV (https://github.com/opencv/opencv/commits/4.x/?author=joy2myself). I’m also working on the implementation of the Alternatively, we could discuss potential frameworks for upstream optimizations in GDAL. Based on my experience in the SIMD field, I see three primary approaches for SIMD optimizations in most foundational libraries:
Each approach has its pros and cons, and the choice often depends on the specific needs and practical circumstances of the project. Of course, you are far more familiar with the specific requirements and real-world conditions of GDAL than I am. Looking forward to hearing your thoughts! Best, |
My own inclination would go to that. Which approach is preferred remains to be determined. Is experimental/simd a sort of staging area for evolutions of the C++ standard/library? What is the status of this? The GDAL project is rather conservative and I don't think we would want to adopt a C++ feature that hasn't been officially adopted and doesn't yet have at least one implementation. Perhaps the topic is not mature enough yet to be considered for GDAL too. For me, platform-specific code would fall in the https://gdal.org/en/latest/development/rfc/rfc85_policy_code_additions.html category. The GDAL project has unfortunately seen a lot of contributors over time "dump" their code upstream and run away afterwards, leading to even more work for maintainers. Any choice should probably go through the RFC route: https://gdal.org/en/latest/development/rfc/index.html
I had initiated a very primitive sort of that with gcore/gdalsse_priv.h, but this is more a convenient way of using SSE intrinsics with C++ than something intended to be a cross-architecture abstraction layer. Other libs such as Highway, xsimd, etc. have likely done a much better job at this. |
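For readers unfamiliar with the "thin C++ wrapper over intrinsics" idea mentioned above, here is a minimal sketch. It is not the actual gcore/gdalsse_priv.h API: the names `Vec2d` and `MulAdd` are hypothetical, made up for illustration only. It wraps the SSE2 double-precision intrinsics in a small struct with operators, with a scalar fallback so the same calling code compiles on non-x86 targets.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical wrapper (NOT the real gdalsse_priv.h classes):
// holds two doubles, backed by an SSE2 register when available.
struct Vec2d {
#if defined(__SSE2__)
    __m128d v;
    static Vec2d Load(const double* p) { return {_mm_loadu_pd(p)}; }
    void Store(double* p) const { _mm_storeu_pd(p, v); }
    friend Vec2d operator+(Vec2d a, Vec2d b) { return {_mm_add_pd(a.v, b.v)}; }
    friend Vec2d operator*(Vec2d a, Vec2d b) { return {_mm_mul_pd(a.v, b.v)}; }
#else
    // Scalar fallback with the same interface, for non-SSE2 targets.
    double v[2];
    static Vec2d Load(const double* p) { Vec2d r; r.v[0] = p[0]; r.v[1] = p[1]; return r; }
    void Store(double* p) const { p[0] = v[0]; p[1] = v[1]; }
    friend Vec2d operator+(Vec2d a, Vec2d b) {
        Vec2d r; r.v[0] = a.v[0] + b.v[0]; r.v[1] = a.v[1] + b.v[1]; return r;
    }
    friend Vec2d operator*(Vec2d a, Vec2d b) {
        Vec2d r; r.v[0] = a.v[0] * b.v[0]; r.v[1] = a.v[1] * b.v[1]; return r;
    }
#endif
};

// Computes out[i] = a[i] * b[i] + c[i] for i in [0, n).
void MulAdd(const double* a, const double* b, const double* c,
            double* out, std::size_t n) {
    assert(n == 0 || (a && b && c && out));
    std::size_t i = 0;
    // Main loop: two elements per iteration through the wrapper.
    for (; i + 2 <= n; i += 2) {
        (Vec2d::Load(a + i) * Vec2d::Load(b + i) + Vec2d::Load(c + i)).Store(out + i);
    }
    // Tail: remaining element, if n is odd.
    for (; i < n; ++i) {
        out[i] = a[i] * b[i] + c[i];
    }
}
```

Note that the lane count (two doubles) is baked into both the type and the loop stride, which is why a wrapper of this kind does not carry over to wider or variable-length vector units.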
Hi @rouault, Regarding the status of I fully understand the upstream position regarding platform-specific code. After internal discussions with my team, we will carefully evaluate and determine our plan. There seem to be two possible directions at this point:
Thank you again for your detailed and thoughtful response. It has been very helpful in shaping our direction. |
FYI, in #11202 , I've used the sse2neon.h header that works very well. Not sure if there's a similar sse2rvv.h ;-) |
There is: sse2rvv and neon2rvv. But I wouldn't recommend using them for more than a quick initial port, because they don't allow you to take advantage of the full vector length. You'd be better off using something like highway or potentially std::simd, which allow you to write vector-length-agnostic generic SIMD. From what I've seen of the codebase, I would recommend successively adding custom RVV codepaths, because the SIMD usage seems to be mostly in isolated places.
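To illustrate what "vector length agnostic" means in practice, here is a minimal sketch of a strip-mined loop. The function name `AddFloats` and the constant `kAssumedLanes` are hypothetical and not taken from GDAL. The RVV branch uses the `__riscv_`-prefixed intrinsics from `<riscv_vector.h>` and is only compiled when the toolchain defines `__riscv_v_intrinsic`; elsewhere, a scalar stand-in with the same loop shape is used so the example still builds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#if defined(__riscv_v_intrinsic)
#include <riscv_vector.h>  // RVV intrinsics
#endif

// Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n).
void AddFloats(const float* a, const float* b, float* dst, std::size_t n) {
    assert(n == 0 || (a && b && dst));
#if defined(__riscv_v_intrinsic)
    // Strip-mining: each iteration asks the hardware how many 32-bit
    // elements it will process (vl), so the same binary adapts to any
    // vector length and needs no separate tail loop.
    while (n > 0) {
        const std::size_t vl = __riscv_vsetvl_e32m1(n);
        const vfloat32m1_t va = __riscv_vle32_v_f32m1(a, vl);
        const vfloat32m1_t vb = __riscv_vle32_v_f32m1(b, vl);
        __riscv_vse32_v_f32m1(dst, __riscv_vfadd_vv_f32m1(va, vb, vl), vl);
        a += vl;
        b += vl;
        dst += vl;
        n -= vl;
    }
#else
    // Portable stand-in with the same structure. kAssumedLanes is an
    // arbitrary illustrative chunk size, not a real hardware parameter.
    constexpr std::size_t kAssumedLanes = 8;
    while (n > 0) {
        const std::size_t vl = std::min(n, kAssumedLanes);
        for (std::size_t i = 0; i < vl; ++i) {
            dst[i] = a[i] + b[i];
        }
        a += vl;
        b += vl;
        dst += vl;
        n -= vl;
    }
#endif
}
```

By contrast, a translation header such as sse2rvv has to emulate fixed 128-bit registers, so code ported that way cannot take advantage of a wider vector unit.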
Some RVV 1.0 hardware is already available, see "Processors with RVV 1.0": https://camel-cdr.github.io/rvv-bench-results/index.html You can just use qemu in the GitHub CI. That's even better than real hardware, because you can configure it to use different vector lengths and adjust some other implementation details.
Yeah, that could happen if you don't have capacity to maintain it. Hopefully problems would get caught if tests are run by the CI. See for example the RVV support that now is in gnuradio/volk for an example CI setup. |
I don't know RVV specifics, but for Intel, for SSE2 vs AVX2, in the few times I've compared in GDAL, the AVX2 boost is far from being twice the SSE2 one. For example in gcore/statistics.txt, I mention that the boost of AVX2 vs SSE2 is just 15%. But yes if you have some abstraction of the vector length, you can get that "for free".
Did you identify specific places where that would be beneficial? The ratio of measured runtime speedup to implementation and maintenance cost would need to be assessed case by case. |
The difference for RVV should be larger, because x86 CPUs try to make SSE still fast, because of the legacy code, while RVV implementations tend to not specifically optimize below their vector length.
No I don't, because I didn't know about this project before I found this issue. I just wanted to suggest how I'd approach adding RVV optimizations. |
Feature description
First off, thanks for all the amazing work on GDAL! I wanted to ask if there are any plans to optimize GDAL for the RISC-V platform, specifically using the RISC-V Vector Extension (RVV). With RISC-V gaining popularity, having RVV optimizations could potentially bring performance benefits to GDAL on that platform.
If there’s no plan yet, would this be something you’d consider? My team and I would be interested in contributing if there’s a need for testing or development in this area.
Thanks!
Additional context
No response