Add Initial Java Support for GDS to KvikIO #396
rapids-bot[bot] merged 42 commits into rapidsai:branch-25.02 from
/ok to test
A few high-level comments before digging too deep into this code:
- I have a weak preference for wrapping the C++ bindings rather than reimplementing kvikio in Java. This will probably be easier than reimplementing the code each time new features are introduced to kvikio, and it gives more confidence in testing (bindings are probably less bug-prone than a full reimplementation).
- This will need to pass style checks. See the cuDF guide on Code Formatting. The same guidance applies to using `pre-commit` with kvikio development.
- This will need a build system, probably Maven. Long commands in the README that explain how to supply all the dependencies are not a good solution if those dependencies can be fetched and built automatically. See how cuDF's Java bindings use Maven, for instance: https://github.com/rapidsai/cudf/tree/branch-24.08/java
  - The end result should be that commands like `mvn clean install` work properly.
- This will need CI builds and tests. Review the following resources on cuDF's Java CI and try to duplicate them:
  - GitHub Action Job for PRs
  - GitHub Action Job for Nightly Tests
  - Java build/test script
    - This uses conda to supply the libcudf package and builds the Java bindings from that. If you use direct bindings and do not depend on the kvikio C++ library, then you can do something simpler that only requires installing cuFile from conda.
  - Java `dependencies.yaml` file key
    - This file key gives a list of which dependency lists to include, and it is referenced by the CI build/test script. It should include the dependency list `cuda`, so you get the `cufile` package. See the rapids-dependency-file-generator README for more information.
  - Java `dependencies.yaml` list
    - This dependency list should probably include `maven` and `openjdk`. Not sure what else.

Give this a start, and please feel free to schedule time with me to discuss anything that is unclear!
Is this still planned for 24.08, or should we move to 24.10?
I likely will not finish the updates here for a couple of weeks due to other priorities, so this should be moved to 24.10.
OK, thanks Alex! 🙏 I have moved this to 24.10.
bdice left a comment:
Quick first pass of review, mostly focused on packaging and CI. Give my suggestions a try, and I'll comment `/ok to test` to run CI on your next commits.
/ok to test

/ok to test

/ok to test
madsbk left a comment:
Looks good, thanks @aslobodaNV.
@madsbk Do you want to target 24.12 or 25.02?
@aslobodaNV Great work. I left some comments, but we should be able to get this merged after they are addressed.
I have no strong opinion, but it might make sense to add some more features before we release?
I'm fine with this being held in the nightly channel until 25.02. I expect I will have more time next quarter to work on some improvements that would be good to have before the full release. These last two quarters have not given me much time to add the nice-to-haves.
@bdice Addressed all comments except the RAPIDS URL change. Please confirm that it should differ from the one used in the cudf repo.
Co-authored-by: Bradley Dice <[email protected]>
bdice left a comment:
Approving and retargeting to 25.02.
One last CI run, then this should be good to go.
/ok to test

/ok to test

/ok to test

/merge
This PR adds initial support for Java bindings to GDS as part of the KvikIO library. It contains the minimal set of bindings required to support synchronous read and write IO operations via GDS, as well as a single example demonstrating how the bindings can be used alongside other CUDA libraries, such as JCuda. Full support for the GDS cuFile API, including batch and asynchronous IO, has not yet been implemented, and more sophisticated error/exception handling is not yet in place. A README in kvikio/java details how to compile and build this new functionality locally, along with detailed instructions for running the included usage example.