Spike: evaluate if cuDF can be used with datafusion-python #936

Open
timsaucer opened this issue Oct 28, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@timsaucer
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

As other DataFrame libraries move toward leveraging GPU resources, it would be useful to see whether we could build on the work already done in pandas and polars for interoperating with cuDF to give a similar experience in DataFusion.

Describe the solution you'd like

Evaluate the level of effort and technical limitations of using cuDF to evaluate DataFrames. Also worth evaluating is its C++ interface, which we could potentially bring into DataFusion upstream if we are willing to write the appropriate wrappers.

Describe alternatives you've considered

Leave as is.

Additional context

This task is focused only on researching what would be required and whether there is an opportunity here.

@timsaucer timsaucer added the enhancement New feature or request label Oct 28, 2024
@andygrove
Member

andygrove commented Oct 28, 2024

I have some experience in this area. While at NVIDIA, I created a POC with Rust bindings around cuDF and then provided interoperability with arrow-rs. Unfortunately, that code was internal and not open-source. I used cxx to create the bindings.

This repo (datafusion-python) once contained a prototype of translating a DataFusion logical plan to cuDF operations (all in Python). It is still there in the history somewhere.
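
For readers unfamiliar with the idea, here is a minimal sketch of what translating a logical plan into DataFrame operations can look like. It is not the prototype mentioned above: the plan node classes (`Scan`, `Filter`, `Projection`), the `execute` function, and the `RowListBackend` class are all hypothetical names invented for illustration, and the backend is a plain-Python stand-in so the example runs without a GPU or cuDF installed.

```python
from dataclasses import dataclass
import operator

# Hypothetical, heavily simplified plan nodes (assumption for illustration);
# real DataFusion logical plans are much richer than this.
@dataclass
class Scan:
    table: str

@dataclass
class Filter:
    input: object
    column: str
    op: str        # one of ">", "<", "=="
    value: object

@dataclass
class Projection:
    input: object
    columns: list

_OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}

class RowListBackend:
    """CPU stand-in backend: a frame is simply a list of dicts."""

    def filter(self, frame, column, op, value):
        return [row for row in frame if _OPS[op](row[column], value)]

    def project(self, frame, columns):
        return [{c: row[c] for c in columns} for row in frame]

def execute(plan, tables, backend):
    """Walk the plan bottom-up and dispatch each node to the backend."""
    if isinstance(plan, Scan):
        return tables[plan.table]
    if isinstance(plan, Filter):
        child = execute(plan.input, tables, backend)
        return backend.filter(child, plan.column, plan.op, plan.value)
    if isinstance(plan, Projection):
        child = execute(plan.input, tables, backend)
        return backend.project(child, plan.columns)
    raise NotImplementedError(f"unsupported plan node: {type(plan).__name__}")

# Roughly equivalent to: SELECT name FROM t WHERE score > 10
plan = Projection(Filter(Scan("t"), "score", ">", 10), ["name"])
tables = {"t": [{"name": "a", "score": 5}, {"name": "b", "score": 20}]}
result = execute(plan, tables, RowListBackend())
```

A GPU backend would presumably implement the same two methods on a cuDF DataFrame, for example with boolean-mask indexing for `filter` and column-list selection for `project`, as in pandas. Whether that maps cleanly onto the full set of DataFusion plan nodes is exactly what the spike would need to establish.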

@andygrove
Member

andygrove commented Oct 28, 2024

I see that there is now one RAPIDS library that provides Rust bindings: https://docs.rapids.ai/api/cuvs/nightly/rust_api/ so it may be interesting to see what approach they took to wrap C++ in this case.

Edit: cuVS is using bindgen.

@drauschenbach
Contributor

> This repo (datafusion-python) once contained a prototype of translating a DataFusion logical plan to cuDF operations (all in Python). It is still there in the history somewhere.

Possibly #602.


3 participants