[FEA] Introduce the pylibcudf API and subpackage #13921
Comments
Contributes to #13921 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Ashwin Srinath (https://github.com/shwina) URL: #14972
Contributes to #13921 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Ashwin Srinath (https://github.com/shwina) URL: #14970
Contributes to #13921 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Roeschke (https://github.com/mroeschke) URL: #14982
Contributes to #13921 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) URL: #15005
Before we can close this issue we will also need to add comprehensive testing. For now, every pylibcudf API is being developed as a back-end for a cuDF Python API, so the existing Python test suite gives us sufficient coverage. We will want to come back and remedy this gap before actually extracting pylibcudf as a separate package.
…action in pylibcudf (#15011) Contributes to #13921 Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #15011
pylibcudf development is now under way. I have created a project board for tracking, as well as a number of issues to discuss more specific topics. I am therefore closing this issue in favor of tracking the work through those.
FYI #15162 is the spiritual successor to this issue.
Background
cuDF-python is the common way for Python users to interact with libcudf, the CUDA/C++ computational core for RAPIDS dataframe and database operations. However, cuDF-python is designed to correspond closely to the pandas API, and in doing so it adds some semantic overhead on top of the libcudf algorithms. For Python applications that want accelerated dataframe operations and have no use for pandas API matching, "pylibcudf" provides a direct way for the Python ecosystem to use libcudf. pylibcudf also makes libcudf APIs available to the Python ecosystem even when they have no pandas equivalent (e.g. TEXT).
In addition, we can improve the performance and design of cuDF-python by building on a "pylibcudf" foundation and refactoring extra complexity in cuDF-python's Cython layer.
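To make the layering idea concrete, here is a minimal pure-Python sketch. It is not the pylibcudf or cuDF API: `Column`, `gather`, and `Series` are hypothetical stand-ins chosen for illustration, and plain lists take the place of GPU memory. It only shows how a thin algorithm layer could sit underneath a pandas-like layer that adds label semantics on top.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


# Hypothetical stand-in for a thin, low-level layer: a bare column plus free
# functions that each map onto a single backend algorithm.
@dataclass
class Column:
    values: List[Optional[int]]


def gather(column: Column, positions: Sequence[int]) -> Column:
    """Select rows by integer position. No labels or alignment involved."""
    return Column([column.values[i] for i in positions])


# Hypothetical stand-in for a pandas-like layer built on the low-level one.
class Series:
    def __init__(self, column: Column, index: Sequence[str]):
        if len(index) != len(column.values):
            raise ValueError("index and column must have the same length")
        self._column = column
        self._index = list(index)

    def loc(self, labels: Sequence[str]) -> "Series":
        # The pandas-style semantics live here: labels are translated to
        # positions, then the work is delegated to the low-level algorithm.
        positions = [self._index.index(label) for label in labels]
        return Series(gather(self._column, positions), labels)

    def to_list(self) -> List[Optional[int]]:
        return list(self._column.values)


col = Column([10, 20, 30])
direct = gather(col, [2, 0])                              # low-level call
via_series = Series(col, ["a", "b", "c"]).loc(["c", "a"])  # pandas-like call
```

In this sketch both calls end up in the same `gather` routine. A caller who does not need labels can use the low-level function directly and skip the label lookup, while the pandas-like class stays a thin wrapper around it.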
Example performance
Here is an example of the API design we have in mind for pylibcudf.
Here are some draft performance results from our prototype, showing good throughput and low overhead.