[DOC] Improving Clarity and Consistency in the set_row_groups Doc in libcudf #17772
Labels: doc (Documentation)
Report incorrect documentation
Location of incorrect documentation
cudf/cpp/include/cudf/io/parquet.hpp, lines 214 to 219 at commit 9cd5253
https://docs.rapids.ai/api/cudf/legacy/libcudf_docs/api_docs/io_readers/#_CPPv4N4cudf2io22parquet_reader_options14set_row_groupsENSt6vectorINSt6vectorI9size_typeEEEE
Describe the problems or issues found in the documentation
The `set_row_groups` function, which takes the parameter `std::vector<std::vector<size_type>> row_groups`, is confusing at first glance. After some experimentation, I realized that each inner `std::vector<size_type>` lists the row groups to read from a single input source, so the outer `std::vector<std::vector<size_type>>` spans multiple input sources. Unfortunately, this is not clear from the documentation alone. In comparison, the Python API documentation for the equivalent parameter (linked below) is much more intuitive and easier to understand.
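A minimal standalone C++ sketch of the convention described above, assuming (as found by experimentation) that `row_groups[i]` lists the row-group indices for the i-th input source. It does not call libcudf: `size_type` is a local alias standing in for `cudf::size_type`, and `describe_selection` is a hypothetical helper that only makes the per-source meaning of the outer vector explicit.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Local stand-in for cudf::size_type (a 32-bit signed integer in libcudf).
using size_type = std::int32_t;

// Hypothetical helper, not part of libcudf: expands a nested row-group
// selection into "<source>:<row group>" labels.
std::vector<std::string> describe_selection(
    std::vector<std::string> const& sources,
    std::vector<std::vector<size_type>> const& row_groups)
{
  // Assumption of this sketch: exactly one inner list per input source.
  if (row_groups.size() != sources.size()) {
    throw std::invalid_argument("expected one row-group list per source");
  }
  std::vector<std::string> out;
  for (std::size_t i = 0; i < sources.size(); ++i) {
    // row_groups[i] holds the row-group indices to read from sources[i].
    for (size_type rg : row_groups[i]) {
      out.push_back(sources[i] + ":" + std::to_string(rg));
    }
  }
  return out;
}
```

For two sources, `describe_selection({"a.parquet", "b.parquet"}, {{0, 2}, {1}})` yields `a.parquet:0`, `a.parquet:2` and `b.parquet:1`; with libcudf itself, the same nested value would be what gets passed to `set_row_groups`.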
Additionally, the `set_columns` function accepts only a single `std::vector`, which makes its interface inconsistent with `set_row_groups`. This inconsistency adds further to the confusion. Improved documentation and a more consistent API design would greatly enhance usability.
Steps taken to verify documentation is incorrect
The cudf Python API documentation for the equivalent parameter is much easier to understand: https://docs.rapids.ai/api/cudf/legacy/user_guide/api_docs/api/cudf.read_parquet/#cudf.read_parquet
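The asymmetry between `set_columns` and `set_row_groups` can be modeled in a small standalone sketch. It assumes that the single column list given to `set_columns` applies to every source, while `set_row_groups` takes one list per source. `source_plan` and `build_plan` are hypothetical names, not libcudf types, and `size_type` is a local alias for `cudf::size_type`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using size_type = std::int32_t;  // local stand-in for cudf::size_type

// Hypothetical per-source read plan (not a libcudf type).
struct source_plan {
  std::string source;
  std::vector<std::string> columns;   // shared by every source
  std::vector<size_type> row_groups;  // specific to this source
};

// Hypothetical helper: combines one shared column list (the shape that
// set_columns accepts) with one row-group list per source (the shape that
// set_row_groups accepts).
std::vector<source_plan> build_plan(std::vector<std::string> const& sources,
                                    std::vector<std::string> const& columns,
                                    std::vector<std::vector<size_type>> const& row_groups)
{
  if (row_groups.size() != sources.size()) {
    throw std::invalid_argument("expected one row-group list per source");
  }
  std::vector<source_plan> plan;
  plan.reserve(sources.size());
  for (std::size_t i = 0; i < sources.size(); ++i) {
    // Every source receives the same columns but its own row groups.
    plan.push_back({sources[i], columns, row_groups[i]});
  }
  return plan;
}
```

Under this model, the column list is never indexed by source while the row-group list always is, which is exactly the mismatch in shapes between the two setters.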
Suggested fix for documentation
The libcudf documentation should explain this parameter the same way the cudf Python documentation does.
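One possible wording for the C++ documentation, modeled on the per-input explanation in the Python docs, is sketched below as a Doxygen comment. The class is a hypothetical stand-in for `parquet_reader_options`, reduced to the setter in question; only the comment text is the proposal, and it is a suggestion rather than the current libcudf header.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using size_type = std::int32_t;  // local stand-in for cudf::size_type

// Hypothetical stand-in for cudf::io::parquet_reader_options, reduced to the
// setter under discussion.
class parquet_reader_options_sketch {
 public:
  /**
   * @brief Sets the row groups to read, with one list per input source.
   *
   * `row_groups[i]` holds the indices of the row groups to read from the
   * i-th source in the reader's `source_info`. When reading a single source,
   * pass a single inner list, e.g. `{{0, 2}}`.
   *
   * @param row_groups Row-group index lists, one per input source
   */
  void set_row_groups(std::vector<std::vector<size_type>> row_groups)
  {
    _row_groups = std::move(row_groups);
  }

  [[nodiscard]] std::vector<std::vector<size_type>> const& get_row_groups() const
  {
    return _row_groups;
  }

 private:
  std::vector<std::vector<size_type>> _row_groups;
};
```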