This repository has been archived by the owner on Dec 21, 2018. It is now read-only.

[WIP] Apache Parquet reader #85

Closed
wants to merge 91 commits
Commits
6cb51df
[parquet-reader] Add parquet reader wrapper
gcca Jul 17, 2018
bbe9467
[parquet-reader] Add column reader
gcca Jul 18, 2018
6ced85b
[parquet-reader] Enable read new page call
gcca Jul 20, 2018
16b40cb
WIP: add custom decoder
aocsa Jul 20, 2018
fc57ccb
[parquet-reader] Update parquet API to v1.3.1
gcca Jul 23, 2018
3000f89
[parquet-reader] Read batch as gdf column
gcca Jul 25, 2018
a6e7d0e
arrow decoder
aocsa Jul 26, 2018
7c24364
merge with parquet-reader
aocsa Jul 26, 2018
3b9af0e
Merge branch 'parquet-reader' into parquet-decoder
aocsa Jul 26, 2018
4593968
[parquet-reader] Add gdf column read test
gcca Jul 26, 2018
abe73d3
[parquet-reader] Add file reader by columns benchmark
gcca Jul 27, 2018
a384b15
decoder using host
aocsa Jul 27, 2018
79470ea
decoder using gpu
aocsa Jul 27, 2018
3ef6ecd
[parquet-reader] Read spaced batches to gdf column
gcca Jul 30, 2018
4282650
Merge branch 'parquet-reader' into parquet-decoder
aocsa Aug 1, 2018
819af4e
use specific gpu-decoder for int32
aocsa Aug 1, 2018
5713017
[parquet-reader] Add API to read a parquet file
gcca Aug 2, 2018
7ad9972
[parquet-reader] Merge from parquet-decoder
gcca Aug 2, 2018
882a296
[parquet-reader] Fix template definitions for readers
gcca Aug 2, 2018
e8068eb
[parquet-reader] Merger from LibGDF/master
gcca Aug 2, 2018
e407912
[parquet-reader] Fix testing files
gcca Aug 2, 2018
9ba5d7e
[parquet-reader] Move tests to src
gcca Aug 2, 2018
6aaaa51
[parquet-reader] Fix access to parquetcpp repository
gcca Aug 2, 2018
13e27c7
[parquet-reader] Fix benchmark test building
gcca Aug 2, 2018
15ff796
[parquet-reader] Fix build moving tests into src
gcca Aug 2, 2018
d7bed6a
[parquet-reader] Update tests building process
gcca Aug 2, 2018
92d89e9
[parquet-reader] Add conda dependencies for Thrift
gcca Aug 3, 2018
f56a978
[parquet-reader] Check gdf dtype from parquet type
gcca Aug 6, 2018
9043c7a
[parquet-reader] Apply batch spaced reading on tests
gcca Aug 6, 2018
9d2275e
[parquet-reader] Add column filter from file
gcca Aug 7, 2018
d0b265c
[parquet-reader] Add read to gdf column method
gcca Aug 7, 2018
3b464bd
[parquet-reader] Remove ReadGdfColumn method
gcca Aug 7, 2018
f92a931
decode bitpacking data using pinned memory
aocsa Aug 7, 2018
d25db66
Merge branch 'parquet-reader' of https://github.com/BlazingDB/libgdf …
aocsa Aug 7, 2018
1716e81
[parquet-reader] Add parquet target for linking
gcca Aug 8, 2018
9e39227
decode bitpacking data using pinned memory: merge
aocsa Aug 8, 2018
ab07b56
bitpacking decoding for all types
aocsa Aug 9, 2018
5ebc08c
start gpu benchmark for parquet reader
aocsa Aug 13, 2018
54a63a1
improve copy scheme from pinned memory to device memory
aocsa Aug 15, 2018
7ee8760
init benchmark for parquet reader
aocsa Aug 16, 2018
2ad9c25
wip: decode using only gpu
aocsa Aug 21, 2018
02c1132
gdf_column in device and benchmark for parquet reader
aocsa Aug 21, 2018
8be8e9e
implemented new expand function. Commented out problematic tests. sta…
Aug 21, 2018
273e17d
benckmark with huge parquet file
aocsa Aug 22, 2018
30c581a
added compact_to_sparse_for_nulls
Aug 23, 2018
c129c94
starting with kernel
Aug 23, 2018
298dc3d
starting with kernel
Aug 23, 2018
7f0f570
[parquet-reader]: ToGdfColumn using gpu using ReadBatch
aocsa Aug 23, 2018
7da1549
reimplemented compact_to_sparse_for_nulls
Aug 23, 2018
6979c33
added includes
Aug 23, 2018
fbae2c8
Merge branch 'willParquetExp' into willParquetKernelExp
Aug 24, 2018
bceb98b
fixed build errors but commented out usage of compact_to_sparse_for_n…
Aug 24, 2018
26a5ce5
Merge branch 'willParquetExp' into willParquetKernelExp
Aug 24, 2018
869d9eb
[parquet-reader] toGdfColumn valid support and expand using ReadBatch
aocsa Aug 24, 2018
55c53ae
kernel compiles
Aug 24, 2018
3c97bb2
improved kernel call
Aug 24, 2018
8f06c8f
improved kernel call
Aug 24, 2018
12f6404
[parquet-reader]: custom gpu kernel for definition levels to valid_bits
aocsa Aug 24, 2018
149f8d3
[parquet-reader] Add test for valid and nulls
gcca Aug 25, 2018
93a0235
[parquet-reader] Merged from branch
gcca Aug 25, 2018
d4f0be9
[parquet-reader] Test nulls with two row groups
gcca Aug 25, 2018
616b303
[parquet-reader] Update conversion to gdf column
gcca Aug 27, 2018
ce430a4
Merge branch 'parquet-reader' into willParquetKernelExp
Aug 27, 2018
67068eb
changed unpack_using_gpu to use new kernel. Changed metadata gatherin…
Aug 27, 2018
98940b8
[parquet-reader]: ReadBatchSpace support on gpu
aocsa Aug 27, 2018
f639c2b
[parquet-reader] Remove unexistent directory
gcca Aug 27, 2018
51f7479
[parquet-reader] check unit test and benchmark
aocsa Aug 28, 2018
4f88e80
changed bitpack remainders implementation
Aug 28, 2018
9f6adb7
[parquet-reader] Read filtering by row_groups and columns indices
gcca Aug 28, 2018
19628d5
Merge branch 'parquet-reader' of github.com:BlazingDB/libgdf into par…
gcca Aug 28, 2018
42bf16d
[parquet-reader] Merged from master
gcca Aug 29, 2018
e6810b5
[parquet-reader] Update to work with arrow 0.9
gcca Aug 29, 2018
81d8cb9
merged in bitpacking kernels
Aug 31, 2018
dbcf578
[parquet-reader] Fix broken ByIdsInOrder unit test
gcca Aug 31, 2018
6d2e4b3
[parquet-reader] update benchmark
aocsa Aug 31, 2018
6646f09
Merge branch 'parquet-reader' of https://github.com/BlazingDB/libgdf …
aocsa Aug 31, 2018
94ea6a4
[parquet-reader] Add read column method
gcca Aug 31, 2018
2950374
fixed an issue with parquet-benchmark test
Sep 5, 2018
fc0a72e
[parquet-reader]: fix parquet reader (tested with mortgage data)
aocsa Sep 7, 2018
fc85c2e
[parquet-reader] fix parquet benchmark
aocsa Sep 11, 2018
b6784de
[parquet-reader] rebase and fix types conversion
aocsa Sep 18, 2018
ea06079
[parquet-reader]: fix warnings
aocsa Sep 18, 2018
31326fa
[parquet-reader] Downgrade bison and flex
gcca Sep 18, 2018
55ab718
[parquet-reader] Add global ParquetCpp include directories
gcca Sep 18, 2018
c3f2552
[parquet-reader] Fix compiling warnings
gcca Sep 18, 2018
dc76e3d
[parquet-reader] fix bitpacking decoder and transform_valid
aocsa Sep 19, 2018
8bf8311
[parquet-reader]: merge with last fixes
aocsa Sep 19, 2018
951cbf9
[parquet-reader]: fix warnings
aocsa Sep 19, 2018
294f345
[parquet-reader]: fix warnings, type convertion
aocsa Sep 20, 2018
fe3def3
[parquet-reader] Merged from remote
gcca Oct 17, 2018
2e77073
[parquet-reader] Add API documentation
gcca Oct 18, 2018
2 changes: 2 additions & 0 deletions .gitignore
@@ -18,3 +18,5 @@ python/libgdf_cffi/libgdf_cffi.py

## eclipse
.project

build2/
20 changes: 19 additions & 1 deletion CMakeLists.txt
@@ -1,6 +1,7 @@
#=============================================================================
# Copyright 2018 BlazingDB, Inc.
# Copyright 2018 Percy Camilo Triveño Aucahuasi <[email protected]>
# Copyright 2018 Cristhian Alberto Gonzales Castillo <[email protected]>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -25,7 +26,7 @@

PROJECT(libgdf)

cmake_minimum_required(VERSION 2.8) # not sure about version required
cmake_minimum_required(VERSION 3.3) # not sure about version required

set(CMAKE_CXX_STANDARD 11)
message(STATUS "Using C++ standard: c++${CMAKE_CXX_STANDARD}")
Expand All @@ -46,6 +47,7 @@ include(CTest)
# Include custom modules (see cmake directory)
include(ConfigureGoogleTest)
include(ConfigureArrow)
include(ConfigureParquetCpp)

find_package(CUDA)
set_package_properties(
@@ -83,12 +85,15 @@ else()
message(FATAL_ERROR "Apache Arrow not found, please check your settings.")
endif()

get_property(PARQUETCPP_INCLUDE_DIRS TARGET Apache::ParquetCpp PROPERTY INTERFACE_INCLUDE_DIRECTORIES)

include_directories(
"${CMAKE_CURRENT_SOURCE_DIR}/include"
"${CMAKE_CURRENT_SOURCE_DIR}/thirdparty/cub"
"${CMAKE_CURRENT_SOURCE_DIR}/thirdparty/moderngpu/src"
"${CUDA_INCLUDE_DIRS}"
"${ARROW_INCLUDEDIR}"
"${PARQUETCPP_INCLUDE_DIRS}"
)

IF(CUDA_VERSION_MAJOR GREATER 7)
@@ -118,6 +123,19 @@ if(HT_LEGACY_ALLOCATOR)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-DHT_LEGACY_ALLOCATOR)
endif()

cuda_add_library(gdf-parquet
src/parquet/api.cpp
src/parquet/column_reader.cu
src/parquet/file_reader.cpp
src/parquet/file_reader_contents.cpp
src/parquet/page_reader.cpp
src/parquet/row_group_reader_contents.cpp
src/parquet/decoder/cu_level_decoder.cu
src/arrow/cu_decoder.cu
src/arrow/util/pinned_allocator.cu
)

target_link_libraries(gdf-parquet Apache::ParquetCpp)

cuda_add_library(gdf SHARED
src/binaryops.cu
3 changes: 2 additions & 1 deletion cmake/Modules/ConfigureArrow.cmake
@@ -1,6 +1,7 @@
#=============================================================================
# Copyright 2018 BlazingDB, Inc.
# Copyright 2018 Percy Camilo Triveño Aucahuasi <[email protected]>
# Copyright 2018 Cristhian Alberto Gonzales Castillo <[email protected]>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -15,7 +16,7 @@
# limitations under the License.
#=============================================================================

set(ARROW_DOWNLOAD_BINARY_DIR ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/arrow-download/)
set(ARROW_DOWNLOAD_BINARY_DIR ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/arrow-download)

# Download and unpack arrow at configure time
configure_file(${CMAKE_SOURCE_DIR}/cmake/Templates/Arrow.CMakeLists.txt.cmake ${ARROW_DOWNLOAD_BINARY_DIR}/CMakeLists.txt COPYONLY)
89 changes: 89 additions & 0 deletions cmake/Modules/ConfigureParquetCpp.cmake
@@ -0,0 +1,89 @@
#=============================================================================
# Copyright 2018 BlazingDB, Inc.
# Copyright 2018 Cristhian Alberto Gonzales Castillo <[email protected]>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#=============================================================================

# Download and unpack ParquetCpp at configure time
configure_file(${CMAKE_SOURCE_DIR}/cmake/Templates/ParquetCpp.CMakeLists.txt.cmake ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/parquetcpp-download/CMakeLists.txt)

execute_process(
COMMAND ${CMAKE_COMMAND} -G "${CMAKE_GENERATOR}" .
RESULT_VARIABLE result
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/parquetcpp-download/
)

if(result)
message(FATAL_ERROR "CMake step for ParquetCpp failed: ${result}")
endif()

# Transitive dependencies
set(ARROW_TRANSITIVE_DEPENDENCIES_PREFIX ${ARROW_DOWNLOAD_BINARY_DIR}/arrow-prefix/src/arrow-build)
set(BROTLI_TRANSITIVE_DEPENDENCY_PREFIX ${ARROW_TRANSITIVE_DEPENDENCIES_PREFIX}/brotli_ep/src/brotli_ep-install/lib/x86_64-linux-gnu)
set(BROTLI_STATIC_LIB_ENC ${BROTLI_TRANSITIVE_DEPENDENCY_PREFIX}/libbrotlienc.a)
set(BROTLI_STATIC_LIB_DEC ${BROTLI_TRANSITIVE_DEPENDENCY_PREFIX}/libbrotlidec.a)
set(BROTLI_STATIC_LIB_COMMON ${BROTLI_TRANSITIVE_DEPENDENCY_PREFIX}/libbrotlicommon.a)
set(SNAPPY_STATIC_LIB ${ARROW_TRANSITIVE_DEPENDENCIES_PREFIX}/snappy_ep/src/snappy_ep-install/lib/libsnappy.a)
set(ZLIB_STATIC_LIB ${ARROW_TRANSITIVE_DEPENDENCIES_PREFIX}/zlib_ep/src/zlib_ep-install/lib/libz.a)
set(LZ4_STATIC_LIB ${ARROW_TRANSITIVE_DEPENDENCIES_PREFIX}/lz4_ep-prefix/src/lz4_ep/lib/liblz4.a)
set(ZSTD_STATIC_LIB ${ARROW_TRANSITIVE_DEPENDENCIES_PREFIX}/zstd_ep-prefix/src/zstd_ep/lib/libzstd.a)
set(ARROW_HOME ${ARROW_ROOT})

set(ENV{BROTLI_STATIC_LIB_ENC} ${BROTLI_STATIC_LIB_ENC})
set(ENV{BROTLI_STATIC_LIB_DEC} ${BROTLI_STATIC_LIB_DEC})
set(ENV{BROTLI_STATIC_LIB_COMMON} ${BROTLI_STATIC_LIB_COMMON})
set(ENV{SNAPPY_STATIC_LIB} ${SNAPPY_STATIC_LIB})
set(ENV{ZLIB_STATIC_LIB} ${ZLIB_STATIC_LIB})
set(ENV{LZ4_STATIC_LIB} ${LZ4_STATIC_LIB})
set(ENV{ZSTD_STATIC_LIB} ${ZSTD_STATIC_LIB})
set(ENV{ARROW_HOME} ${ARROW_HOME})

execute_process(
COMMAND ${CMAKE_COMMAND} --build .
RESULT_VARIABLE result
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/parquetcpp-download)

if(result)
message(FATAL_ERROR "Build step for ParquetCpp failed: ${result}")
endif()

# Add transitive dependency: Thrift
set(THRIFT_ROOT ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/parquetcpp-build/thrift_ep/src/thrift_ep-install)

# Locate ParquetCpp package
set(PARQUETCPP_ROOT ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/parquetcpp-install)
set(PARQUETCPP_BINARY_DIR ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/parquetcpp-build)
set(PARQUETCPP_SOURCE_DIR ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/parquetcpp-src)

# Dependency interfaces
find_package(Boost REQUIRED COMPONENTS regex)

add_library(Apache::Thrift INTERFACE IMPORTED)
set_target_properties(Apache::Thrift
PROPERTIES INTERFACE_INCLUDE_DIRECTORIES ${THRIFT_ROOT}/include)
set_target_properties(Apache::Thrift
PROPERTIES INTERFACE_LINK_LIBRARIES ${THRIFT_ROOT}/lib/libthrift.a)

add_library(Apache::Arrow INTERFACE IMPORTED)
set_target_properties(Apache::Arrow
PROPERTIES INTERFACE_INCLUDE_DIRECTORIES ${ARROW_ROOT}/include)
set_target_properties(Apache::Arrow
PROPERTIES INTERFACE_LINK_LIBRARIES "${ARROW_ROOT}/lib/libarrow.a;${BROTLI_STATIC_LIB_ENC};${BROTLI_STATIC_LIB_DEC};${BROTLI_STATIC_LIB_COMMON};${SNAPPY_STATIC_LIB};${ZLIB_STATIC_LIB};${LZ4_STATIC_LIB};${ZSTD_STATIC_LIB}")

add_library(Apache::ParquetCpp INTERFACE IMPORTED)
set_target_properties(Apache::ParquetCpp
PROPERTIES INTERFACE_INCLUDE_DIRECTORIES
"${PARQUETCPP_ROOT}/include;${PARQUETCPP_BINARY_DIR}/src;${PARQUETCPP_SOURCE_DIR}/src")
set_target_properties(Apache::ParquetCpp
PROPERTIES INTERFACE_LINK_LIBRARIES "${PARQUETCPP_ROOT}/lib/libparquet.a;Apache::Arrow;Apache::Thrift;Boost::regex")
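The module above exposes ParquetCpp, Arrow, Thrift, and Boost::regex through a single `Apache::ParquetCpp` INTERFACE IMPORTED target, so a consumer inherits the include directories and the static-library chain from one link line. A minimal downstream sketch (the `my_reader` target and `main.cpp` are hypothetical, not part of this PR):

```cmake
# Assumes include(ConfigureParquetCpp) has already run, as the
# top-level CMakeLists.txt in this PR does.
add_executable(my_reader main.cpp)

# Linking the imported target transitively adds libparquet.a plus the
# Arrow, Thrift and Boost::regex usage requirements declared above.
target_link_libraries(my_reader Apache::ParquetCpp)
```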
14 changes: 5 additions & 9 deletions cmake/Templates/Arrow.CMakeLists.txt.cmake
@@ -1,6 +1,7 @@
#=============================================================================
# Copyright 2018 BlazingDB, Inc.
# Copyright 2018 Percy Camilo Triveño Aucahuasi <[email protected]>
# Copyright 2018 Cristhian Alberto Gonzales Castillo <[email protected]>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -23,7 +24,7 @@ project(arrow-download NONE)

include(ExternalProject)

set(ARROW_VERSION "apache-arrow-0.10.0")
set(ARROW_VERSION "apache-arrow-0.9.0")

if (NOT "$ENV{PARQUET_ARROW_VERSION}" STREQUAL "")
set(ARROW_VERSION "$ENV{PARQUET_ARROW_VERSION}")
@@ -34,24 +35,19 @@ message(STATUS "Using Apache Arrow version: ${ARROW_VERSION}")
set(ARROW_URL "https://github.com/apache/arrow/archive/${ARROW_VERSION}.tar.gz")

set(ARROW_CMAKE_ARGS
#Arrow dependencies
-DARROW_WITH_LZ4=OFF
-DARROW_WITH_ZSTD=OFF
-DARROW_WITH_BROTLI=OFF
-DARROW_WITH_SNAPPY=OFF
-DARROW_WITH_ZLIB=OFF

#Build settings
-DARROW_BUILD_STATIC=ON
-DARROW_BUILD_SHARED=OFF
-DARROW_BOOST_USE_SHARED=ON
-DARROW_BUILD_TESTS=OFF
-DARROW_TEST_MEMCHECK=OFF
-DARROW_BUILD_BENCHMARKS=OFF
-DARROW_BUILD_UTILITIES=OFF
-DARROW_JEMALLOC=OFF

#Arrow modules
-DARROW_IPC=ON
-DARROW_COMPUTE=OFF
-DARROW_COMPUTE=ON
-DARROW_GPU=OFF
-DARROW_JEMALLOC=OFF
-DARROW_BOOST_VENDORED=OFF
44 changes: 44 additions & 0 deletions cmake/Templates/ParquetCpp.CMakeLists.txt.cmake
@@ -0,0 +1,44 @@
#=============================================================================
# Copyright 2018 BlazingDB, Inc.
# Copyright 2018 Cristhian Alberto Gonzales Castillo <[email protected]>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#=============================================================================

cmake_minimum_required(VERSION 2.8.12)

project(parquetcpp-download NONE)

include(ExternalProject)

set(PARQUET_VERSION apache-parquet-cpp-1.4.0)

if (NOT "$ENV{PARQUET_VERSION}" STREQUAL "")
set(PARQUET_VERSION "$ENV{PARQUET_VERSION}")
endif()

message(STATUS "Using Apache ParquetCpp version: ${PARQUET_VERSION}")

ExternalProject_Add(parquetcpp
BINARY_DIR "${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/parquetcpp-build"
CMAKE_ARGS
-DCMAKE_BUILD_TYPE=RELEASE
-DCMAKE_INSTALL_PREFIX=${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/parquetcpp-install
-DPARQUET_ARROW_LINKAGE=static
-DPARQUET_BUILD_SHARED=OFF
-DPARQUET_BUILD_TESTS=OFF
GIT_REPOSITORY https://github.com/apache/parquet-cpp.git
GIT_TAG apache-parquet-cpp-1.4.0
INSTALL_DIR "${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/parquetcpp-install"
SOURCE_DIR "${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/thirdparty/parquetcpp-src"
)
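The template lets an environment variable override the pinned ParquetCpp tag, mirroring the Arrow template's pattern. The selection logic — take `$ENV{PARQUET_VERSION}` when non-empty, otherwise the default — can be sketched in plain shell (a hypothetical illustration, not part of the build):

```shell
#!/bin/sh
# Default tag, matching the template above.
DEFAULT_VERSION="apache-parquet-cpp-1.4.0"

# Prefer the environment override when it is set and non-empty,
# as the STREQUAL "" guard in the template does.
PARQUET_VERSION="${PARQUET_VERSION:-$DEFAULT_VERSION}"

echo "Using Apache ParquetCpp version: ${PARQUET_VERSION}"
```

Note that `${VAR:-default}` also falls back when the variable is set but empty, which matches the CMake guard's behavior.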
2 changes: 2 additions & 0 deletions conda_environments/dev_py35.yml
@@ -24,4 +26,6 @@ dependencies:
- llvmlite=0.18.0=py35_0
- numba=0.34.0.dev=np112py35_316
- cmake=3.6.3=0
- flex=2.6.0
- bison=3.0.4
- pyarrow=0.10.0
40 changes: 21 additions & 19 deletions include/gdf/cffi/types.h
@@ -37,19 +39,21 @@ typedef enum {
*/
/* ----------------------------------------------------------------------------*/
typedef enum {
GDF_SUCCESS=0,
GDF_CUDA_ERROR, /**< Error occured in a CUDA call */
GDF_UNSUPPORTED_DTYPE, /**< The datatype of the gdf_column is unsupported */
GDF_COLUMN_SIZE_MISMATCH, /**< Two columns that should be the same size aren't the same size*/
GDF_COLUMN_SIZE_TOO_BIG, /**< Size of column is larger than the max supported size */
GDF_DATASET_EMPTY, /**< Input dataset is either null or has size 0 when it shouldn't */
GDF_VALIDITY_MISSING, /**< gdf_column's validity bitmask is null */
GDF_SUCCESS=0,
GDF_CUDA_ERROR, /**< Error occured in a CUDA call */
GDF_UNSUPPORTED_DTYPE, /**< The datatype of the gdf_column is unsupported */
GDF_COLUMN_SIZE_MISMATCH, /**< Two columns that should be the same size aren't the same size*/
GDF_COLUMN_SIZE_TOO_BIG, /**< Size of column is larger than the max supported size */
GDF_DATASET_EMPTY, /**< Input dataset is either null or has size 0 when it shouldn't */
GDF_VALIDITY_MISSING, /**< gdf_column's validity bitmask is null */
GDF_VALIDITY_UNSUPPORTED, /**< The requested gdf operation does not support validity bitmask handling, and one of the input columns has the valid bits enabled */
GDF_INVALID_API_CALL, /**< The arguments passed into the function were invalid */
GDF_JOIN_DTYPE_MISMATCH, /**< Datatype mismatch between corresponding columns in left/right tables in the Join function */
GDF_JOIN_TOO_MANY_COLUMNS, /**< Too many columns were passed in for the requested join operation*/
GDF_INVALID_API_CALL, /**< The arguments passed into the function were invalid */
GDF_JOIN_DTYPE_MISMATCH, /**< Datatype mismatch between corresponding columns in left/right tables in the Join function */
GDF_JOIN_TOO_MANY_COLUMNS, /**< Too many columns were passed in for the requested join operation*/
GDF_GROUPBY_TOO_MANY_COLUMNS, /**< Too many columns were passed in for the requested groupby operation */
GDF_IO_ERROR, /**< Failed reading or writing files */
GDF_DTYPE_MISMATCH, /**< Type mismatch between columns that should be the same type */
GDF_UNSUPPORTED_METHOD, /**< The method requested to perform an operation was invalid or unsupported (e.g., hash vs. sort)*/
GDF_UNSUPPORTED_METHOD, /**< The method requested to perform an operation was invalid or unsupported (e.g., hash vs. sort)*/
GDF_INVALID_AGGREGATOR, /**< Invalid aggregator was specified for a groupby*/
GDF_INVALID_HASH_FUNCTION, /**< Invalid hash function was selected */
GDF_PARTITION_DTYPE_MISMATCH, /**< Datatype mismatch between columns of input/output in the hash partition function */
@@ -58,7 +60,7 @@
GDF_UNDEFINED_NVTX_COLOR, /**< The requested color used to define an NVTX range is not defined */
GDF_NULL_NVTX_NAME, /**< The requested name for an NVTX range cannot be nullptr */
GDF_C_ERROR, /**< C error not related to CUDA */
GDF_FILE_ERROR, /**< error processing sepcified file */
GDF_FILE_ERROR, /**< error processing sepcified file */
} gdf_error;

typedef enum {
@@ -80,7 +82,7 @@ typedef struct {
} gdf_dtype_extra_info;

typedef struct gdf_column_{
void *data; /**< Pointer to the columns data */
void *data; /**< Pointer to the columns data */
gdf_valid_type *valid; /**< Pointer to the columns validity bit mask where the 'i'th bit indicates if the 'i'th row is NULL */
gdf_size_type size; /**< Number of data elements in the columns data buffer*/
gdf_dtype dtype; /**< The datatype of the column's data */
@@ -90,7 +92,7 @@ typedef struct gdf_column_{
} gdf_column;

/* --------------------------------------------------------------------------*/
/**
/**
* @Synopsis These enums indicate which method is to be used for an operation.
* For example, it is used to select between the hash-based vs. sort-based implementations
* of the Join operation.
@@ -113,7 +115,7 @@


/* --------------------------------------------------------------------------*/
/**
/**
* @Synopsis These enums indicate the supported aggregation operations that can be
* performed on a set of aggregation columns as part of a GroupBy operation
*/
@@ -130,15 +132,15 @@


/* --------------------------------------------------------------------------*/
/**
/**
* @Synopsis Colors for use with NVTX ranges.
*
* These enumerations are the available pre-defined colors for use with
* user-defined NVTX ranges.
*/
/* ----------------------------------------------------------------------------*/
typedef enum {
GDF_GREEN = 0,
GDF_GREEN = 0,
GDF_BLUE,
GDF_YELLOW,
GDF_PURPLE,
@@ -151,8 +153,8 @@
} gdf_color;

/* --------------------------------------------------------------------------*/
/**
* @Synopsis This struct holds various information about how an operation should be
/**
* @Synopsis This struct holds various information about how an operation should be
* performed as well as additional information about the input data.
*/
/* ----------------------------------------------------------------------------*/