This repository has been archived by the owner on Dec 21, 2018. It is now read-only.

[Review] Parquet reader multithread #146

Open · wants to merge 109 commits into base: master

Changes from 3 commits
6cb51df
[parquet-reader] Add parquet reader wrapper
gcca Jul 17, 2018
bbe9467
[parquet-reader] Add column reader
gcca Jul 18, 2018
6ced85b
[parquet-reader] Enable read new page call
gcca Jul 20, 2018
16b40cb
WIP: add custom decoder
aocsa Jul 20, 2018
fc57ccb
[parquet-reader] Update parquet API to v1.3.1
gcca Jul 23, 2018
3000f89
[parquet-reader] Read batch as gdf column
gcca Jul 25, 2018
a6e7d0e
arrow decoder
aocsa Jul 26, 2018
7c24364
merge with parquet-reader
aocsa Jul 26, 2018
3b9af0e
Merge branch 'parquet-reader' into parquet-decoder
aocsa Jul 26, 2018
4593968
[parquet-reader] Add gdf column read test
gcca Jul 26, 2018
abe73d3
[parquet-reader] Add file reader by columns benchmark
gcca Jul 27, 2018
a384b15
decoder using host
aocsa Jul 27, 2018
79470ea
decoder using gpu
aocsa Jul 27, 2018
3ef6ecd
[parquet-reader] Read spaced batches to gdf column
gcca Jul 30, 2018
4282650
Merge branch 'parquet-reader' into parquet-decoder
aocsa Aug 1, 2018
819af4e
use specific gpu-decoder for int32
aocsa Aug 1, 2018
5713017
[parquet-reader] Add API to read a parquet file
gcca Aug 2, 2018
7ad9972
[parquet-reader] Merge from parquet-decoder
gcca Aug 2, 2018
882a296
[parquet-reader] Fix template definitions for readers
gcca Aug 2, 2018
e8068eb
[parquet-reader] Merge from LibGDF/master
gcca Aug 2, 2018
e407912
[parquet-reader] Fix testing files
gcca Aug 2, 2018
9ba5d7e
[parquet-reader] Move tests to src
gcca Aug 2, 2018
6aaaa51
[parquet-reader] Fix access to parquetcpp repository
gcca Aug 2, 2018
13e27c7
[parquet-reader] Fix benchmark test building
gcca Aug 2, 2018
15ff796
[parquet-reader] Fix build moving tests into src
gcca Aug 2, 2018
d7bed6a
[parquet-reader] Update tests building process
gcca Aug 2, 2018
92d89e9
[parquet-reader] Add conda dependencies for Thrift
gcca Aug 3, 2018
f56a978
[parquet-reader] Check gdf dtype from parquet type
gcca Aug 6, 2018
9043c7a
[parquet-reader] Apply batch spaced reading on tests
gcca Aug 6, 2018
9d2275e
[parquet-reader] Add column filter from file
gcca Aug 7, 2018
d0b265c
[parquet-reader] Add read to gdf column method
gcca Aug 7, 2018
3b464bd
[parquet-reader] Remove ReadGdfColumn method
gcca Aug 7, 2018
f92a931
decode bitpacking data using pinned memory
aocsa Aug 7, 2018
d25db66
Merge branch 'parquet-reader' of https://github.com/BlazingDB/libgdf …
aocsa Aug 7, 2018
1716e81
[parquet-reader] Add parquet target for linking
gcca Aug 8, 2018
9e39227
decode bitpacking data using pinned memory: merge
aocsa Aug 8, 2018
ab07b56
bitpacking decoding for all types
aocsa Aug 9, 2018
5ebc08c
start gpu benchmark for parquet reader
aocsa Aug 13, 2018
54a63a1
improve copy scheme from pinned memory to device memory
aocsa Aug 15, 2018
7ee8760
init benchmark for parquet reader
aocsa Aug 16, 2018
2ad9c25
wip: decode using only gpu
aocsa Aug 21, 2018
02c1132
gdf_column in device and benchmark for parquet reader
aocsa Aug 21, 2018
8be8e9e
implemented new expand function. Commented out problematic tests. sta…
Aug 21, 2018
273e17d
benchmark with huge parquet file
aocsa Aug 22, 2018
30c581a
added compact_to_sparse_for_nulls
Aug 23, 2018
c129c94
starting with kernel
Aug 23, 2018
298dc3d
starting with kernel
Aug 23, 2018
7f0f570
[parquet-reader]: ToGdfColumn using gpu using ReadBatch
aocsa Aug 23, 2018
7da1549
reimplemented compact_to_sparse_for_nulls
Aug 23, 2018
6979c33
added includes
Aug 23, 2018
fbae2c8
Merge branch 'willParquetExp' into willParquetKernelExp
Aug 24, 2018
bceb98b
fixed build errors but commented out usage of compact_to_sparse_for_n…
Aug 24, 2018
26a5ce5
Merge branch 'willParquetExp' into willParquetKernelExp
Aug 24, 2018
869d9eb
[parquet-reader] toGdfColumn valid support and expand using ReadBatch
aocsa Aug 24, 2018
55c53ae
kernel compiles
Aug 24, 2018
3c97bb2
improved kernel call
Aug 24, 2018
8f06c8f
improved kernel call
Aug 24, 2018
12f6404
[parquet-reader]: custom gpu kernel for definition levels to valid_bits
aocsa Aug 24, 2018
149f8d3
[parquet-reader] Add test for valid and nulls
gcca Aug 25, 2018
93a0235
[parquet-reader] Merged from branch
gcca Aug 25, 2018
d4f0be9
[parquet-reader] Test nulls with two row groups
gcca Aug 25, 2018
616b303
[parquet-reader] Update conversion to gdf column
gcca Aug 27, 2018
ce430a4
Merge branch 'parquet-reader' into willParquetKernelExp
Aug 27, 2018
67068eb
changed unpack_using_gpu to use new kernel. Changed metadata gatherin…
Aug 27, 2018
98940b8
[parquet-reader]: ReadBatchSpace support on gpu
aocsa Aug 27, 2018
f639c2b
[parquet-reader] Remove nonexistent directory
gcca Aug 27, 2018
51f7479
[parquet-reader] check unit test and benchmark
aocsa Aug 28, 2018
4f88e80
changed bitpack remainders implementation
Aug 28, 2018
9f6adb7
[parquet-reader] Read filtering by row_groups and columns indices
gcca Aug 28, 2018
19628d5
Merge branch 'parquet-reader' of github.com:BlazingDB/libgdf into par…
gcca Aug 28, 2018
42bf16d
[parquet-reader] Merged from master
gcca Aug 29, 2018
e6810b5
[parquet-reader] Update to work with arrow 0.9
gcca Aug 29, 2018
81d8cb9
merged in bitpacking kernels
Aug 31, 2018
dbcf578
[parquet-reader] Fix broken ByIdsInOrder unit test
gcca Aug 31, 2018
6d2e4b3
[parquet-reader] update benchmark
aocsa Aug 31, 2018
6646f09
Merge branch 'parquet-reader' of https://github.com/BlazingDB/libgdf …
aocsa Aug 31, 2018
94ea6a4
[parquet-reader] Add read column method
gcca Aug 31, 2018
2950374
fixed an issue with parquet-benchmark test
Sep 5, 2018
fc0a72e
[parquet-reader]: fix parquet reader (tested with mortgage data)
aocsa Sep 7, 2018
73703b0
implemented solution, need to change it to read valids separately an…
Sep 7, 2018
a905116
wip
Sep 7, 2018
d7740ca
Merge branch 'parquet-reader' into parquet-reader-multithread
Sep 7, 2018
74f741a
created seams for bitmasks, need to apply them back into device valid
Sep 7, 2018
fc85c2e
[parquet-reader] fix parquet benchmark
aocsa Sep 11, 2018
b6784de
[parquet-reader] rebase and fix types conversion
aocsa Sep 18, 2018
4eae308
modified unit test. Troubleshooting bugs
Sep 18, 2018
0f9cbf6
created single threaded version for debugging
Sep 18, 2018
849c866
Merge branch 'parquet-reader' into parquet-reader-multithread
Sep 18, 2018
e3d270e
fixed build errors, and issues with tests. Still getting errors with …
Sep 18, 2018
ea06079
[parquet-reader]: fix warnings
aocsa Sep 18, 2018
31326fa
[parquet-reader] Downgrade bison and flex
gcca Sep 18, 2018
55ab718
[parquet-reader] Add global ParquetCpp include directories
gcca Sep 18, 2018
c3f2552
[parquet-reader] Fix compiling warnings
gcca Sep 18, 2018
07e6e85
fixed bug in guard in bitpacking kernel
Sep 19, 2018
dc76e3d
[parquet-reader] fix bitpacking decoder and transform_valid
aocsa Sep 19, 2018
8bf8311
[parquet-reader]: merge with last fixes
aocsa Sep 19, 2018
951cbf9
[parquet-reader]: fix warnings
aocsa Sep 19, 2018
ab57c53
cleaned up code. Using _ReadFileMultiThread where it needs to. All te…
Sep 19, 2018
5002683
made small change to unit test and found more issues
Sep 19, 2018
a7ce67a
fixed bug in allocator function
Sep 19, 2018
9cd6e16
[parquet-reader-multithread] fix warnings
aocsa Sep 20, 2018
52b03f7
[parquet-reader-multithread] remove dead code and add comments
aocsa Sep 21, 2018
efcffd4
added new parquet-multithread-benchmark test. Fixed parquet-reader ap…
Sep 24, 2018
dd9a65f
fixed benchmark unit test
Sep 25, 2018
b342fe4
moved parquet benchmarks to bench folder
Sep 26, 2018
95b16e3
Merge branch 'master' into parquet-reader-multithread
Sep 27, 2018
d1e8ff7
added a new public API which takes in an file reading interface. Adde…
Oct 1, 2018
ec54c9a
Merge branch 'master' into parquet-reader-multithread
Oct 2, 2018
b7c2686
fixed interface implementation to be RandomAccessFile which is an int…
Oct 11, 2018
2 changes: 1 addition & 1 deletion include/gdf/parquet/api.h
@@ -34,6 +34,6 @@ BEGIN_NAMESPACE_GDF_PARQUET
 extern "C" gdf_error
 read_parquet_file(const char *const filename,
                   gdf_column **const out_gdf_columns,
-                  std::size_t *const out_gdf_columns_length);
+                  size_t *const out_gdf_columns_length);

 END_NAMESPACE_GDF_PARQUET
149 changes: 111 additions & 38 deletions src/parquet/api.cpp
@@ -33,24 +33,24 @@ BEGIN_NAMESPACE_GDF_PARQUET
 namespace {

 template <::parquet::Type::type TYPE>
-struct parquet_traits {};
+struct parquet_physical_traits {};

-#define PARQUET_TRAITS_FACTORY(TYPE, DTYPE)               \
-    template <>                                           \
-    struct parquet_traits<::parquet::Type::TYPE> {        \
-        static constexpr gdf_dtype dtype = GDF_##DTYPE;   \
-    }
+#define PARQUET_PHYSICAL_TRAITS_FACTORY(TYPE, DTYPE)            \
+    template <>                                                 \
+    struct parquet_physical_traits<::parquet::Type::TYPE> {     \
+        static constexpr gdf_dtype dtype = GDF_##DTYPE;         \
+    }

-PARQUET_TRAITS_FACTORY(BOOLEAN, INT8);
-PARQUET_TRAITS_FACTORY(INT32, INT32);
-PARQUET_TRAITS_FACTORY(INT64, INT64);
-PARQUET_TRAITS_FACTORY(INT96, invalid);
-PARQUET_TRAITS_FACTORY(FLOAT, FLOAT32);
-PARQUET_TRAITS_FACTORY(DOUBLE, FLOAT64);
-PARQUET_TRAITS_FACTORY(BYTE_ARRAY, invalid);
-PARQUET_TRAITS_FACTORY(FIXED_LEN_BYTE_ARRAY, invalid);
+PARQUET_PHYSICAL_TRAITS_FACTORY(BOOLEAN, INT8);
+PARQUET_PHYSICAL_TRAITS_FACTORY(INT32, INT32);
+PARQUET_PHYSICAL_TRAITS_FACTORY(INT64, INT64);
+PARQUET_PHYSICAL_TRAITS_FACTORY(INT96, invalid);
+PARQUET_PHYSICAL_TRAITS_FACTORY(FLOAT, FLOAT32);
+PARQUET_PHYSICAL_TRAITS_FACTORY(DOUBLE, FLOAT64);
+PARQUET_PHYSICAL_TRAITS_FACTORY(BYTE_ARRAY, invalid);
+PARQUET_PHYSICAL_TRAITS_FACTORY(FIXED_LEN_BYTE_ARRAY, invalid);

-#undef PARQUET_TRAITS_FACTORY
+#undef PARQUET_PHYSICAL_TRAITS_FACTORY

 template <::parquet::Type::type TYPE>
 static inline std::size_t
@@ -81,15 +81,16 @@ _ReadBatch(const std::shared_ptr<::parquet::ColumnReader> &column_reader,
     std::size_t batch_size = 8;
     std::size_t total_read = 0;
     do {
-        batch = reader->ReadBatchSpaced(batch_size,
-                                        definition_levels,
-                                        repetition_levels,
-                                        values + batch_actual,
-                                        valid_bits,
-                                        0,
-                                        &levels_read,
-                                        &values_read,
-                                        &nulls_count);
+        batch = reader->ReadBatchSpaced(
+          batch_size,
+          definition_levels,
+          repetition_levels,
+          values + batch_actual,
+          valid_bits + static_cast<std::ptrdiff_t>(batch_actual / 8),
+          0,
+          &levels_read,
+          &values_read,
+          &nulls_count);
         total_read += static_cast<std::size_t>(values_read);
         batch_actual += batch;
         batch_size = std::max(batch_size * 2, min_batch_size);
@@ -99,14 +100,80 @@ _ReadBatch(const std::shared_ptr<::parquet::ColumnReader> &column_reader,
     return total_read;
 }

+struct ParquetTypeHash {
+    template <class T>
+    std::size_t
+    operator()(T t) const {
+        return static_cast<std::size_t>(t);
+    }
+};
+
+const std::unordered_map<::parquet::Type::type, gdf_dtype, ParquetTypeHash>
+  dtype_from_physical_type_map{
+    {::parquet::Type::BOOLEAN, GDF_INT8},
+    {::parquet::Type::INT32, GDF_INT32},
+    {::parquet::Type::INT64, GDF_INT64},
+    {::parquet::Type::INT96, GDF_invalid},
+    {::parquet::Type::FLOAT, GDF_FLOAT32},
+    {::parquet::Type::DOUBLE, GDF_FLOAT64},
+    {::parquet::Type::BYTE_ARRAY, GDF_invalid},
+    {::parquet::Type::FIXED_LEN_BYTE_ARRAY, GDF_invalid},
+  };
+
+const std::unordered_map<::parquet::LogicalType::type, gdf_dtype, ParquetTypeHash>
+  dtype_from_logical_type_map{
+    {::parquet::LogicalType::NONE, GDF_invalid},
+    {::parquet::LogicalType::UTF8, GDF_invalid},
+    {::parquet::LogicalType::MAP, GDF_invalid},
+    {::parquet::LogicalType::MAP_KEY_VALUE, GDF_invalid},
+    {::parquet::LogicalType::LIST, GDF_invalid},
+    {::parquet::LogicalType::ENUM, GDF_invalid},
+    {::parquet::LogicalType::DECIMAL, GDF_invalid},
+    {::parquet::LogicalType::DATE, GDF_DATE32},
+    {::parquet::LogicalType::TIME_MILLIS, GDF_invalid},
+    {::parquet::LogicalType::TIME_MICROS, GDF_invalid},
+    {::parquet::LogicalType::TIMESTAMP_MILLIS, GDF_TIMESTAMP},
+    {::parquet::LogicalType::TIMESTAMP_MICROS, GDF_invalid},
+    {::parquet::LogicalType::UINT_8, GDF_invalid},
+    {::parquet::LogicalType::UINT_16, GDF_invalid},
+    {::parquet::LogicalType::UINT_32, GDF_invalid},
+    {::parquet::LogicalType::UINT_64, GDF_invalid},
+    {::parquet::LogicalType::INT_8, GDF_INT8},
+    {::parquet::LogicalType::INT_16, GDF_INT16},
+    {::parquet::LogicalType::INT_32, GDF_INT32},
+    {::parquet::LogicalType::INT_64, GDF_INT64},
+    {::parquet::LogicalType::JSON, GDF_invalid},
+    {::parquet::LogicalType::BSON, GDF_invalid},
+    {::parquet::LogicalType::INTERVAL, GDF_invalid},
+    {::parquet::LogicalType::NA, GDF_invalid},
+  };
+
+static inline gdf_dtype
+_DTypeFrom(const ::parquet::ColumnDescriptor *const column_descriptor) {
+    const ::parquet::LogicalType::type logical_type =
+        column_descriptor->logical_type();
+
+    if (logical_type != ::parquet::LogicalType::NONE) {
+        return dtype_from_logical_type_map.at(logical_type);
+    }
+
+    const ::parquet::Type::type physical_type =
+        column_descriptor->physical_type();
+
+    return dtype_from_physical_type_map.at(physical_type);
+}
+
 template <::parquet::Type::type TYPE>
 static inline gdf_error
-_AllocateGdfColumn(const std::size_t num_rows, gdf_column *const _gdf_column) {
+_AllocateGdfColumn(const std::size_t num_rows,
+                   const ::parquet::ColumnDescriptor *const column_descriptor,
+                   gdf_column &_gdf_column) {
     const std::size_t value_byte_size =
         static_cast<std::size_t>(::parquet::type_traits<TYPE>::value_byte_size);

     try {
-        _gdf_column->data =
+        _gdf_column.data =
             static_cast<void *>(new std::uint8_t[num_rows * value_byte_size]);
     } catch (const std::bad_alloc &e) {
 #ifdef GDF_DEBUG
@@ -116,7 +183,7 @@ _AllocateGdfColumn(const std::size_t num_rows, gdf_column *const _gdf_column) {
     }

     try {
-        _gdf_column->valid = static_cast<gdf_valid_type *>(
+        _gdf_column.valid = static_cast<gdf_valid_type *>(
             new std::uint8_t[arrow::BitUtil::BytesForBits(num_rows)]);
     } catch (const std::bad_alloc &e) {
 #ifdef GDF_DEBUG
@@ -125,26 +192,30 @@ _AllocateGdfColumn(const std::size_t num_rows, gdf_column *const _gdf_column) {
         return GDF_BAD_ALLOC;
     }

-    _gdf_column->size  = num_rows;
-    _gdf_column->dtype = parquet_traits<TYPE>::dtype;
+    _gdf_column.size  = num_rows;
+    _gdf_column.dtype = _DTypeFrom(column_descriptor);

     return GDF_SUCCESS;
 }

 static inline gdf_error
-_AllocateGdfColumns(const std::size_t num_columns,
-                    const std::size_t num_rows,
-                    const std::vector<::parquet::Type::type> type_nums,
-                    gdf_column *const gdf_columns) {
+_AllocateGdfColumns(
+  const std::size_t num_columns,
+  const std::size_t num_rows,
+  const std::vector<const ::parquet::ColumnDescriptor *> &column_descriptors,
+  gdf_column *const gdf_columns) {
 #define WHEN(TYPE)                                                        \
     case ::parquet::Type::TYPE:                                           \
-        _AllocateGdfColumn<::parquet::Type::TYPE>(num_rows, _gdf_column); \
+        _AllocateGdfColumn<::parquet::Type::TYPE>(                        \
+          num_rows, column_descriptor, _gdf_column);                      \
         break

     for (std::size_t i = 0; i < num_columns; i++) {
-        gdf_column *const _gdf_column = &gdf_columns[i];
+        gdf_column &_gdf_column = gdf_columns[i];
+        const ::parquet::ColumnDescriptor *const column_descriptor =
+            column_descriptors[i];

-        switch (type_nums[i]) {
+        switch (column_descriptor->physical_type()) {
             WHEN(BOOLEAN);
             WHEN(INT32);
             WHEN(INT64);
@@ -190,7 +261,7 @@ read_parquet_file(const char *const filename,
     const std::unique_ptr<FileReader> file_reader =
         FileReader::OpenFile(filename);

-    const std::shared_ptr<const ::parquet::FileMetaData> metadata =
+    const std::shared_ptr<const ::parquet::FileMetaData> &metadata =
         file_reader->metadata();

     const std::size_t num_row_groups =
@@ -210,12 +281,14 @@

     if (gdf_columns == nullptr) { return GDF_BAD_ALLOC; }

-    std::vector<::parquet::Type::type> type_nums;
-    type_nums.reserve(num_columns);
+    std::vector<const ::parquet::ColumnDescriptor *> column_descriptors;
+    column_descriptors.reserve(num_columns);
     for (std::size_t i = 0; i < num_columns; i++) {
-        type_nums.emplace_back(file_reader->RowGroup(0)->Column(i)->type());
+        column_descriptors.emplace_back(
+          file_reader->RowGroup(0)->Column(i)->descr());
     }
-    if (_AllocateGdfColumns(num_columns, num_rows, type_nums, gdf_columns)
+    if (_AllocateGdfColumns(
+          num_columns, num_rows, column_descriptors, gdf_columns)
         != GDF_SUCCESS) {
         return GDF_BAD_ALLOC;
     }
46 changes: 0 additions & 46 deletions src/parquet/column_reader.cpp
@@ -261,52 +261,6 @@ _ReadValuesSpaced(DecoderType * decoder,
         valid_bits_offset);
 }

-template <class DataType>
-inline std::int64_t
-ColumnReader<DataType>::ReadBatch(std::int64_t batch_size,
-                                  std::int16_t *def_levels,
-                                  std::int16_t *rep_levels,
-                                  T *           values,
-                                  std::int64_t *values_read) {
-    if (!HasNext()) {
-        *values_read = 0;
-        return 0;
-    }
-
-    batch_size =
-        std::min(batch_size, num_buffered_values_ - num_decoded_values_);
-
-    std::int64_t num_def_levels = 0;
-    std::int64_t num_rep_levels = 0;
-
-    std::int64_t values_to_read = 0;
-
-    if (descr_->max_definition_level() > 0 && def_levels) {
-        num_def_levels = ReadDefinitionLevels(batch_size, def_levels);
-        for (std::int64_t i = 0; i < num_def_levels; ++i) {
-            if (def_levels[i] == descr_->max_definition_level()) {
-                ++values_to_read;
-            }
-        }
-    } else {
-        values_to_read = batch_size;
-    }
-
-    if (descr_->max_repetition_level() > 0 && rep_levels) {
-        num_rep_levels = ReadRepetitionLevels(batch_size, rep_levels);
-        if (def_levels && num_def_levels != num_rep_levels) {
-            throw ::parquet::ParquetException(
-                "Number of decoded rep / def levels did not match");
-        }
-    }
-
-    *values_read = _ReadValues(current_decoder_, values_to_read, values);
-    std::int64_t total_values = std::max(num_def_levels, *values_read);
-    ConsumeBufferedValues(total_values);
-
-    return total_values;
-}
-
 template <typename DataType>
 inline std::int64_t
 ColumnReader<DataType>::ReadBatchSpaced(std::int64_t batch_size,
6 changes: 0 additions & 6 deletions src/parquet/column_reader.h
@@ -32,12 +32,6 @@ class ColumnReader : public ::parquet::ColumnReader {

     bool HasNext();

-    std::int64_t ReadBatch(std::int64_t batchSize,
-                           std::int16_t *definitionLevels,
-                           std::int16_t *repetitionLevels,
-                           T *           values,
-                           std::int64_t *valuesRead);
-
     std::int64_t ReadBatchSpaced(std::int64_t batch_size,
                                  std::int16_t *def_levels,
                                  std::int16_t *rep_levels,
13 changes: 6 additions & 7 deletions src/tests/parquet/CMakeLists.txt
@@ -36,15 +36,14 @@ file(MAKE_DIRECTORY ${BENCHMARK_ROOT}/lib)

   add_library(Google::Benchmark INTERFACE IMPORTED)
   add_dependencies(Google::Benchmark benchmark_ep)
-  target_include_directories(Google::Benchmark INTERFACE
-    ${BENCHMARK_ROOT}/include)
-  target_link_libraries(Google::Benchmark INTERFACE
-    ${BENCHMARK_ROOT}/lib/libbenchmark.a)
+  set_target_properties(Google::Benchmark
+    PROPERTIES INTERFACE_INCLUDE_DIRECTORIES ${BENCHMARK_ROOT}/include)
+  set_target_properties(Google::Benchmark
+    PROPERTIES INTERFACE_LINK_LIBRARIES ${BENCHMARK_ROOT}/lib/libbenchmark.a)

   add_library(Google::Benchmark::Main INTERFACE IMPORTED)
-  target_link_libraries(Google::Benchmark::Main INTERFACE
-    Google::Benchmark
-    ${BENCHMARK_ROOT}/lib/libbenchmark_main.a)
+  set_target_properties(Google::Benchmark::Main
+    PROPERTIES INTERFACE_LINK_LIBRARIES ${BENCHMARK_ROOT}/lib/libbenchmark_main.a)
 endif()

 set(file_reader_SRCS
19 changes: 13 additions & 6 deletions src/tests/parquet/decoding/decoding-test.cpp
@@ -45,7 +45,10 @@ checkRowGroups(const std::unique_ptr<gdf::parquet::FileReader> &reader) {
     const std::shared_ptr<parquet::RowGroupReader> row_group =
         reader->RowGroup(r);

-    std::int64_t values_read = 0;
+    std::int64_t levels_read;
+    std::int64_t values_read = 0;
+    std::int64_t nulls_count;

     int i;
     std::shared_ptr<parquet::ColumnReader> column;

@@ -67,11 +70,15 @@ checkRowGroups(const std::unique_ptr<gdf::parquet::FileReader> &reader) {
     int64_t rows_read_total = 0;
     while (rows_read_total < amountToRead) {
         int64_t rows_read =
-            int32_reader->ReadBatch(amountToRead,
-                                    dresult.data(),
-                                    rresult.data(),
-                                    (int32_t *) (&(valuesBuffer[rows_read_total])),
-                                    &values_read
+            int32_reader->ReadBatchSpaced(amountToRead,
+                                          dresult.data(),
+                                          rresult.data(),
+                                          (int32_t *) (&(valuesBuffer[rows_read_total])),
+                                          valid_bits.data(),
+                                          0,
+                                          &levels_read,
+                                          &values_read,
+                                          &nulls_count
             );
         std::cout << "rows_read: " << rows_read << std::endl;
         rows_read_total += rows_read;
1 change: 1 addition & 0 deletions src/tests/parquet/file_reader/CMakeLists.txt
@@ -21,6 +21,7 @@ set(PARQUET_FILE_PATH
     ${CMAKE_SOURCE_DIR}/src/tests/parquet/file_reader/reader-test.parquet)

 GDF_ADD_PARQUET_TEST(file_reader-test
+    file_reader-test.cpp
     single_column_file-test.cpp
     api-test.cpp)
