Description
Is your feature request related to a problem? Please describe
The current Kudu reader supports only one split strategy, SIMPLE_DIVIDE, which evenly divides an integer range into several sub-ranges and uses each sub-range as a split.
Document reference: connector-kudu
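To make the behavior concrete, here is a minimal sketch of what evenly dividing an integer range into splits could look like. It is not the connector's actual code: the class name SimpleDivideSketch, the method divide, and the choice of inclusive bounds are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a SIMPLE_DIVIDE-style strategy; not the connector's implementation.
final class SimpleDivideSketch {

    /** Splits the inclusive range [lower, upper] into at most splitNum inclusive sub-ranges. */
    static List<long[]> divide(long lower, long upper, int splitNum) {
        if (lower > upper || splitNum <= 0) {
            throw new IllegalArgumentException("need lower <= upper and splitNum > 0");
        }
        // Number of values in the range; throws ArithmeticException if it does not fit in a long.
        long count = Math.addExact(Math.subtractExact(upper, lower), 1L);
        // Never create more splits than there are values.
        long parts = Math.min(splitNum, count);
        long base = count / parts;       // minimum size of each sub-range
        long remainder = count % parts;  // the first `remainder` sub-ranges get one extra value

        List<long[]> splits = new ArrayList<>();
        long start = lower;
        for (long i = 0; i < parts; i++) {
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size - 1;
            splits.add(new long[] {start, end});
            start = end + 1;
        }
        return splits;
    }

    public static void main(String[] args) {
        for (long[] split : divide(0, 9, 3)) {
            System.out.println("[" + split[0] + ", " + split[1] + "]");
        }
    }
}
```

With these assumptions, divide(0, 9, 3) yields the sub-ranges [0, 3], [4, 6], and [7, 9]. Note that the sketch needs both bounds up front and has no place for rows whose split column is null.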
This split strategy has several shortcomings:
- The user has to choose an integer-typed column (int8, int16, int32, int64) as the split dimension.
- If the user does not know the lower and upper bounds, the reader scans the whole table to find the actual bounds.
- It does not support null values in the split dimension.
This issue therefore asks for the current split strategy to be optimized, or for other split strategies to be implemented.
Describe the solution you'd like
- Similar to KuduTableInputFormat in kudu-mapreduce, maybe we can let users set serialized KuduPredicates directly in configuration files.
- KuduTable supports the List<Partition> getRangePartitions(long timeout) method, which returns all range partitions in the table. Maybe these range partitions can be used directly as splits.