[BitSail][Connector] Optimize split strategy in Kudu Reader #143

Open
@BlockLiu

Description

Is your feature request related to a problem? Please describe

The current Kudu reader supports only one split strategy, SIMPLE_DIVIDE, which evenly divides an integer range into several sub-ranges and uses each sub-range as a split.
Document reference: connector-kudu
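
For readers unfamiliar with the strategy, here is a minimal sketch of what "evenly divide an integer range" could look like. It is not the connector's actual code: the class name `SimpleDivideSketch`, the method `divide`, and the choice of inclusive bounds are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a SIMPLE_DIVIDE-style split; not BitSail's implementation.
class SimpleDivideSketch {

    /**
     * Divides the inclusive range [lower, upper] into at most splitNum
     * contiguous sub-ranges whose sizes differ by at most one.
     * Each element of the result is {startInclusive, endInclusive}.
     */
    static List<long[]> divide(long lower, long upper, int splitNum) {
        if (lower > upper || splitNum <= 0) {
            throw new IllegalArgumentException("invalid range or split number");
        }
        // Fail loudly instead of overflowing on extremely wide ranges.
        long total = Math.addExact(Math.subtractExact(upper, lower), 1L);
        long n = Math.min(splitNum, total); // never create empty splits
        long base = total / n;
        long remainder = total % n;

        List<long[]> splits = new ArrayList<>();
        long start = lower;
        for (long i = 0; i < n; i++) {
            // The first `remainder` splits take one extra value each.
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size - 1;
            splits.add(new long[] {start, end});
            start = end + 1;
        }
        return splits;
    }

    public static void main(String[] args) {
        for (long[] s : divide(0, 9, 3)) {
            System.out.println("[" + s[0] + ", " + s[1] + "]");
        }
    }
}
```

With these assumptions, `divide(0, 9, 3)` yields the sub-ranges [0, 3], [4, 6], and [7, 9]. The sketch also makes the limitations listed below visible: it needs numeric bounds up front and has no place for rows whose split column is null.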

This split strategy has several shortcomings:

  1. The user has to choose an integer-typed column (int8, int16, int32, int64) as the split dimension.
  2. If the user does not know the lower and upper bounds, the reader scans the whole table to find the actual bounds.
  3. It does not support null values in the split dimension.

This issue asks for someone to optimize the current split strategy or implement additional split strategies.

Describe the solution you'd like

  1. Similar to KuduTableInputFormat in kudu-mapreduce, maybe we can let users set serialized KuduPredicates directly in the configuration file.
  2. KuduTable provides the method List<Partition> getRangePartitions(long timeout), which returns all range partitions of the table. Maybe these partition ranges can be used directly as splits.
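
To make the second idea more concrete, here is a rough sketch of the mapping from range partitions to splits. It does not call the Kudu client. `RangeBound` and `Split` are hypothetical stand-ins, and the sketch assumes a single long-typed range column with a null bound meaning "unbounded". In a real reader the bounds would come from the `Partition` objects returned by `getRangePartitions`, which may span several columns and other types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "one split per range partition"; not tied to the Kudu client API.
class RangePartitionSplitSketch {

    /** Stand-in for one range partition: [lowerInclusive, upperExclusive); null means unbounded. */
    static final class RangeBound {
        final Long lowerInclusive;
        final Long upperExclusive;

        RangeBound(Long lowerInclusive, Long upperExclusive) {
            this.lowerInclusive = lowerInclusive;
            this.upperExclusive = upperExclusive;
        }
    }

    /** Stand-in for a reader split that covers exactly one range partition. */
    static final class Split {
        final int id;
        final RangeBound bound;

        Split(int id, RangeBound bound) {
            this.id = id;
            this.bound = bound;
        }

        /** Renders the filter a reader would apply on the given column for this split. */
        String filterOn(String column) {
            List<String> parts = new ArrayList<>();
            if (bound.lowerInclusive != null) {
                parts.add(column + " >= " + bound.lowerInclusive);
            }
            if (bound.upperExclusive != null) {
                parts.add(column + " < " + bound.upperExclusive);
            }
            return parts.isEmpty() ? "<no filter>" : String.join(" AND ", parts);
        }
    }

    /** Creates one split per partition, numbering them in input order. */
    static List<Split> toSplits(List<RangeBound> partitions) {
        List<Split> splits = new ArrayList<>();
        for (RangeBound p : partitions) {
            splits.add(new Split(splits.size(), p));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<RangeBound> partitions = Arrays.asList(
            new RangeBound(null, 100L),
            new RangeBound(100L, 200L),
            new RangeBound(200L, null));
        for (Split s : toSplits(partitions)) {
            System.out.println("split " + s.id + ": " + s.filterOn("id"));
        }
    }
}
```

Under this model the split boundaries come from table metadata, so no full-table scan is needed to discover bounds and the user does not have to pick a split column. Hash partitioning and null handling are outside what this sketch covers.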

Describe alternatives you've considered

Additional context

Labels

difficulty-easy (Easy difficulty to fix this issue), help wanted (Extra attention is needed)
