[Docs][Connector-V2][Hudi] Reconstruct the Hudi connector document (#4905)

* [Docs][Connector-V2][Hudi] Reconstruct the Hudi connector document


---------

Co-authored-by: zhouyao <[email protected]>
Carl-Zhou-CN and zhouyao authored Jul 28, 2023
1 parent 0e4190a commit ce39948
Showing 1 changed file with 44 additions and 38 deletions.
82 changes: 44 additions & 38 deletions docs/en/connector-v2/source/Hudi.md
@@ -2,69 +2,67 @@

> Hudi source connector
## Description
## Support Those Engines

Used to read data from Hudi. Currently, it only supports Hudi COW tables and Snapshot Query with Batch Mode.
> Spark<br/>
> Flink<br/>
> SeaTunnel Zeta<br/>
In order to use this connector, you must ensure that your Spark/Flink cluster is already integrated with Hive. The tested Hive version is 2.3.9.

## Key features
## Key Features

- [x] [batch](../../concept/connector-v2-features.md)

Currently, it only supports Hudi COW tables and Snapshot Query with Batch Mode

- [ ] [stream](../../concept/connector-v2-features.md)
- [x] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [column projection](../../concept/connector-v2-features.md)
- [x] [parallelism](../../concept/connector-v2-features.md)
- [ ] [support user-defined split](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
|-------------------------|---------|------------------------------|---------------|
| table.path | string | yes | - |
| table.type | string | yes | - |
| conf.files | string | yes | - |
| use.kerberos | boolean | no | false |
| kerberos.principal | string | yes when use.kerberos = true | - |
| kerberos.principal.file | string | yes when use.kerberos = true | - |
| common-options | config | no | - |

### table.path [string]

`table.path` The HDFS root path of the Hudi table, such as 'hdfs://nameservice/data/hudi/hudi_table/'.
## Description

### table.type [string]
Used to read data from Hudi. Currently, it only supports Hudi COW tables and Snapshot Query with Batch Mode.

`table.type` The type of the Hudi table. Currently only 'cow' is supported; 'mor' is not supported yet.
In order to use this connector, you must ensure that your Spark/Flink cluster is already integrated with Hive. The tested Hive version is 2.3.9.

### conf.files [string]
## Supported DataSource Info

`conf.files` The environment conf file path list (local path), which is used to init the HDFS client to read Hudi table files. For example: '/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml'.
:::tip

### use.kerberos [boolean]
* Currently, only Hudi COW tables and Snapshot Query with Batch Mode are supported

`use.kerberos` Whether to enable Kerberos, default is false.
:::

### kerberos.principal [string]
## Data Type Mapping

`kerberos.principal` When using Kerberos, we should set the Kerberos principal, such as 'test_user@xxx'.
| Hudi Data Type | SeaTunnel Data Type |
|----------------|---------------------|
| ALL TYPE       | STRING              |
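
Since every Hudi field arrives as a STRING, values usually need to be cast back to their intended types downstream. Below is a minimal sketch using the [SQL transform](https://seatunnel.apache.org/docs/transform-v2/sql/); the table names and the `id`/`price`/`name` fields are hypothetical placeholders, and it assumes the transform's `query` supports standard CAST expressions:

```hocon
transform {
  Sql {
    # Hypothetical table and field names. Every Hudi column is read as STRING,
    # so we cast the values back to the types the sink expects.
    source_table_name = "hudi_source"
    result_table_name = "hudi_typed"
    query = "select cast(id as bigint) as id, cast(price as double) as price, name from hudi_source"
  }
}
```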

### kerberos.principal.file [string]
## Source Options

`kerberos.principal.file` When using Kerberos, we should set the Kerberos principal file, such as '/home/test/test_user.keytab'.
| Name                    | Type    | Required                     | Default | Description                                                                                                                                                                                                  |
|-------------------------|---------|------------------------------|---------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| table.path              | String  | Yes                          | -       | The HDFS root path of the Hudi table, such as 'hdfs://nameservice/data/hudi/hudi_table/'.                                                                                                                    |
| table.type              | String  | Yes                          | -       | The type of the Hudi table. Currently only 'cow' is supported; 'mor' is not supported yet.                                                                                                                   |
| conf.files              | String  | Yes                          | -       | The environment conf file path list (local path), which is used to init the HDFS client to read Hudi table files. For example: '/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml'. |
| use.kerberos            | Boolean | No                           | false   | Whether to enable Kerberos, default is false.                                                                                                                                                                |
| kerberos.principal      | String  | Yes when use.kerberos = true | -       | When using Kerberos, we should set the Kerberos principal, such as 'test_user@xxx'.                                                                                                                          |
| kerberos.principal.file | String  | Yes when use.kerberos = true | -       | When using Kerberos, we should set the Kerberos principal file, such as '/home/test/test_user.keytab'.                                                                                                       |
| common-options          | config  | No                           | -       | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details.                                                                                                     |
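
For a quick start, here is a minimal sketch using only the required options (Kerberos stays disabled by default; the paths are the placeholder examples from the table above):

```hocon
source {
  Hudi {
    # Required: the HDFS root path of the Hudi table
    table.path = "hdfs://nameservice/data/hudi/hudi_table/"
    # Required: only 'cow' is supported at the moment
    table.type = "cow"
    # Required: local conf files used to init the HDFS client
    conf.files = "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml"
  }
}
```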

### common options
## Task Example

Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details.
### Simple:

## Examples
> This example reads from a Hudi COW table and configures Kerberos for the environment, printing to the console.
```hocon
# Defining the runtime environment
env {
  # You can set Flink configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
}

source {
  Hudi {
    table.path = "hdfs://nameservice/data/hudi/hudi_table/"
    table.type = "cow"
@@ -73,7 +71,15 @@
    kerberos.principal = "test_user@xxx"
    kerberos.principal.file = "/home/test/test_user.keytab"
  }
}

transform {
  # If you would like to get more information about how to configure SeaTunnel and see the full list of transform plugins,
  # please go to https://seatunnel.apache.org/docs/transform-v2/sql/
}

sink {
  Console {}
}
```
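
Since `use.kerberos` defaults to `false`, the Kerberos-related options can simply be omitted when the cluster is not kerberized.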
