[GLUTEN-9335][VL] Support iceberg write unpartitioned table #9397
Pull Request Overview
This PR adds initial support for Iceberg write functionality in Velox by introducing new Java classes, C++ writer implementations, and JNI wrappers. The key changes include:
- New Java code for Iceberg write (DataFileJson, ColumnarDataWriterFactory, ColumnarBatchWrite).
- C++ implementation of the Iceberg writer and integration with the Velox runtime and backend.
- New JNI methods for initializing, writing, and committing Iceberg write operations.
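The three JNI entry points follow an init/write/commit lifecycle. The sketch below illustrates that lifecycle in plain C++. It is not the PR's API: the class name, the row-count-based file rolling, and the JSON layout (`path`, `recordCount`) are all hypothetical, chosen only to show how a native writer could hand per-file descriptions back to the Java side (the role `DataFileJson` plays in the PR).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the init/write/commit lifecycle exposed over JNI.
// Names and the JSON layout are illustrative, not the PR's actual API.
class IcebergWriterSketch {
 public:
  // "init": remember the target directory and when to roll to a new file.
  IcebergWriterSketch(std::string dir, std::size_t maxRowsPerFile)
      : dir_(std::move(dir)), maxRowsPerFile_(maxRowsPerFile) {
    if (maxRowsPerFile_ == 0) {
      throw std::invalid_argument("maxRowsPerFile must be > 0");
    }
  }

  // "write": append a batch of rows, rolling to a new data file when full.
  void write(std::size_t numRows) {
    if (committed_) throw std::logic_error("write after commit");
    while (numRows > 0) {
      if (fileRowCounts_.empty() || fileRowCounts_.back() == maxRowsPerFile_) {
        fileRowCounts_.push_back(0);  // start a new data file
      }
      std::size_t room = maxRowsPerFile_ - fileRowCounts_.back();
      std::size_t n = numRows < room ? numRows : room;
      fileRowCounts_.back() += n;
      numRows -= n;
    }
  }

  // "commit": close the writer and describe each data file as a JSON string,
  // which the Java side would turn into Iceberg data file entries.
  std::vector<std::string> commit() {
    if (committed_) throw std::logic_error("commit called twice");
    committed_ = true;
    std::vector<std::string> result;
    for (std::size_t i = 0; i < fileRowCounts_.size(); ++i) {
      result.push_back("{\"path\":\"" + dir_ + "/data-" + std::to_string(i) +
                       ".parquet\",\"recordCount\":" +
                       std::to_string(fileRowCounts_[i]) + "}");
    }
    return result;
  }

 private:
  std::string dir_;
  std::size_t maxRowsPerFile_;
  std::vector<std::size_t> fileRowCounts_;  // rows written per data file
  bool committed_ = false;
};
```

Keeping the commit result as plain strings avoids passing native objects across JNI; the Java side only has to parse JSON.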
Reviewed Changes
Copilot reviewed 26 out of 33 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
gluten-iceberg/src-iceberg/main/java/org/apache/gluten/connector/write/DataFileJson.java | New data file JSON model for Iceberg write. |
gluten-iceberg/src-iceberg/main/java/org/apache/gluten/connector/write/ColumnarDataWriterFactory.java | New interface for columnar data writer factory. |
gluten-iceberg/src-iceberg/main/java/org/apache/gluten/connector/write/ColumnarBatchWrite.java | New abstract batch write implementation throwing unsupported operation. |
cpp/velox/utils/ConfigExtractor.cc | Updated hive configuration extraction with additional keys. |
cpp/velox/tests/iceberg/IcebergWriteTest.cc | New test for validating Iceberg write functionality. |
cpp/velox/jni/VeloxJniWrapper.cc | Added JNI wrappers for Iceberg writer operations. |
cpp/velox/compute/iceberg/IcebergWriter.h | Header for the new Iceberg writer. |
cpp/velox/compute/iceberg/IcebergWriter.cc | Implementation of the Iceberg writer. |
cpp/velox/compute/iceberg/IcebergFormat.h & IcebergFormat.cc | New enums and helper function for file format conversion. |
cpp/velox/compute/VeloxRuntime.h & VeloxRuntime.cc | Integration of the Iceberg writer into the Velox runtime. |
cpp/velox/compute/VeloxBackend.cc | Refinement of hive configuration setup in the backend. |
Files not reviewed (7)
- backends-velox/src-iceberg/main/scala/org/apache/gluten/component/VeloxIcebergComponent.scala: Language not supported
- backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxBackend.scala: Language not supported
- cpp/CMakeLists.txt: Language not supported
- cpp/velox/CMakeLists.txt: Language not supported
- cpp/velox/tests/CMakeLists.txt: Language not supported
- ep/build-velox/src/get_velox.sh: Language not supported
- gluten-iceberg/pom.xml: Language not supported
Comments suppressed due to low confidence (1)
cpp/velox/utils/ConfigExtractor.cc:227
- Duplicate assignment to hiveConfMap for key kEnableFileHandleCache detected; please remove the redundant assignment to avoid potential confusion.
hiveConfMap[facebook::velox::connector::hive::HiveConfig::kEnableFileHandleCache] = conf->get<bool>(kVeloxFileHandleCacheEnabled, kVeloxFileHandleCacheEnabledDefault) ? "true" : "false";
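With `map[key] = value`, a second assignment silently overwrites the first, so duplicates like the one flagged above go unnoticed. A minimal sketch of one way to catch them, assuming the config map is a `std::unordered_map<std::string, std::string>`: insert through a helper that uses `try_emplace` and rejects repeats. The helper name and the key string in the usage are hypothetical.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

using ConfMap = std::unordered_map<std::string, std::string>;

// Insert a config entry, refusing to silently overwrite an earlier assignment.
// try_emplace leaves `value` untouched and returns false if the key exists.
void setConfOnce(ConfMap& conf, const std::string& key, std::string value) {
  bool inserted = conf.try_emplace(key, std::move(value)).second;
  if (!inserted) {
    throw std::logic_error("config key assigned twice: " + key);
  }
}
```

For example, `setConfOnce(conf, "file-handle-cache-enabled", "true")` succeeds once and throws on a second call with the same key, leaving the first value in place.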
case IcebergFileFormat::PARQUET:
return FileFormat::PARQUET;
default:
throw std::invalid_argument("Not suppport file format " + std::to_string(format));
The error message contains a typographical error ('suppport'). Please correct it to 'support'.
Suggested change:
- throw std::invalid_argument("Not suppport file format " + std::to_string(format));
+ throw std::invalid_argument("Not support file format " + std::to_string(format));
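A self-contained sketch of the conversion under review, with a clearer error message. The enums here are hypothetical stand-ins for the PR's `IcebergFileFormat` (from `IcebergFormat.h`) and Velox's `FileFormat`; their enumerator lists are assumptions. One detail worth noting: `std::to_string` has no overload for a scoped `enum class`, so if `IcebergFileFormat` is scoped, the value must be cast to an integer first.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; the real enumerator lists may differ.
enum class IcebergFileFormat { ORC = 0, PARQUET = 1, AVRO = 2 };
enum class FileFormat { UNKNOWN, ORC, PARQUET };

// Map an Iceberg file format to the native writer format.
// Only Parquet is accepted, matching the PR's Parquet-only restriction.
FileFormat toVeloxFileFormat(IcebergFileFormat format) {
  switch (format) {
    case IcebergFileFormat::PARQUET:
      return FileFormat::PARQUET;
    default:
      // Cast to int: std::to_string does not accept scoped enums directly.
      throw std::invalid_argument(
          "Unsupported file format: " + std::to_string(static_cast<int>(format)));
  }
}
```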
Based on facebookincubator/velox#10996, which has been merged into ibm/velox. It lacks some metadata, so read performance is not as good as expected. Use the flag `--enable_enhanced_features` to enable this feature; it is disabled by default.
Tag the enhanced-feature tests with `org.apache.gluten.tags.EnhancedFeaturesTest` so they can be excluded; the `exclude-tests` profile excludes them by default. We cannot use a JNI call to decide whether to run the tests, because the native library is not loaded when the tests are listed.
Only Spark 3.4 and Spark 3.5 are supported; Iceberg version 1.5.0 is not supported.
Only the Parquet format is supported, because Avro and ORC writes are not supported in Velox.
Writes of complex data types fall back, because metrics collection does not support them.
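The restrictions above amount to an offload-or-fall-back decision. A minimal sketch of that decision, assuming a simplified type model and a lowercase format name such as `"parquet"`; the function, the `TypeKind` enum, and the `isPartitioned` parameter (reflecting the PR title's scope of unpartitioned tables) are hypothetical, and the PR's actual check may be written differently and live elsewhere.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified type model for illustration only.
enum class TypeKind { BOOLEAN, INTEGER, BIGINT, DOUBLE, VARCHAR, ARRAY, MAP, ROW };

bool isComplex(TypeKind kind) {
  return kind == TypeKind::ARRAY || kind == TypeKind::MAP || kind == TypeKind::ROW;
}

// Returns true when the write can go to the native writer; false means fall back.
bool canOffloadIcebergWrite(const std::string& format, bool isPartitioned,
                            const std::vector<TypeKind>& columnTypes) {
  if (isPartitioned) {
    return false;  // Only unpartitioned tables are in scope.
  }
  if (format != "parquet") {
    return false;  // Avro and ORC writes are not supported in Velox.
  }
  for (TypeKind kind : columnTypes) {
    if (isComplex(kind)) {
      return false;  // Metrics collection does not support complex types.
    }
  }
  return true;
}
```

Checking cheap, table-level conditions (partitioning, format) before scanning the schema keeps the common rejection paths short.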