[GLUTEN-9335][VL] Support iceberg write unpartitioned table #9397

Merged
31 commits merged on Jul 23, 2025

Contributor

@jinchengchenghh jinchengchenghh commented Apr 22, 2025

This is based on facebookincubator/velox#10996, which has been merged to ibm/velox. It still lacks the metadata, so read performance is not as expected. Use the flag --enable_enhanced_features to enable this feature; it is disabled by default.
Tag tests for enhanced features with org.apache.gluten.tags.EnhancedFeaturesTest so that they can be excluded. They are excluded by default through the exclude-tests profile. A JNI call cannot be used to decide whether to run the tests, because the library is not loaded when the tests are listed.

Only Spark 3.4 is supported. Spark 3.5 with Iceberg version 1.5.0 is not supported.

Only the Parquet format is supported, because Velox does not support Avro and ORC writes.

Writes of complex data types fall back, because the metrics do not support them.
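
The restrictions above can be read as a single offload check. The following is only a minimal sketch of that reading, not the code in this PR: `IcebergWriteRequest`, its fields, and `canOffloadIcebergWrite` are hypothetical names introduced for illustration, and the real validation in Gluten may be structured differently.

```cpp
#include <string>

// Hypothetical description of an Iceberg write; these fields are illustrative
// and do not mirror Gluten's actual classes.
struct IcebergWriteRequest {
  std::string fileFormat; // e.g. "parquet", "orc", "avro"
  bool partitioned;       // true if the target table is partitioned
  bool hasComplexTypes;   // true if any column is an array, map or struct
};

// Returns true only when every restriction listed in the description holds.
// Any other case is assumed to fall back to the non-native write path.
bool canOffloadIcebergWrite(
    const IcebergWriteRequest& request,
    bool enhancedFeaturesEnabled) {
  if (!enhancedFeaturesEnabled) {
    return false; // feature is disabled by default
  }
  if (request.fileFormat != "parquet") {
    return false; // Avro and ORC writes are not supported in Velox
  }
  if (request.partitioned) {
    return false; // this PR covers unpartitioned tables only
  }
  if (request.hasComplexTypes) {
    return false; // metrics do not support complex types
  }
  return true;
}
```

Under these assumptions, an unpartitioned Parquet write of primitive columns is the only case that would be offloaded.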

@jinchengchenghh jinchengchenghh changed the title [Gluten-9335][VL] Support iceberg write [GLUTEN-9335][VL] Support iceberg write Apr 22, 2025
Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}


@github-actions github-actions bot added CORE works for Gluten Core VELOX DATA_LAKE labels Apr 22, 2025
#9335

Run Gluten Clickhouse CI on x86

@github-actions github-actions bot added the BUILD label Apr 23, 2025
@jinchengchenghh jinchengchenghh requested a review from Copilot April 23, 2025 12:58
@jinchengchenghh jinchengchenghh marked this pull request as draft April 23, 2025 12:58
@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds initial support for Iceberg write functionality in Velox by introducing new Java classes, C++ writer implementations, and JNI wrappers. The key changes include:

  • New Java code for Iceberg write (DataFileJson, ColumnarDataWriterFactory, ColumnarBatchWrite).
  • C++ implementation of the Iceberg writer and integration with the Velox runtime and backend.
  • New JNI methods for initializing, writing, and committing Iceberg write operations.

Reviewed Changes

Copilot reviewed 26 out of 33 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| gluten-iceberg/src-iceberg/main/java/org/apache/gluten/connector/write/DataFileJson.java | New data file JSON model for Iceberg write. |
| gluten-iceberg/src-iceberg/main/java/org/apache/gluten/connector/write/ColumnarDataWriterFactory.java | New interface for columnar data writer factory. |
| gluten-iceberg/src-iceberg/main/java/org/apache/gluten/connector/write/ColumnarBatchWrite.java | New abstract batch write implementation throwing unsupported operation. |
| cpp/velox/utils/ConfigExtractor.cc | Updated hive configuration extraction with additional keys. |
| cpp/velox/tests/iceberg/IcebergWriteTest.cc | New test for validating Iceberg write functionality. |
| cpp/velox/jni/VeloxJniWrapper.cc | Added JNI wrappers for Iceberg writer operations. |
| cpp/velox/compute/iceberg/IcebergWriter.h | Header for the new Iceberg writer. |
| cpp/velox/compute/iceberg/IcebergWriter.cc | Implementation of the Iceberg writer. |
| cpp/velox/compute/iceberg/IcebergFormat.h & IcebergFormat.cc | New enums and helper function for file format conversion. |
| cpp/velox/compute/VeloxRuntime.h & VeloxRuntime.cc | Integration of the Iceberg writer into the Velox runtime. |
| cpp/velox/compute/VeloxBackend.cc | Refinement of hive configuration setup in the backend. |
Files not reviewed (7)
  • backends-velox/src-iceberg/main/scala/org/apache/gluten/component/VeloxIcebergComponent.scala: Language not supported
  • backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxBackend.scala: Language not supported
  • cpp/CMakeLists.txt: Language not supported
  • cpp/velox/CMakeLists.txt: Language not supported
  • cpp/velox/tests/CMakeLists.txt: Language not supported
  • ep/build-velox/src/get_velox.sh: Language not supported
  • gluten-iceberg/pom.xml: Language not supported
Comments suppressed due to low confidence (1)

cpp/velox/utils/ConfigExtractor.cc:227

  • Duplicate assignment to hiveConfMap for key kEnableFileHandleCache detected; please remove the redundant assignment to avoid potential confusion.
  hiveConfMap[facebook::velox::connector::hive::HiveConfig::kEnableFileHandleCache] = conf->get<bool>(kVeloxFileHandleCacheEnabled, kVeloxFileHandleCacheEnabledDefault) ? "true" : "false";
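
As an illustration of the suppressed comment, the sketch below sets the file handle cache entry exactly once. It is a standalone approximation, not the code in ConfigExtractor.cc: the key string, the `buildHiveConf` helper, and the plain `bool` parameter are assumptions standing in for `HiveConfig::kEnableFileHandleCache` and the `conf->get<bool>(...)` lookup.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical key string standing in for HiveConfig::kEnableFileHandleCache.
constexpr const char* kEnableFileHandleCache = "file-handle-cache-enabled";

// Builds the hive config map. The cache flag is written in one place only,
// so there is no second assignment that could silently override the first.
std::unordered_map<std::string, std::string> buildHiveConf(
    bool fileHandleCacheEnabled) {
  std::unordered_map<std::string, std::string> hiveConfMap;
  hiveConfMap[kEnableFileHandleCache] =
      fileHandleCacheEnabled ? "true" : "false";
  return hiveConfMap;
}
```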

    case IcebergFileFormat::PARQUET:
      return FileFormat::PARQUET;
    default:
      throw std::invalid_argument("Not suppport file format " + std::to_string(format));
Copilot AI Apr 23, 2025

The error message contains a typographical error ('suppport'). Please correct it to 'support'.

Suggested change
- throw std::invalid_argument("Not suppport file format " + std::to_string(format));
+ throw std::invalid_argument("Not support file format " + std::to_string(format));
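
For context, a self-contained sketch of a conversion helper of this shape is shown here. The two enums are hypothetical stand-ins (the real definitions in IcebergFormat.h and Velox may have different members and values), `toVeloxFileFormat` is an assumed name, and the message wording is one possible phrasing rather than the text that was merged.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the enums involved; values are illustrative.
enum class IcebergFileFormat { ORC = 0, PARQUET = 1, AVRO = 2 };
enum class FileFormat { UNKNOWN = 0, PARQUET = 1 };

// Maps the Iceberg-side format to the writer-side format. Only Parquet is
// accepted; any other value is rejected with an exception.
inline FileFormat toVeloxFileFormat(IcebergFileFormat format) {
  switch (format) {
    case IcebergFileFormat::PARQUET:
      return FileFormat::PARQUET;
    default:
      // A scoped enum has no std::to_string overload, so cast it first.
      throw std::invalid_argument(
          "Unsupported file format " +
          std::to_string(static_cast<int>(format)));
  }
}
```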

@github-actions github-actions bot removed the BUILD label Apr 24, 2025
@jinchengchenghh jinchengchenghh marked this pull request as ready for review April 24, 2025 12:59
@github-actions github-actions bot added the DOCS label Jul 21, 2025
@jinchengchenghh jinchengchenghh merged commit b5c9bd1 into apache:main Jul 23, 2025
94 of 95 checks passed