@zhuqi-lucas
Collaborator

Array json support

apache#19924

Copilot AI left a comment

Pull request overview

This PR adds support for reading JSON files in array format [{...}, {...}] in addition to the existing line-delimited (NDJSON) format. The implementation adds a new format_array boolean option to JsonOptions, along with a compression_level field for future compression support.
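To make the difference between the two layouts concrete, here is a minimal, std-only sketch of pulling the individual records out of an array-format document. It is not the PR's implementation (the file summary indicates the PR relies on serde_json); the function name `split_top_level_array` is hypothetical, and the sketch does not validate the JSON, it only tracks nesting depth and string literals.

```rust
/// Hypothetical helper: returns the raw text of each element of a
/// top-level JSON array such as `[{...}, {...}]`.
/// Returns `None` when the input is not wrapped in `[` ... `]`,
/// which is the case for line-delimited (NDJSON) input.
/// Assumption: the input is well-formed JSON; nothing is validated here.
fn split_top_level_array(input: &str) -> Option<Vec<&str>> {
    let trimmed = input.trim();
    let inner = trimmed.strip_prefix('[')?.strip_suffix(']')?;

    let mut elements = Vec::new();
    let mut depth = 0usize; // nesting depth inside the outer array
    let mut in_string = false; // currently inside a string literal
    let mut escaped = false; // previous char was a backslash in a string
    let mut start = 0usize; // byte offset where the current element begins

    for (i, c) in inner.char_indices() {
        if in_string {
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' | '[' => depth += 1,
            '}' | ']' => depth = depth.saturating_sub(1),
            // A comma at depth 0 separates two top-level elements.
            ',' if depth == 0 => {
                elements.push(inner[start..i].trim());
                start = i + 1;
            }
            _ => {}
        }
    }

    // Whatever follows the last separator is the final element (if any).
    let last = inner[start..].trim();
    if !last.is_empty() {
        elements.push(last);
    }
    Some(elements)
}
```

Unlike NDJSON, where each record ends at a newline, the record boundaries here are only known after scanning the nesting structure, and an empty array `[]` yields zero records.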

Changes:

  • Added format_array and compression_level fields to JsonOptions protobuf and configuration structures
  • Implemented JSON array format parsing in datasource-json module with proper schema inference
  • Added validation to prevent incompatible range-based scanning with array format
  • Added test coverage for the array format, including unit tests and sqllogictest cases
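The validation item in the list above can be sketched as follows. This is an illustration only: `JsonReadConfig`, `ByteRange`, and `validate_scan` are hypothetical names, and only the `format_array` flag comes from the PR description. The stated reason for the restriction is also an assumption: a single JSON array cannot be cut at arbitrary byte offsets the way newline-delimited records can.

```rust
/// Hypothetical stand-in for the reader options; only the
/// `format_array` flag is taken from the PR description.
#[derive(Debug, Clone, Copy)]
struct JsonReadConfig {
    format_array: bool,
}

/// Hypothetical byte range of a file assigned to one scan partition.
#[derive(Debug, Clone, Copy)]
struct ByteRange {
    start: u64,
    end: u64,
}

/// Rejects the combination of array format and a partial byte range.
/// NDJSON can resynchronize at the next newline after `start`;
/// the array format is assumed to need the whole file.
fn validate_scan(config: JsonReadConfig, range: Option<ByteRange>) -> Result<(), String> {
    match (config.format_array, range) {
        (true, Some(r)) => Err(format!(
            "JSON array format does not support range-based scanning \
             (requested bytes {}..{})",
            r.start, r.end
        )),
        // Whole-file array scans and any NDJSON scan are accepted.
        _ => Ok(()),
    }
}
```

Failing early at plan or open time keeps a misconfigured scan from silently returning partial or malformed records.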

Reviewed changes

Copilot reviewed 14 out of 18 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| datafusion/proto-common/proto/datafusion_common.proto | Added compression_level and format_array fields to JsonOptions protobuf definition |
| datafusion/proto-common/src/generated/prost.rs | Generated protobuf code with new JsonOptions fields |
| datafusion/proto-common/src/generated/pbjson.rs | Generated JSON serialization code for new fields |
| datafusion/proto-common/src/to_proto/mod.rs | Serialization logic for JsonOptions (has type mismatch bug) |
| datafusion/proto-common/src/from_proto/mod.rs | Deserialization logic for JsonOptions (missing compression_level field) |
| datafusion/proto/src/generated/datafusion_proto_common.rs | Duplicate generated protobuf code for JsonOptions |
| datafusion/proto/src/logical_plan/file_formats.rs | Proto conversion for JsonOptions with correct type casting |
| datafusion/common/src/config.rs | Added compression_level and format_array to JsonOptions config |
| datafusion/datasource-json/src/source.rs | Implemented JSON array parsing logic with memory-based approach |
| datafusion/datasource-json/src/file_format.rs | Added schema inference for JSON array format |
| datafusion/datasource-json/Cargo.toml | Added serde_json dependency |
| datafusion/core/src/datasource/file_format/options.rs | Added format_array option to NdJsonReadOptions |
| datafusion/core/src/datasource/file_format/json.rs | Comprehensive unit tests for array format functionality |
| datafusion/core/tests/data/json_array.json | Test data file with JSON array format |
| datafusion/core/tests/data/json_empty_array.json | Test data file with empty JSON array |
| datafusion/sqllogictest/test_files/json.slt | Integration tests for JSON array format |
| datafusion-examples/examples/csv_json_opener.rs | Updated example to pass new format_array parameter |
| Cargo.lock | Updated with serde_json dependency |

@zhuqi-lucas zhuqi-lucas merged commit 8583685 into branch-51 Jan 22, 2026
56 of 57 checks passed