Add Apache Iceberg Read/Write support #17328

Open
@metalshanked

Description:

Currently, Logstash supports a wide variety of output plugins, enabling users to send processed data to various destinations like Elasticsearch, S3, Kafka, and more. However, there is no native support for writing data directly to Apache Iceberg tables. This feature request proposes the development of a new output plugin for Logstash that allows users to write data to Iceberg tables, facilitating the creation of data lakehouses and unified streaming/batch data pipelines.

Motivation:

Apache Iceberg has become a leading open-source table format for large, analytical datasets. It offers significant advantages over traditional file-based data lakes, including:

Example Configuration

output {
  iceberg {
    catalog_type => "glue"
    catalog_uri  => "aws_region=us-east-1" # Simplified for example
    aws_access_key_id => "YOUR_ACCESS_KEY"
    aws_secret_access_key => "YOUR_SECRET_KEY"
    warehouse_location => "s3://my-iceberg-bucket/data/"
    table_namespace => "logs"
    table_name => "web_server_logs"
    create_table => true
    # schema_definition => '{ "type": "record", "name": "LogEvent", "fields": [...] }'  # Optional
    partition_columns => ["timestamp"]
    partition_strategy => "daily"
    write_mode => "append"
    file_format => "parquet"
  }
}

Alternatives Considered:

Using Existing Output Plugins (e.g., S3) + External Tools: It's possible to write data to S3 using Logstash's S3 output plugin and then use separate tools (e.g., Spark, Flink) to create and manage Iceberg tables. However, this approach adds complexity, requires managing multiple tools, and doesn't provide the same level of integration and ease of use as a dedicated Iceberg output plugin.
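For reference, the S3-based workaround might look roughly like the sketch below. It is illustrative rather than a tested configuration: the bucket name is borrowed from the example above, and the `staging/` prefix and the rotation values are assumptions chosen only for the example.

```
output {
  s3 {
    region    => "us-east-1"
    bucket    => "my-iceberg-bucket"          # assumption: same bucket as the example above
    prefix    => "staging/web_server_logs/"   # hypothetical staging area, outside the Iceberg warehouse path
    codec     => "json_lines"                 # one JSON event per line
    size_file => 104857600                    # rotate after roughly 100 MB (value is in bytes)
    time_file => 15                           # or after 15 minutes, whichever comes first
    encoding  => "gzip"
  }
}
```

With this setup Logstash only produces gzip-compressed JSON-lines objects in S3. A separate Spark or Flink job still has to read those objects, convert them to the table's file format, and commit them to the Iceberg table, which is the extra operational burden described above.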

Community Plugins: While there might be community-developed plugins, an officially supported plugin in the main Logstash distribution would ensure better maintenance, testing, and compatibility.
