Description:
Logstash currently supports a wide variety of output plugins for sending processed data to destinations such as Elasticsearch, S3, and Kafka. However, there is no native support for writing data directly to Apache Iceberg tables. This feature request proposes a new Logstash output plugin that writes data to Iceberg tables, enabling data lakehouse architectures and unified streaming/batch data pipelines.
Motivation:
Apache Iceberg has become a leading open-source table format for large analytical datasets. It offers significant advantages over traditional file-based data lakes, including:
- ACID transactions with atomic, isolated commits
- Schema evolution (add, drop, and rename columns without rewriting data)
- Hidden partitioning and partition evolution
- Time travel and rollback via table snapshots
- Interoperability with query engines such as Spark, Flink, and Trino
Example Configuration:
output {
  iceberg {
    # Catalog connection settings (AWS Glue in this example)
    catalog_type => "glue"
    catalog_uri => "aws_region=us-east-1" # Simplified for example
    aws_access_key_id => "YOUR_ACCESS_KEY"
    aws_secret_access_key => "YOUR_SECRET_KEY"
    warehouse_location => "s3://my-iceberg-bucket/data/"

    # Target table; created on startup when create_table is true
    table_namespace => "logs"
    table_name => "web_server_logs"
    create_table => true
    # schema_definition => '{ "type": "record", "name": "LogEvent", "fields": [...] }' # Optional

    # Partitioning and write behavior
    partition_columns => ["timestamp"]
    partition_strategy => "daily"
    write_mode => "append"
    file_format => "parquet"
  }
}
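To make the intended write path concrete, below is a minimal sketch of what such a plugin could do for each batch of events, written in Python with PyIceberg. This is illustrative only: the real plugin would run on the JVM, the catalog/table/field names simply mirror the example above, and the table is assumed to already exist with a compatible schema (the proposed create_table option would handle creation).

# Illustrative sketch only -- the proposed plugin does not exist yet.
# Assumes pyiceberg[glue] >= 0.6 and AWS credentials in the environment.
from datetime import datetime, timezone

import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Connect to an AWS Glue catalog (mirrors catalog_type => "glue" above).
catalog = load_catalog("glue", **{"type": "glue"})

# Load the target table from the example configuration.
table = catalog.load_table("logs.web_server_logs")

def write_batch(events):
    """Append one batch of Logstash events (a list of dicts) as a single Iceberg commit."""
    table.append(pa.Table.from_pylist(events))

write_batch([
    {"timestamp": datetime.now(timezone.utc), "status": 200, "path": "/index.html"},
])

Each flushed batch maps to one atomic Iceberg commit, which is what would make write_mode => "append" safe for concurrent readers of the table.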
Alternatives Considered:
Using existing output plugins (e.g., S3) plus external tools: Data can be written to S3 with Logstash's S3 output plugin and then loaded into Iceberg tables by a separate engine such as Spark or Flink (see the sketch after this list). However, this adds operational complexity, requires running and scheduling additional tools, and lacks the integration and ease of use of a dedicated Iceberg output plugin.
Community plugins: Community-developed plugins may exist, but an officially supported plugin in the main Logstash distribution would ensure better maintenance, testing, and compatibility.
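For comparison, the first alternative typically requires a separate, scheduled job along these lines. This is a rough PySpark sketch: the bucket path and the catalog name glue_catalog are assumptions, and the Spark session must already be configured with the Iceberg runtime and catalog settings.

# Illustrative sketch of the "S3 output + external tool" alternative.
# Assumes Spark 3.x with the Iceberg runtime and an Iceberg catalog named
# "glue_catalog" configured via spark.sql.catalog.* properties.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-to-iceberg").getOrCreate()

# Read the JSON objects that the Logstash S3 output plugin wrote
# (s3a:// with the hadoop-aws connector).
df = spark.read.json("s3a://my-logstash-bucket/web_server_logs/")

# Append into the Iceberg table using the DataFrameWriterV2 API (Spark 3+).
df.writeTo("glue_catalog.logs.web_server_logs").append()

Even in this compressed form, the alternative needs a second runtime, job scheduling, and schema management outside Logstash, which is exactly the complexity the proposed plugin would remove.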