
Connecting and sending to Kafka, but seeing 'brokers are down' error. #10170

Open
Gwave6123 opened this issue Apr 3, 2025 · 0 comments

Bug Report

Describe the bug
When sending messages through the Fluent Bit kafka output, we see error messages in Fluent Bit after a while. These messages appear semi-randomly, not at the 30-second cadence at which we flush logs to Kafka.

Also, even though we see this error, we do not lose any messages. Everything successfully reaches our Kafka servers.

To Reproduce

  • Example log
[2025/03/24 16:44:54] [error] [output:kafka:kafka.1] fluent-bit#producer-2: [thrd:ssl://(address):9096/bootstrap]: 5/5 brokers are down
[2025/03/24 16:44:55] [error] [output:kafka:kafka.2] fluent-bit#producer-3: [thrd:ssl://(address):9096/bootstrap]: 5/5 brokers are down
[2025/03/24 16:45:02] [error] [output:kafka:kafka.0] fluent-bit#producer-1: [thrd:ssl://(address):9096/bootstrap]: 5/5 brokers are down
[2025/03/24 16:45:15] [error] [output:kafka:kafka.1] fluent-bit#producer-2: [thrd:ssl://(address):9096/bootstrap]: 5/5 brokers are down
[2025/03/24 16:45:15] [error] [output:kafka:kafka.2] fluent-bit#producer-3: [thrd:ssl://(address):9096/bootstrap]: 5/5 brokers are down
  • Kafka Output Example
[OUTPUT]
        Name        kafka
        Match       tachyon.logs.crust.*
        Timestamp_key @timestamp
        Timestamp_format iso8601
        Brokers     (address1), (address2), (address3), (address4), (address5)
        Topics      log-core

        # rdkafka.ssl.certificate.location /etc/ssl
        # rdkafka.ssl.key.location /certs/some.key
        # rdkafka.ssl.ca.location /certs/some-bundle.crt
        rdkafka.security.protocol ssl
        rdkafka.request.required.acks 1
        rdkafka.log.connection.close false
        storage.total_limit_size 5M

        # Timeout settings
        rdkafka.request.timeout.ms 10000
        rdkafka.message.timeout.ms 70000
        rdkafka.connections.max.idle.ms  20000

On the main config:

[SERVICE]
    Flush           30
    Daemon          off
    tls             on
    tls.verify      on
    tls.ca_path     /etc/ssl/

Expected behavior
We expect to not see these errors if all 5 of our servers are running successfully.

Your Environment

  • Version used: 3.0.7 and 3.2.10
  • Environment name and version (e.g. Kubernetes? What version?): Kubernetes, GitVersion:"v1.18.2-rc2+k3s1"
  • Server type and version:
  • Operating System and version: Ubuntu 20.04.6 LTS
  • Filters and plugins: [FILTERS]: nest, modify, record_modifier, lua

Additional context
We inspected the open connections using net-tools and nsenter -t. We found that Fluent Bit kept many connections open to our 5 brokers: with three inputs using this output and 5 brokers, we were seeing around 15 open connections at a time.
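The inspection described above can be sketched roughly as follows. This is a hypothetical fragment, assuming net-tools is available and run as root; `<fluent-bit-pid>` is a placeholder for the host PID of the fluent-bit process:

```shell
# Enter the pod's network namespace via the fluent-bit process PID,
# then count established TCP connections to the broker TLS port (9096).
nsenter -t <fluent-bit-pid> -n netstat -tn | grep ':9096' | grep ESTABLISHED | wc -l
```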

To reduce this we set rdkafka.connections.max.idle.ms 20000, which brings the connections back down to 3-4 (for our 3 inputs). With that setting, however, we see the 'brokers are down' error. Increasing it to 70000 gets rid of the error but increases our connections to 4-6.
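The behavior above appears consistent with the 20-second idle timeout being shorter than the 30-second Flush interval, so librdkafka closes its broker connections between flushes and logs the transient error while reconnecting. A hedged sketch of the workaround, using the 70000 value from our own testing rather than any official recommendation:

```ini
# Sketch: keep the idle timeout above the 30 s Flush interval so
# connections are not torn down and reopened between flushes.
# 70000 ms removed the error in our testing; treat it as a starting
# point, not a recommended value.
[OUTPUT]
    Name        kafka
    rdkafka.connections.max.idle.ms 70000
```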
