
File watchers might not be handled properly causing gradual increase in CPU/Memory usage #4381

Open
uristernik opened this issue Jan 15, 2024 · 5 comments
@uristernik

Describe the bug

The Fluentd tail plugin was logging "If you keep getting this message, please restart Fluentd". After coming across #3614, we implemented the workaround suggested there:

  • changed follow_inodes to true
  • set rotate_wait to 0

Since then we no longer see the original "If you keep getting this message, please restart Fluentd" message, but we still see many "Skip update_watcher because watcher has been already updated by other inotify event" messages.
This is accompanied by gradually leaking memory and steadily increasing CPU usage until a restart occurs.
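To check whether the "Skip update_watcher" warnings become more frequent alongside the CPU/memory growth, the occurrences can be bucketed per hour and compared with the resource graphs. This is a minimal sketch: `count_skip_update_watcher_per_hour` is a hypothetical helper (not part of Fluentd), and it assumes the default Fluentd log prefix `YYYY-MM-DD HH:MM:SS +ZZZZ [level]: ...`, so the date and time are the first two whitespace-separated fields.

```shell
# Hypothetical helper, not part of Fluentd.
# Assumes each Fluentd log line starts with "YYYY-MM-DD HH:MM:SS +ZZZZ",
# so $1 is the date and the first two characters of $2 are the hour.
count_skip_update_watcher_per_hour() {
  awk '/Skip update_watcher because watcher has been already updated/ {
         hits[$1 " " substr($2, 1, 2) ":00"]++
       }
       END {
         for (hour in hits) print hour, hits[hour]
       }' "$1" | LC_ALL=C sort
}

# Example (the pod name is a placeholder):
#   kubectl logs <fluentd-pod> > fluentd.log
#   count_skip_update_watcher_per_hour fluentd.log
```

Each output line is `date hour:00 count`, so a steadily rising count would line up with the gradual growth described above.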

To mitigate this, I added pos_file_compaction_interval 20m as suggested here, but it had no effect on resource usage.
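Whether compaction actually shrinks the position file can be checked by summarizing the file before and after a compaction interval. This sketch assumes the in_tail pos file layout of one `path<TAB>position<TAB>inode` entry per line, with 16-digit hex numbers and a position of `ffffffffffffffff` marking an entry that is no longer watched; `summarize_pos_file` is a hypothetical helper. With follow_inodes true, one path may legitimately appear more than once (for example after rotation), so duplicated paths are only a hint.

```shell
# Hypothetical helper, not part of Fluentd.
# Assumed pos file layout: path<TAB>position(16 hex digits)<TAB>inode(16 hex digits)
summarize_pos_file() {
  awk -F'\t' '
    NF == 3 {
      total++
      if ($2 == "ffffffffffffffff") unwatched++   # assumed "unwatched" marker
      per_path[$1]++
    }
    END {
      dup = 0
      for (p in per_path) if (per_path[p] > 1) dup++
      printf "entries=%d unwatched=%d duplicated_paths=%d\n", total, unwatched, dup
    }' "$1"
}

# Example, using the pos_file from the configuration in this issue:
#   summarize_pos_file /var/log/fluentd-containers.log.pos
```

If compaction works, the `unwatched` count should drop back after each 20m interval instead of growing without bound.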


Related to #3614. More specifically #3614 (comment)

The suspicion is that some watchers are not cleaned up properly and leak, increasing CPU and memory consumption until the next restart.
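If watchers do leak, the number of inotify watches held by the Fluentd worker should grow over time. This sketch assumes that the tail plugin's stat watchers are backed by inotify on Linux (as the "inotify event" wording in the warning suggests) and that you can read the worker's `/proc` entries (typically as root on the node). It counts the `inotify wd:` lines that Linux exposes in `/proc/<pid>/fdinfo/<fd>`; `count_inotify_watches` is a hypothetical helper.

```shell
# Hypothetical helper, not part of Fluentd.
# $1: PID of the Fluentd worker process.
# $2: optional replacement for /proc (useful for testing).
# Counts the "inotify wd:" lines Linux exposes in /proc/<pid>/fdinfo/<fd>.
count_inotify_watches() {
  # fd entries can vanish while being read, so read errors are discarded.
  cat "${2:-/proc}/$1"/fdinfo/* 2>/dev/null |
    awk '/^inotify wd:/ { n++ } END { print n + 0 }'
}

# Example: sample every 10 minutes and compare against the CPU/memory graphs.
#   while sleep 600; do
#     echo "$(date -u +%FT%TZ) $(count_inotify_watches <worker-pid>)"
#   done
```

A count that keeps rising while the number of files under /var/log/containers stays roughly constant would support the leak hypothesis.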

To Reproduce

Deploy Fluentd (version v1.16.3-debian-forward-1.0) as a DaemonSet in a dynamic Kubernetes cluster of 50-100 nodes. The Fluentd configuration is shown under Your Configuration.

Expected behavior

CPU / Memory should stay stable.

Your Environment

- Fluentd version: [v1.16.3-debian-forward-1.0](https://github.com/fluent/fluentd-kubernetes-daemonset#:~:text=debian%2Dcloudwatch%2D1-,Forward,-docker%20pull%20fluent)

Your Configuration

<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  follow_inodes true
  rotate_wait 0
  exclude_path ["/var/log/containers/fluentd*.log", "/var/log/containers/*kube-system*.log", "/var/log/containers/*calico-system*.log", "/var/log/containers/prometheus-node-exporter*.log", "/var/log/containers/opentelemetry-agent*.log"]
  pos_file_compaction_interval 20m
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key time
      time_type string
      time_format "%Y-%m-%dT%H:%M:%S.%NZ"
      keep_time_key true
    </pattern>
    <pattern>
      format /^(?<time>.+?) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.+)$/
      time_format "%Y-%m-%dT%H:%M:%S.%N%:z"
    </pattern>
  </parse>
  emit_unmatched_lines true
</source>


Your Error Log

```shell
Skip update_watcher because watcher has been already updated by other inotify event
```

Additional context

#3614

@daipom
Contributor

daipom commented Jan 16, 2024

Thanks for your report!

> Fluentd tail plugin was outputting If you keep getting this message, please restart Fluentd. After coming across #3614, we implemented the workaround suggested there.
>
>   • changed follow_inodes to true
>   • set rotate_wait to 0

So, follow_inodes false has a similar issue.
Could you please report the follow_inodes false problem in a new issue?

@uristernik
Author

@daipom In this case I had follow_inodes true

Do you want me to open a new issue just for tracking?

@daipom
Contributor

daipom commented Jan 23, 2024

@uristernik
Wasn't there a problem with follow_inodes false as well?
I'd like to sort out the follow_inodes false problem and the follow_inodes true problem separately.

I'd like to know whether there is any difference between follow_inodes false and follow_inodes true.
For example, does the same resource leakage occur with follow_inodes false?

If there is no particular difference, we are fine with this for now.
Thanks!

@kenhys added the waiting-for-user label and removed the waiting-for-triage label Jul 22, 2024
@shadowshot-x

We are facing the same issue.

Error message: Skip update_watcher because watcher has been already updated by other inotify event path="/usr/local/logs/app/app.log" inode=20617294 inode_in_pos_file=0
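The warning reports the inode in decimal together with inode_in_pos_file=0. One way to cross-check what the pos file actually holds for that inode is to convert it to hex and look it up. This assumes the in_tail pos file stores each entry as `path<TAB>position<TAB>inode`, with the inode as 16 lowercase hex digits; `pos_file_has_inode` is a hypothetical helper and the pos file path is a placeholder.

```shell
# Hypothetical helper, not part of Fluentd.
# $1: pos file path, $2: inode in decimal, as printed in the warning (inode=...).
# Assumes the pos file stores the inode as 16 lowercase hex digits in column 3.
# Prints matching entries; exits non-zero when the inode is not present.
pos_file_has_inode() {
  awk -F'\t' -v ino="$(printf '%016x' "$2")" '
    $3 == ino { print; found = 1 }
    END { exit !found }' "$1"
}

# Example with the inode from the warning above (20617294 is 00000000013a984e in hex):
#   pos_file_has_inode /path/to/your.pos 20617294 || echo "inode not in pos file"
```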

We are using

read_from_head true
rotate_wait 30
follow_inodes true
enable_stat_watcher false

Memory keeps growing gradually too! Is there any resolution for this?

@daipom
Contributor

daipom commented Oct 2, 2024

@shadowshot-x Sorry for my late response. Thanks for your report.
Could you please share the Fluentd (td-agent/fluent-package) version and OS?

@daipom added the work-in-progress label and removed the waiting-for-user label Oct 2, 2024
@daipom moved this to To-Do in Fluentd Kanban Oct 15, 2024