File watchers might not be handled properly causing gradual increase in CPU/Memory usage #4381

uristernik · 2024-01-15T09:42:27Z

Describe the bug

The Fluentd tail plugin was logging `If you keep getting this message, please restart Fluentd`. After coming across #3614, we implemented the workaround suggested there:

- changed follow_inodes to true
- set rotate_wait to 0

Since then we no longer see the original `If you keep getting this message, please restart Fluentd` message, but we still see many `Skip update_watcher because watcher has been already updated by other inotify event` messages.
This is accompanied by a pattern of leaking memory and a gradual increase in CPU usage until a restart occurs.

To mitigate this I added pos_file_compaction_interval 20m as suggested here, but this had no effect on resource usage.
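To see whether compaction has anything to clean up, it can help to summarize the pos_file. The following is a minimal Python sketch, not part of Fluentd. It assumes each pos_file entry is a line of the form `<path>\t<16 hex digits position>\t<16 hex digits inode>` and that an entry that is no longer watched carries the position `ffffffffffffffff`; both are assumptions about the on-disk format that should be checked against the Fluentd version in use. The function name `summarize_pos_file` is made up for this example.

```python
from collections import Counter

# ASSUMPTION: sentinel position marking an entry that is no longer watched.
UNWATCHED = 0xFFFFFFFFFFFFFFFF


def summarize_pos_file(text):
    """Count total entries, unwatched entries, and paths listed more than once."""
    paths = Counter()
    total = 0
    unwatched = 0
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        path, pos_hex, _inode_hex = parts
        try:
            pos = int(pos_hex, 16)
        except ValueError:
            continue  # position field is not hexadecimal
        total += 1
        paths[path] += 1
        if pos == UNWATCHED:
            unwatched += 1
    duplicate_paths = sum(1 for n in paths.values() if n > 1)
    return {"total": total, "unwatched": unwatched, "duplicate_paths": duplicate_paths}
```

Calling it with the contents of `/var/log/fluentd-containers.log.pos` at intervals would show whether the entry count keeps growing. With follow_inodes true, the same path appearing with several inodes around a rotation is probably expected, so a non-zero `duplicate_paths` alone does not indicate a leak.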

Related to #3614. More specifically #3614 (comment)

The suspicion is that some watchers are not handled properly and therefore leak, increasing CPU/memory consumption until the next restart.
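One way to test this suspicion from outside the process is to count the inotify watches the Fluentd worker holds over time. The sketch below is a hypothetical helper, not a Fluentd tool. It relies on the Linux `/proc/<pid>/fdinfo/<fd>` files, where each watch on an inotify file descriptor appears as a line starting with `inotify wd:`. Whether leaked watchers in the tail plugin actually show up as kernel inotify watches is an assumption; watcher objects could also leak inside the Ruby process without holding a watch.

```python
import os


def count_inotify_watches(fdinfo_text):
    """Count watch entries in the text of one /proc/<pid>/fdinfo/<fd> file."""
    return sum(
        1 for line in fdinfo_text.splitlines() if line.startswith("inotify wd:")
    )


def total_inotify_watches(pid, proc_root="/proc"):
    """Sum inotify watches across all file descriptors of a process (Linux only)."""
    fdinfo_dir = os.path.join(proc_root, str(pid), "fdinfo")
    total = 0
    for name in os.listdir(fdinfo_dir):
        try:
            with open(os.path.join(fdinfo_dir, name)) as f:
                total += count_inotify_watches(f.read())
        except OSError:
            continue  # the descriptor was closed between listdir() and open()
    return total
```

If the number returned by `total_inotify_watches(<fluentd worker pid>)` keeps rising while the number of files matching `path` stays roughly constant, that would support the leak theory. With enable_stat_watcher false the count may stay flat regardless, since the tail plugin would then presumably rely on timers instead.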

To Reproduce

Deploy Fluentd (version v1.16.3-debian-forward-1.0) as a DaemonSet in a dynamic Kubernetes cluster. The cluster consists of 50-100 nodes. The Fluentd config is listed under "Your Configuration" below.

Expected behavior

CPU / Memory should stay stable.
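To put a number on "stable", resident memory can be sampled periodically and fitted with a straight line. This is a rough Python sketch under stated assumptions: it reads the `VmRSS` field from the text of `/proc/<pid>/status` (Linux only), and the helper names `parse_vmrss_kb` and `growth_kb_per_sample` are invented for this example. A least-squares slope is only a crude indicator, since normal buffering also makes memory fluctuate.

```python
import re


def parse_vmrss_kb(status_text):
    """Extract resident set size in kB from /proc/<pid>/status text, or None."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def growth_kb_per_sample(samples_kb):
    """Least-squares slope of evenly spaced RSS samples, in kB per sample."""
    n = len(samples_kb)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples_kb) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_kb))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den
```

A slope that stays clearly positive over many hours of samples, with no corresponding growth in log volume, would match the behavior described in this report.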

Your Environment

- Fluentd version: [v1.16.3-debian-forward-1.0](https://github.com/fluent/fluentd-kubernetes-daemonset#:~:text=debian%2Dcloudwatch%2D1-,Forward,-docker%20pull%20fluent)

Your Configuration

<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  follow_inodes true
  rotate_wait 0
  exclude_path ["/var/log/containers/fluentd*.log", "/var/log/containers/*kube-system*.log", "/var/log/containers/*calico-system*.log", "/var/log/containers/prometheus-node-exporter*.log", "/var/log/containers/opentelemetry-agent*.log"]
  pos_file_compaction_interval 20m
  <parse>
    @type multi_format
    <pattern>
      format json
      time_key time
      time_type string
      time_format "%Y-%m-%dT%H:%M:%S.%NZ"
      keep_time_key true
    </pattern>
    <pattern>
      format /^(?<time>.+?) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.+)$/
      time_format "%Y-%m-%dT%H:%M:%S.%N%:z"
    </pattern>
  </parse>
  emit_unmatched_lines true
</source>



Your Error Log

```shell
Skip update_watcher because watcher has been already updated by other inotify event
```

Additional context

#3614

daipom · 2024-01-16T09:16:50Z

Thanks for your report!

> Fluentd tail plugin was outputting If you keep getting this message, please restart Fluentd. After coming across #3614, we implemented the workaround suggested there.
>
> changed follow_inodes to true
>
> set rotate_wait to 0

So follow_inodes false has a similar issue.
Could you please open a new issue for the follow_inodes false case?

uristernik · 2024-01-23T09:23:29Z

@daipom In this case I had follow_inodes true

Do you want me to open a new issue just for tracking?

daipom · 2024-01-23T10:07:41Z

@uristernik
Wasn't there a problem with follow_inodes false as well?
I'd like to sort out the follow_inodes false problem and the follow_inodes true problem separately.

I'd like to know whether there is any difference between follow_inodes false and follow_inodes true.
For example, whether the same resource leak occurs with follow_inodes false.

If there is no particular difference, we are fine with this for now.
Thanks!

shadowshot-x · 2024-09-24T15:45:05Z

We are facing the same issue.

Error message: `Skip update_watcher because watcher has been already updated by other inotify event path="/usr/local/logs/app/app.log" inode=20617294 inode_in_pos_file=0`

We are using

read_from_head true
rotate_wait 30
follow_inodes true
enable_stat_watcher false

Memory keeps on gradually growing too! Any resolution on this?
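For comparing reports, it may be useful to tally these messages per file. Below is a minimal Python sketch that parses the message in the form quoted above, where the `path`, `inode` and `inode_in_pos_file` fields follow the fixed text. The function names are made up for this example, and treating the fields as optional is an assumption, made because the message has also been reported without them.

```python
import re
from collections import Counter

SKIP_RE = re.compile(
    r"Skip update_watcher because watcher has been already updated "
    r"by other inotify event"
    r'(?:\s+path="(?P<path>[^"]*)")?'
    r"(?:\s+inode=(?P<inode>\d+))?"
    r"(?:\s+inode_in_pos_file=(?P<pos_inode>\d+))?"
)


def parse_skip_line(line):
    """Return the fields of a 'Skip update_watcher' log line, or None."""
    m = SKIP_RE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "path": d["path"],
        "inode": int(d["inode"]) if d["inode"] is not None else None,
        "inode_in_pos_file": (
            int(d["pos_inode"]) if d["pos_inode"] is not None else None
        ),
    }


def count_skips_by_path(lines):
    """Count 'Skip update_watcher' messages per path (None if path is absent)."""
    counts = Counter()
    for line in lines:
        rec = parse_skip_line(line)
        if rec is not None:
            counts[rec["path"]] += 1
    return counts
```

Feeding the Fluentd log through `count_skips_by_path` shows whether the message concentrates on a few frequently rotated files or is spread across all tailed files.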

daipom · 2024-10-02T06:56:53Z

@shadowshot-x Sorry for my late response. Thanks for your report.
Could you please share the Fluentd (td-agent/fluent-package) version and OS?

uristernik added the waiting-for-triage label Jan 15, 2024

daipom self-assigned this Jan 16, 2024

kenhys added the waiting-for-user label and removed the waiting-for-triage label Jul 22, 2024

daipom added this to Fluentd Kanban Oct 2, 2024

daipom added the work-in-progress label and removed the waiting-for-user label Oct 2, 2024

daipom moved this to To-Do in Fluentd Kanban Oct 15, 2024
