Lack of index on inode column in in_tail sqlite DB #10166

Open
littlecatherine opened this issue Apr 2, 2025 · 0 comments

Bug Report

Describe the bug
The SQL schema for the tail input state database does not create an index on the inode column. The SQL_GET_FILE query filters on inode, so without an index each lookup has to scan the table, and query performance degrades as the number of records grows.

#define SQL_CREATE_FILES                                                \
    "CREATE TABLE IF NOT EXISTS in_tail_files ("                        \
    "  id      INTEGER PRIMARY KEY,"                                    \
    "  name    TEXT NOT NULL,"                                          \
    "  offset  INTEGER,"                                                \
    "  inode   INTEGER,"                                                \
    "  created INTEGER,"                                                \
    "  rotated INTEGER DEFAULT 0"                                       \
    ");"

#define SQL_GET_FILE                                                    \
    "SELECT * from in_tail_files WHERE inode=@inode order by id desc;"
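One way to confirm the effect is to ask SQLite for its query plan before and after adding an index. The following is a minimal sketch using Python's standard sqlite3 module against an in-memory database. The table and query are copied from the macros above; the index name `idx_in_tail_files_inode` is hypothetical and is not part of the current Fluent Bit schema.

```python
import sqlite3

# Schema copied from SQL_CREATE_FILES in the report.
SQL_CREATE_FILES = """
CREATE TABLE IF NOT EXISTS in_tail_files (
  id      INTEGER PRIMARY KEY,
  name    TEXT NOT NULL,
  offset  INTEGER,
  inode   INTEGER,
  created INTEGER,
  rotated INTEGER DEFAULT 0
);
"""

# Same statement as SQL_GET_FILE, written with the ":name" placeholder
# style that Python's sqlite3 module documents.
SQL_GET_FILE = "SELECT * from in_tail_files WHERE inode=:inode order by id desc;"

# Assumption: a possible fix. The index name is hypothetical.
SQL_CREATE_INODE_INDEX = (
    "CREATE INDEX IF NOT EXISTS idx_in_tail_files_inode "
    "ON in_tail_files (inode);"
)


def query_plan(conn):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + SQL_GET_FILE, {"inode": 1}).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SQL_CREATE_FILES)

plan_without_index = query_plan(conn)  # plan for the schema as reported
conn.execute(SQL_CREATE_INODE_INDEX)
plan_with_index = query_plan(conn)     # plan once the inode index exists

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Without the index the plan should report a scan of in_tail_files; with it, the plan should report a search using the index. The exact wording of the plan text varies between SQLite versions.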

To Reproduce

fluent-bit yaml config:

  service:
    flush: 5
    grace: 5
    daemon: "off"
    dns.mode: UDP
    log_level: debug
    http_server: "on"
    http_listen: 0.0.0.0
    http_port: 2020
    coro_stack_size: 24576
    scheduler.cap: 2000
    scheduler.base: 5
    json.convert_nan_to_null: false
    sp.convert_from_str_to_num: true
    Health_Check: "on"
    Hot_Reload: "on"

  pipeline:
    inputs:
      - db: /fluent-bit/data/in_tail.db
        name: tail
        path: /logs/sub1/*.log,/logs/sub2/*.log
    outputs:
      - name: stdout

Steps to reproduce the problem:

  • create 6k log files in /logs/sub1/*.log
  • create 400k log files in /logs/sub2/*.log
  • run fluent-bit with the debug log level; on the first run fluent-bit creates a new db, so note how long it takes to append all files in /logs/sub1/*.log during initialization
  • let it finish processing all files in /logs/sub2/*.log, then stop fluent-bit
  • run fluent-bit again with the existing db, and note how long it now takes to append all files in /logs/sub1/*.log
  • appending the same number of files takes much longer once the number of records in the db has grown
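The first two steps can be scripted. The sketch below is one way to generate the files; the helper name `create_log_files` and the one-line file contents are assumptions made for illustration, and the `N.log` naming follows the `/logs/sub1/1.log` path seen in the debug output. The demo at the bottom writes a handful of files to a temporary directory; for the actual reproduction, point it at /logs/sub1 and /logs/sub2 with counts of 6000 and 400000.

```python
import tempfile
from pathlib import Path


def create_log_files(directory, count):
    """Create `count` files named 1.log .. count.log, each holding one line."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    for i in range(1, count + 1):
        # Hypothetical content; any non-empty line will do for the tail input.
        (path / f"{i}.log").write_text(f"sample line {i}\n")
    return count


if __name__ == "__main__":
    # Small demo in a throwaway directory so the sketch is safe to run.
    with tempfile.TemporaryDirectory() as root:
        create_log_files(Path(root) / "sub1", 6)
        create_log_files(Path(root) / "sub2", 40)
        print(len(list((Path(root) / "sub1").glob("*.log"))), "files in sub1")
        print(len(list((Path(root) / "sub2").glob("*.log"))), "files in sub2")
```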

Example log message:
First run

[2025/03/25 14:31:24] [debug] [input:tail:in.tail.path] scanning path /logs/sub1/*.log
[2025/03/25 14:31:25] [debug] [input:tail:in.tail.path]  file will be read in POSIX_FADV_DONTNEED mode /logs/sub1/1.log
...
[2025/03/25 14:51:10] [debug] [input:tail:in.tail.path] 10000 new files found on path '/logs/sub1/*.log'
[2025/03/25 14:51:10] [debug] [input:tail:in.tail.path] scanning path /logs/sub2/*.log
...

Second run

[2025/03/25 15:41:17] [debug] [input:tail:in.tail.path] scanning path /logs/sub1/*.log
[2025/03/25 15:41:18] [debug] [input:tail:in.tail.path]  file will be read in POSIX_FADV_DONTNEED mode /logs/sub1/1.log
...
[2025/03/25 15:43:12] [debug] [input:tail:in.tail.path] 10000 new files found on path '/logs/sub1/*.log'
[2025/03/25 15:43:12] [debug] [input:tail:in.tail.path] scanning path /logs/sub2/*.log
...

Expected behavior
It should take approximately the same time to append the same number of files, regardless of the number of records in the table.
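The expectation can be checked outside Fluent Bit by timing the inode lookup against a populated table, with and without an index. This is a rough sketch under stated assumptions: the rows use synthetic sequential inode numbers, the helpers `build_db` and `time_lookups` and the index name are hypothetical, and the row and lookup counts are kept small so the demo finishes quickly. It does not measure Fluent Bit itself.

```python
import sqlite3
import time


def build_db(rows, with_index):
    """Build an in-memory in_tail_files table with `rows` synthetic records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE in_tail_files ("
        "  id      INTEGER PRIMARY KEY,"
        "  name    TEXT NOT NULL,"
        "  offset  INTEGER,"
        "  inode   INTEGER,"
        "  created INTEGER,"
        "  rotated INTEGER DEFAULT 0"
        ");"
    )
    if with_index:
        # Assumption: hypothetical index, not part of the current schema.
        conn.execute("CREATE INDEX idx_in_tail_files_inode ON in_tail_files (inode);")
    conn.executemany(
        "INSERT INTO in_tail_files (name, offset, inode, created) VALUES (?, 0, ?, 0)",
        ((f"/logs/sub2/{i}.log", i) for i in range(1, rows + 1)),
    )
    conn.commit()
    return conn


def time_lookups(conn, lookups):
    """Run the inode lookup for inodes 1..lookups; return (rows found, seconds)."""
    start = time.perf_counter()
    found = 0
    for inode in range(1, lookups + 1):
        row = conn.execute(
            "SELECT * FROM in_tail_files WHERE inode=? ORDER BY id DESC", (inode,)
        ).fetchone()
        found += row is not None
    return found, time.perf_counter() - start


if __name__ == "__main__":
    for with_index in (False, True):
        conn = build_db(rows=20_000, with_index=with_index)
        found, seconds = time_lookups(conn, lookups=200)
        print(f"index={with_index}: {found} lookups in {seconds:.3f}s")
```

If the index helps as expected, the indexed run should stay roughly flat as `rows` is increased, while the unindexed run should slow down in proportion to the table size.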

Your Environment

  • Version used: 3.2.4