
Lock wait timeout exceeded for state_history when using retention #893

Open
oxzi opened this issue Feb 25, 2025 · 0 comments

Comments


oxzi commented Feb 25, 2025

While working on another issue with a larger test instance, I ran into a database locking problem after enabling Icinga DB's history retention.

After adding the following retention block to my config, the problems started.

retention:
  history-days: 1
  sla-days: 1

Shortly after restarting Icinga DB, the following error appeared, always for the state_history table.

2025-02-25T10:58:39.859Z        FATAL   icingadb        Error 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
can't perform "INSERT INTO \"state_history\" (\"previous_soft_state\", \"check_attempt\", \"check_source\", \"scheduling_source\", \"environment_id\", \"state_type\", \"id\", \"previous_hard_state\", \"output\", \"max_check_attempts\", \"object_type\", \"event_time\", \"endpoint_id\", \"soft_state\", \"long_output\", \"host_id\", \"service_id\", \"hard_state\") VALUES (:previous_soft_state,:check_attempt,:check_source,:scheduling_source,:environment_id,:state_type,:id,:previous_hard_state,:output,:max_check_attempts,:object_type,:event_time,:endpoint_id,:soft_state,:long_output,:host_id,:service_id,:hard_state) ON DUPLICATE KEY UPDATE \"id\" = VALUES(\"id\")"
github.com/icinga/icinga-go-library/database.CantPerformQuery
        github.com/icinga/[email protected]/database/utils.go:16
github.com/icinga/icinga-go-library/database.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2.1
        github.com/icinga/[email protected]/database/db.go:535
github.com/icinga/icinga-go-library/retry.WithBackoff
        github.com/icinga/[email protected]/retry/retry.go:65
github.com/icinga/icinga-go-library/database.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2
        github.com/icinga/[email protected]/database/db.go:530
golang.org/x/sync/errgroup.(*Group).Go.func1
        golang.org/x/[email protected]/errgroup/errgroup.go:78
runtime.goexit
        runtime/asm_amd64.s:1700
retry deadline exceeded
github.com/icinga/icinga-go-library/retry.WithBackoff
        github.com/icinga/[email protected]/retry/retry.go:100
github.com/icinga/icinga-go-library/database.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2
        github.com/icinga/[email protected]/database/db.go:530
golang.org/x/sync/errgroup.(*Group).Go.func1
        golang.org/x/[email protected]/errgroup/errgroup.go:78
runtime.goexit
        runtime/asm_amd64.s:1700
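
For context, MariaDB error 1205 means the statement waited longer than innodb_lock_wait_timeout (50 seconds by default) for a row lock before giving up. As a quick sanity check, and assuming nothing on this instance has tuned that variable, the effective value can be inspected with:

MariaDB [icingadb]> SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';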

A look at the database process list shows a parallel DELETE statement being executed against this table, which appears to be generated by Icinga DB's retention routine.

MariaDB [icingadb]> show processlist;
+-----+----------+-----------------+----------+---------+------+----------+-------------------------------------------------------------------------------------------------------+----------+
| Id  | User     | Host            | db       | Command | Time | State    | Info                                                                                                      | Progress |
+-----+----------+-----------------+----------+---------+------+----------+-------------------------------------------------------------------------------------------------------+----------+
|  55 | icingadb | localhost:54518 | icingadb | Execute | 3341 | Updating | DELETE FROM state_history WHERE environment_id = ? AND event_time < ? ORDER BY event_time LIMIT 5000     |    0.000 |
| 144 | icingadb | localhost       | icingadb | Query   |    0 | starting | show processlist                                                                                          |    0.000 |
+-----+----------+-----------------+----------+---------+------+----------+-------------------------------------------------------------------------------------------------------+----------+
2 rows in set (0.000 sec)

This crash happened reproducibly: sometimes after five minutes, sometimes after an hour, but it always occurred eventually. After disabling the retention configuration, Icinga DB ran without problems again.
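
To pin down the blocking side while this happens, the InnoDB lock tables can be queried directly. This is only a sketch, assuming MariaDB's information_schema tables INNODB_TRX and INNODB_LOCK_WAITS (MySQL 8 exposes the same information via performance_schema instead):

-- Show which transaction is waiting on which blocking transaction, with both queries.
SELECT w.requesting_trx_id, r.trx_query AS waiting_query,
       w.blocking_trx_id, b.trx_query AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;

In my case the blocking transaction should be the long-running retention DELETE shown above, with the failed INSERT INTO state_history as the waiting query.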
