
Lock wait timeout exceeded for state_history when using retention #893

Open
oxzi opened this issue Feb 25, 2025 · 0 comments

Comments


oxzi commented Feb 25, 2025

While working on another issue with a larger test instance, I ran into a database locking problem after enabling Icinga DB's history retention.

After adding the following retention block to my config, the problems started.

retention:
  history-days: 1
  sla-days: 1

Shortly after restarting Icinga DB, the following error appeared, always for the state_history table.

2025-02-25T10:58:39.859Z        FATAL   icingadb        Error 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
can't perform "INSERT INTO \"state_history\" (\"previous_soft_state\", \"check_attempt\", \"check_source\", \"scheduling_source\", \"environment_id\", \"state_type\", \"id\", \"previous_hard_state\", \"output\", \"max_check_attempts\", \"object_type\", \"event_time\", \"endpoint_id\", \"soft_state\", \"long_output\", \"host_id\", \"service_id\", \"hard_state\") VALUES (:previous_soft_state,:check_attempt,:check_source,:scheduling_source,:environment_id,:state_type,:id,:previous_hard_state,:output,:max_check_attempts,:object_type,:event_time,:endpoint_id,:soft_state,:long_output,:host_id,:service_id,:hard_state) ON DUPLICATE KEY UPDATE \"id\" = VALUES(\"id\")"
github.com/icinga/icinga-go-library/database.CantPerformQuery
        github.com/icinga/[email protected]/database/utils.go:16
github.com/icinga/icinga-go-library/database.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2.1
        github.com/icinga/[email protected]/database/db.go:535
github.com/icinga/icinga-go-library/retry.WithBackoff
        github.com/icinga/[email protected]/retry/retry.go:65
github.com/icinga/icinga-go-library/database.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2
        github.com/icinga/[email protected]/database/db.go:530
golang.org/x/sync/errgroup.(*Group).Go.func1
        golang.org/x/[email protected]/errgroup/errgroup.go:78
runtime.goexit
        runtime/asm_amd64.s:1700
retry deadline exceeded
github.com/icinga/icinga-go-library/retry.WithBackoff
        github.com/icinga/[email protected]/retry/retry.go:100
github.com/icinga/icinga-go-library/database.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2
        github.com/icinga/[email protected]/database/db.go:530
golang.org/x/sync/errgroup.(*Group).Go.func1
        golang.org/x/[email protected]/errgroup/errgroup.go:78
runtime.goexit
        runtime/asm_amd64.s:1700
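
For context, MariaDB error 1205 means the statement waited longer than innodb_lock_wait_timeout (50 seconds by default) for a row lock before giving up. As a quick sanity check, and assuming nothing on this instance has tuned that variable, the effective value can be inspected with:

MariaDB [icingadb]> SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';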

A look at the database process list shows a parallel DELETE statement being executed against this table, which appears to be generated by Icinga DB's retention routine.

MariaDB [icingadb]> show processlist;
+-----+----------+-----------------+----------+---------+------+----------+-------------------------------------------------------------------------------------------------------+----------+
| Id  | User     | Host            | db       | Command | Time | State    | Info                                                                                                      | Progress |
+-----+----------+-----------------+----------+---------+------+----------+-------------------------------------------------------------------------------------------------------+----------+
|  55 | icingadb | localhost:54518 | icingadb | Execute | 3341 | Updating | DELETE FROM state_history WHERE environment_id = ? AND event_time < ? ORDER BY event_time LIMIT 5000     |    0.000 |
| 144 | icingadb | localhost       | icingadb | Query   |    0 | starting | show processlist                                                                                          |    0.000 |
+-----+----------+-----------------+----------+---------+------+----------+-------------------------------------------------------------------------------------------------------+----------+
2 rows in set (0.000 sec)

This crash happened reproducibly: sometimes after five minutes, sometimes after an hour, but it always occurred eventually. After disabling the retention configuration, Icinga DB ran without problems again.
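
To pin down the blocking side while this happens, the InnoDB lock tables can be queried directly. This is only a sketch, assuming MariaDB's information_schema tables INNODB_TRX and INNODB_LOCK_WAITS (MySQL 8 exposes the same information via performance_schema instead):

-- Show which transaction is waiting on which blocking transaction, with both queries.
SELECT w.requesting_trx_id, r.trx_query AS waiting_query,
       w.blocking_trx_id, b.trx_query AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;

In my case the blocking transaction should be the long-running retention DELETE shown above, with the failed INSERT INTO state_history as the waiting query.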
