Right now we have only basic metrics for the background defrag task. Sometimes this task does not run, and the current metrics do not tell us why. We should add richer metrics that show whether the task started and, if it was skipped, the reason:
- Task started (to detect the case where it never even launches)
- Task skipped, with the reason (the reasons can be exposed as metric labels)
Additionally: trace logs that an operator can trigger to debug a non-running task in real time.
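
To make the request concrete, here is a minimal sketch of what this could look like. It is an illustration only, not the project's actual code: the class `DefragMetrics`, the `SkipReason` values, the metric names, and the `set_trace` toggle are all hypothetical, and the real skip reasons depend on the checks the defrag task performs.

```python
import logging
from collections import Counter
from enum import Enum


class SkipReason(Enum):
    # Hypothetical reasons; the real set depends on the task's own checks.
    BELOW_THRESHOLD = "below_threshold"
    ALREADY_RUNNING = "already_running"
    DISABLED = "disabled"


class DefragMetrics:
    """Counts task starts and skips, with the skip reason as a label."""

    def __init__(self):
        self.started = 0
        self.skipped = Counter()  # keyed by reason label
        self.log = logging.getLogger("defrag")

    def record_started(self):
        self.started += 1
        self.log.debug("defrag task started")

    def record_skipped(self, reason):
        self.skipped[reason.value] += 1
        self.log.debug("defrag task skipped: reason=%s", reason.value)

    def set_trace(self, enabled):
        # Lets an operator switch detailed logging on or off at runtime.
        self.log.setLevel(logging.DEBUG if enabled else logging.WARNING)

    def snapshot(self):
        # Flattens the counters into labelled metric names.
        out = {"defrag_task_started_total": self.started}
        for reason, count in self.skipped.items():
            out[f'defrag_task_skipped_total{{reason="{reason}"}}'] = count
        return out
```

With this shape, a started counter that stays at zero means the task never launches, while a growing skipped counter shows which check is blocking it.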