
Conversation

@neutralalice

I am interested in collecting interrupt request metrics within Talos. I believe this is the only kernel config definition needed to do so; it usually generates a /proc/pressure/irq file which can be ingested into a TSDB.
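
For context on the ingestion side, here is a minimal sketch of what a scraper could do with this file, assuming the standard pressure-stall (PSI) line format the kernel uses for /proc/pressure/* files; the names here (psiLine, parsePSI) are illustrative and not part of any existing exporter.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// psiLine holds one parsed pressure line, e.g.
// "full avg10=0.00 avg60=0.00 avg300=0.00 total=0".
type psiLine struct {
	Kind    string  // "some" or "full"
	Avg10   float64 // % of wall time stalled, 10s window
	Avg60   float64
	Avg300  float64
	TotalUS uint64 // cumulative stall time in microseconds
}

// parsePSI reads a /proc/pressure/* file and returns its lines.
func parsePSI(path string) ([]psiLine, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var lines []psiLine
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 5 {
			continue // not a standard PSI line; skip defensively
		}
		l := psiLine{Kind: fields[0]}
		for _, kv := range fields[1:] {
			key, val, ok := strings.Cut(kv, "=")
			if !ok {
				continue
			}
			switch key {
			case "avg10":
				l.Avg10, _ = strconv.ParseFloat(val, 64)
			case "avg60":
				l.Avg60, _ = strconv.ParseFloat(val, 64)
			case "avg300":
				l.Avg300, _ = strconv.ParseFloat(val, 64)
			case "total":
				l.TotalUS, _ = strconv.ParseUint(val, 10, 64)
			}
		}
		lines = append(lines, l)
	}
	return lines, sc.Err()
}

func main() {
	lines, err := parsePSI("/proc/pressure/irq")
	if err != nil {
		fmt.Fprintln(os.Stderr, "irq pressure not available:", err)
		os.Exit(1)
	}
	for _, l := range lines {
		// "total" is the monotonic counter a scraper would export and rate().
		fmt.Printf("irq %s: avg10=%.2f%% avg60=%.2f%% total=%dus\n",
			l.Kind, l.Avg10, l.Avg60, l.TotalUS)
	}
}
```

In practice the `total` counter (cumulative stall time in microseconds) is what you would ship to the TSDB and rate over time; the avg10/avg60/avg300 fields are already windowed percentages.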

Considerations: there is a small performance hit according to sources online; I am unsure whether Talos has already trialed this config option and evaluated the hit. Some other distributions do build with this option (notably RHEL).

Enables fine-grained interrupt request time accounting.

Signed-off-by: arita <[email protected]>
@github-project-automation github-project-automation bot moved this to To Do in Planning Jan 17, 2026
@talos-bot talos-bot moved this from To Do to In Review in Planning Jan 17, 2026
@frezbo frezbo requested a review from dsseng January 19, 2026 07:26
@dsseng dsseng requested a review from smira January 20, 2026 16:13
Member

@dsseng dsseng left a comment


This change is okay; however, it would be interesting to hear the use cases.

Measuring interrupt time is going to have the most impact on things like random I/O, and is said to have a small yet measurable overhead, so it would be nice to know what advantages this brings for various use cases.

@smira smira moved this from In Review to On Hold in Planning Jan 20, 2026
@neutralalice
Author

neutralalice commented Jan 20, 2026

This change is okay; however, it would be interesting to hear the use cases.

Measuring interrupt time is going to have the most impact on things like random I/O, and is said to have a small yet measurable overhead, so it would be nice to know what advantages this brings for various use cases.

Context-wise: generally this is coming from an HPC-centered approach, where we are often interested both in performance and in looking for indicators of possible hardware issues. There's a balance, but for us it's not necessarily about chasing numbers/benchmarks.

We've got three clusters (generally serving scientific workloads for climate/hazards modeling):

  1. A traditional HPC cluster (no k8s) backed by Slurm, where we make heavy use of MPI; often we are solving the inverse problem. The workloads here vary hugely: some are network- (InfiniBand) and storage- (GPFS) intensive, and others just need large amounts of compute/memory. Usually we are memory-bound, but some workloads are CPU-bound, and it is difficult at times to tell whether the CPU is genuinely not keeping up or is losing time to context switching. We have IRQ monitoring here and have at times used it to recommend changes to code.

  2. A typical k8s cluster on RHEL, serving Kubeflow. We don't tend to have a lot of issues with this cluster, but it has InfiniBand/storage attachments as above, and we do have IRQ monitoring here. The same people as above are often writing new workflows (heavy batch-scheduled jobs and reflex jobs) to run here. Overall our main problem is with provisioning/pulling nodes, which is where cluster 3 hopefully changes some things for us.

  3. A skunkworks Talos cluster. This is really in an evaluation stage; having IRQ metrics available wouldn't make or break us using Talos, it's just there to help us guide possible optimizations for the end scientists to look at.

@neutralalice
Author

It's also worth noting and evaluating the ongoing refactor in this space: https://lore.kernel.org/all/[email protected]/
