
Are you aware of any ongoing efforts to port jobstats to PBSPro #17

Open
CanWood opened this issue Nov 18, 2024 · 6 comments
Comments

@CanWood

CanWood commented Nov 18, 2024

Hi folks,

I recognize this project is Slurm-specific and that it would be quite a task to overhaul it for PBSPro. Just curious whether anyone on your team has explored it, or whether you're aware of any active forks pursuing a PBSPro port?

Cheers

@plazonic
Collaborator

plazonic commented Dec 4, 2024

Hello,

this is the first such request. I am not familiar with PBSPro, but a quick search shows that it can use cgroups. Do you know how it does that, and whether it organizes cgroups similarly to Slurm (i.e. by job and maybe user/step)? E.g. Slurm does this for the memory cgroup (and similarly for cpu, if configured to use cgroups for accounting):

/sys/fs/cgroup/memory/slurm/uid_331949/job_1859370/step_0/task_7
/sys/fs/cgroup/memory/slurm/uid_331949/job_1859370/step_0/task_7/cgroup.procs
/sys/fs/cgroup/memory/slurm/uid_331949/job_1859370/step_0/task_7/memory.use_hierarchy
....

and that file path is what we use to parse out the jobid of that particular job (and the uid, though that's less important); the stats themselves are available in that directory.
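For illustration only, a minimal Go sketch (not the exporter's actual code) of how a uid and jobid could be pulled out of such a Slurm-style cgroup path with a regular expression:

// Hypothetical sketch: extract uid and jobid from a Slurm memory cgroup path.
package main

import (
	"fmt"
	"regexp"
)

// slurmRe matches paths like /sys/fs/cgroup/memory/slurm/uid_331949/job_1859370/...
var slurmRe = regexp.MustCompile(`/slurm/uid_(\d+)/job_(\d+)`)

func main() {
	path := "/sys/fs/cgroup/memory/slurm/uid_331949/job_1859370/step_0/task_7"
	if m := slurmRe.FindStringSubmatch(path); m != nil {
		fmt.Printf("uid=%s jobid=%s\n", m[1], m[2]) // uid=331949 jobid=1859370
	}
}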

@pbisbal1

pbisbal1 commented Dec 4, 2024

I currently work at a site that uses PBS Pro with cgroups. When I get a chance, I'll take a look at this and see if I can contribute anything.

@CanWood
Author

CanWood commented Dec 4, 2024

Thanks @plazonic and @pbisbal1. Under PBS, the cgroup hierarchy is as follows:

/sys/fs/cgroup/devices/pbspro.service/jobid/27138999.servername
/sys/fs/cgroup/cpuset/pbspro.service/jobid/27138999.servername
/sys/fs/cgroup/memory/pbspro.service/jobid/27138999.servername
/sys/fs/cgroup/cpu,cpuacct/pbspro.service/jobid/27138999.servername
/sys/fs/cgroup/systemd/pbspro.service/jobid/27138999.servername

and under those, just like in @plazonic's post, the contents of the memory cgroup are what you've posted. So there is no breakdown by uid, step, or task.

Cheers

@plazonic
Collaborator

Good morning,

the first step here would be to get the various exporters to parse jobids out of the PBSPro cgroup paths. Based on your cgroup paths I've changed our cgroup exporter to try to do that in the pbspro branch - could you try compiling that branch (go build) and testing whether it fetches the data (with the parameter --config.paths /pbspro.service)?
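As an illustration of what that parsing might look like (a rough sketch, independent of the actual code in the pbspro branch), a regex along these lines could pull the numeric jobid out of the PBS Pro cgroup layout shown above:

// Hypothetical sketch: extract the numeric jobid from a PBS Pro cgroup path.
package main

import (
	"fmt"
	"regexp"
)

// pbsRe matches .../pbspro.service/jobid/<number>.<servername> and captures the number.
var pbsRe = regexp.MustCompile(`/pbspro\.service/jobid/(\d+)\.`)

func main() {
	path := "/sys/fs/cgroup/memory/pbspro.service/jobid/27138999.servername"
	if m := pbsRe.FindStringSubmatch(path); m != nil {
		fmt.Println("jobid:", m[1]) // jobid: 27138999
	}
}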

@CanWood
Author

CanWood commented Dec 17, 2024

Thanks! That's greatly appreciated.

I compiled that and gave it a go with both "--config.paths /pbspro.service" and "--config.paths /pbspro.service/jobid". I've configured Prometheus to scrape the exporter, but when I try to add metrics in Grafana, the only one appearing in my Prometheus data source is "cgroup_exporter_build_info". Are there any other approaches you'd propose I try, or any details I can collect, that could help diagnose why the expected utilization metrics don't appear?

@plazonic
Collaborator

I wouldn't go that far that quickly - can you test the exporter itself first? That is, after compiling and starting it with the appropriate option (which should be --config.paths /pbspro.service), either open http://NODENAME:9306/metrics in a browser or fetch it with wget/curl, and see whether it collects any metrics that start with cgroup_ and whether there is a jobid label.

If there isn't, also add the --log.level=debug option and look for "Got for match" debug messages - it should be trying to parse the jobid out of the file path, and if it is not, there should be a hint about which cgroup dirs it is going through.
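As a quick way to run the check described above, here is a small Go sketch (equivalent to opening the URL in a browser or using wget/curl) that fetches the metrics page and prints any lines beginning with cgroup_; NODENAME is a placeholder for the node running the exporter:

// Fetch the exporter's /metrics page and print lines that start with "cgroup_",
// so you can see whether any cgroup metrics (and a jobid label) are exposed.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	resp, err := http.Get("http://NODENAME:9306/metrics") // replace NODENAME
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "cgroup_") {
			fmt.Println(line)
		}
	}
}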
