Description
With Lustre 2.16.1, I get errors like the following in syslog:
May 14 12:20:01 miscmds8 lustre_exporter[2114]: time="2025-05-14T12:20:01+02:00" level=error msg="No valid jobid found in block: job_id: \"bladejb:689.prometheus-node\"\n snapshot_time: 1747217967.543183709 secs.nsecs\n start_time: 1747052362.830109469 secs.nsecs\n elapsed_time: 165604.713074240 secs.nsecs\n open: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n close: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n mknod: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n link: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n unlink: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n mkdir: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n rmdir: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n rename: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n getattr: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n setattr: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n getxattr: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n setxattr: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n statfs: { samples: 5522, unit: usecs, min: 0, max: 19, sum: 8876, sumsq: 20576 }\n sync: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n samedir_rename: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n parallel_rename_file: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n parallel_rename_dir: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n crossdir_rename: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n read: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n write: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n read_bytes: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0, sumsq: 0, hist: { } }\n write_bytes: { samples: 0, unit: bytes, min: 0, max: 0, sum: 0, sumsq: 0, hist: { } }\n punch: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n migrate: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n fallocate: { samples: 0, unit: usecs, min: 0, max: 0, sum: 0, sumsq: 0 }\n-"
That's because the job_stats format has changed. The job_id values are now enclosed in quotation marks:
[miscmds8] /root # grep bladejb:689.prometheus-node /proc/fs/lustre/mdt/*/job_stats
- job_id: "bladejb:689.prometheus-node"
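One possible fix is to accept both the old unquoted and the new quoted form when parsing job_id. Below is a minimal Go sketch of that idea. The regular expression and the `parseJobID` name are hypothetical and are not taken from lustre_exporter's source. The sketch also assumes that job IDs never legitimately begin and end with a double quote, so it strips one surrounding pair.

```go
package main

import (
	"errors"
	"regexp"
	"strings"
)

// jobIDLine matches the job_id line of a job_stats block (hypothetical
// pattern, not the exporter's own). The optional "- " prefix covers the raw
// /proc output; the block in the exporter's log already has it stripped.
var jobIDLine = regexp.MustCompile(`(?m)^[ \t]*(?:-[ \t]+)?job_id:[ \t]*(.+?)[ \t]*$`)

// parseJobID returns the job ID from a job_stats block, accepting both the
// unquoted form and the double-quoted form seen with Lustre 2.16.1.
func parseJobID(block string) (string, error) {
	m := jobIDLine.FindStringSubmatch(block)
	if m == nil {
		return "", errors.New("no valid jobid found in block")
	}
	id := m[1]
	// Strip one pair of surrounding double quotes, if present.
	if len(id) >= 2 && strings.HasPrefix(id, `"`) && strings.HasSuffix(id, `"`) {
		id = id[1 : len(id)-1]
	}
	if id == "" {
		return "", errors.New("empty jobid")
	}
	return id, nil
}
```

With this approach, both `- job_id: "bladejb:689.prometheus-node"` and `job_id: bladejb:689.prometheus-node` yield the same ID, `bladejb:689.prometheus-node`, so the metric labels would stay the same across Lustre versions.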