ErwanAliasr1
Collaborator

This pull request fixes bugs in hwgraph and adds a few features to ease the rendering of large datasets.

@ErwanAliasr1 ErwanAliasr1 force-pushed the hwgraph-updates branch 3 times, most recently from 5786e11 to bc67c4b Compare August 29, 2025 16:39
Contributor

@anisse anisse left a comment

On "Clarify directory names": is "scaling" easy to understand? Wouldn't something like smp_scaling be more appropriate (it would encompass thread, core, or NUMA domain scaling)?

Contributor

@anisse anisse left a comment

Typo in commit "Adding --same-rack option" : studing -> studying
print -> prints

@anisse
Contributor

anisse commented Sep 1, 2025

Again for the --same-rack, I'm not sure I understand why the servers should be in the same rack for this graph? Why does it matter? At least for --same-chassis, we assume they share the PSUs. What do they share in --same-rack? Why can't it be any arbitrary --group-of-servers ?

When plotting large graphs with crossing or nearly parallel lines, the
"o" marker is too large. As a consequence, the graph is hard to read and
some parts of the lines are nearly unreadable.

This commit uses the smallest marker that is still large enough to be
seen, increasing the readability of the graphs.

Signed-off-by: Erwan Velu <[email protected]>
The lower-case 's' makes no sense here.

Signed-off-by: Erwan Velu <[email protected]>
@ErwanAliasr1 ErwanAliasr1 force-pushed the hwgraph-updates branch 2 times, most recently from df80e9b to 5c2785e Compare September 4, 2025 15:57
The previous directory structure looked like:
- <host1>
- <host2>
- <host.n>
- individual
- scaling

The names and structure made this a bit confusing to read.

This commit introduces a new scheme:
- environment / { by_host | by_chassis} / {host1, host2, host.n} / {metric1, metric2, ...} / { graph1, graph2, ...}
- max_versus / { metric1, metric2, ...} / {job1, job2, ...} / { graph1, graph2, ...}
- smp_scaling / {metric1, metric2, ...} / {graph1, graph2, ...}

To match these new semantics, the --no-individual argument is renamed to --no-versus.
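
As an illustration of the new layout, here is a minimal sketch that builds the three kinds of output directories. The helper names and the `outdir` argument are hypothetical and are not taken from hwgraph itself; only the directory names come from the scheme above.

```python
from pathlib import PurePosixPath


def environment_dir(outdir, grouping, host, metric):
    """Directory holding the environment graphs of one host for one metric."""
    if grouping not in ("by_host", "by_chassis"):
        raise ValueError(f"unknown grouping: {grouping!r}")
    return PurePosixPath(outdir) / "environment" / grouping / host / metric


def max_versus_dir(outdir, metric, job):
    """Directory holding the max_versus graphs of one job for one metric."""
    return PurePosixPath(outdir) / "max_versus" / metric / job


def smp_scaling_dir(outdir, metric):
    """Directory holding the smp_scaling graphs for one metric."""
    return PurePosixPath(outdir) / "smp_scaling" / metric


print(environment_dir("mydir/graphs", "by_host", "host1", "metric1"))
print(max_versus_dir("mydir/graphs", "metric1", "job1"))
print(smp_scaling_dir("mydir/graphs", "metric1"))
```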

Signed-off-by: Erwan Velu <[email protected]>
Some benchmarks are run with external scenarios such as:
- shutdown a PSU during a period of time
- remove liquid cooling during a portion of a run
- remove a fan
....

While rendering the graphs, it was not yet possible to indicate the
specific periods of the benchmark during which external events occurred.

This commit adds an --events option to hwgraph that adds visual
indications to the graph showing that events occurred during some
periods of the benchmark.

Each event is declared as <event_name>:<start_time>:<duration>. An
example looks like the following:
 uv run hwgraph graph \
   --traces mydir/results.json:my_server_name:BMC.Server \
   --title 'Full CPU load 10 minutes - no coolant for 90 secs' \
   --outdir mydir/graphs \
   --events no-coolant:0:90

In this example, a nearly transparent background box will be added for
the first 90 seconds (start_time=0s, duration=90s) of the benchmark.
A new entry will be added to the legend: "Event no-coolant"

As with the --traces option, multiple events can be defined. Each new
event is colored differently to keep the graph easy to read.
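
The <event_name>:<start_time>:<duration> format can be parsed along these lines. This is only a sketch: the `Event` type and `parse_event` function are hypothetical names, and treating both numeric fields as seconds stored in floats is an assumption based on the example above, not a description of hwgraph's actual parser.

```python
from typing import NamedTuple


class Event(NamedTuple):
    name: str
    start: float     # seconds since the beginning of the benchmark (assumed unit)
    duration: float  # seconds (assumed unit)


def parse_event(spec: str) -> Event:
    """Parse one '<event_name>:<start_time>:<duration>' declaration."""
    # Split from the right so that only the last two fields are read as numbers.
    fields = spec.rsplit(":", 2)
    if len(fields) != 3 or not fields[0]:
        raise ValueError(
            f"expected <event_name>:<start_time>:<duration>, got {spec!r}"
        )
    name, start, duration = fields
    return Event(name, float(start), float(duration))


event = parse_event("no-coolant:0:90")
# The shaded box would span from event.start to event.start + event.duration.
print(event.name, event.start, event.start + event.duration)
```

With matplotlib, a box like this could be drawn with `Axes.axvspan(start, start + duration, alpha=...)`, although the commit message does not say which call hwgraph uses.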

Signed-off-by: Erwan Velu <[email protected]>
Signed-off-by: Erwan Velu <[email protected]>
When too many boxes are rendered at once, the horizontally printed
labels are unreadable.

If more than 10 boxes are rendered, switch to vertical mode for
the associated labels.
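
The rule above boils down to a threshold check. In this sketch the function name and the use of a rotation in degrees are assumptions; only the threshold of 10 boxes comes from the commit message.

```python
MAX_HORIZONTAL_LABELS = 10  # threshold stated in the commit message


def label_rotation(box_count: int) -> int:
    """Return the label rotation in degrees: 0 is horizontal, 90 is vertical."""
    return 90 if box_count > MAX_HORIZONTAL_LABELS else 0


for count in (4, 10, 11):
    print(count, "boxes ->", label_rotation(count), "degrees")
```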

Signed-off-by: Erwan Velu <[email protected]>
When benchmarking a full rack, it can be useful to graph the following
metrics over the full run:
- the sum of the servers' power consumption
- the sum of the CPUs' power consumption
- the sum of the differences between each server's power consumption
  and its CPU power consumption

This patch also prints the percentage of the servers' power consumption
that goes to the CPUs and to the other components.
This shows how much of a server's power is consumed by the CPUs.

This is about the same idea as --same-chassis but at the scale of a
group of servers.

This feature will be useful to graph the results of a given rack.
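
The aggregation described above can be sketched for a single point in time as follows. The function name, the input layout (one `(server_watts, cpu_watts)` pair per server) and the sample values are made up for illustration; hwgraph's real data structures may differ.

```python
def rack_power_summary(samples):
    """Aggregate one instant of power readings over a group of servers.

    samples: list of (server_watts, cpu_watts) pairs, one pair per server.
    """
    server_total = sum(server for server, _ in samples)
    cpu_total = sum(cpu for _, cpu in samples)
    # Power drawn by everything that is not a CPU (fans, memory, disks, ...).
    other_total = sum(server - cpu for server, cpu in samples)
    if server_total <= 0:
        raise ValueError("total server power must be positive")
    return {
        "server_watts": server_total,
        "cpu_watts": cpu_total,
        "other_watts": other_total,
        "cpu_percent": 100 * cpu_total / server_total,
        "other_percent": 100 * other_total / server_total,
    }


# Two hypothetical servers: 400 W with 250 W of CPU, 600 W with 350 W of CPU.
print(rack_power_summary([(400, 250), (600, 350)]))
```

Applying the same function to every timestamp of the run would give the series to plot.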

Signed-off-by: Erwan Velu <[email protected]>
The max_versus mode is useful to compare the same run over several
machines or the same machine in various conditions.

If a single trace is provided, the produced graphs are not very
interesting to read.
Let's disable them and print a message explaining why this feature
was automatically disabled.

Signed-off-by: Erwan Velu <[email protected]>
The current version of hwgraph crashes if PDU or IPC metrics are
missing from the trace file.

This patch ignores and reports missing metrics instead of crashing
with KeyError exceptions.

This improves hwgraph's backward compatibility with older traces.
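
The "ignore and report" behaviour can be sketched like this. The flat dictionary used as a trace, the metric names used as keys and the `available_metrics` helper are all hypothetical; the point is only to show a lookup that warns about a missing metric instead of raising KeyError.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("hwgraph-sketch")


def available_metrics(trace: dict, wanted: list) -> dict:
    """Return the wanted metrics found in the trace, reporting the missing ones."""
    found = {}
    for name in wanted:
        if name not in trace:
            # Report and skip instead of letting trace[name] raise KeyError.
            log.warning("metric %r is missing from the trace, skipping it", name)
            continue
        found[name] = trace[name]
    return found


# Hypothetical older trace recorded before PDU and IPC metrics existed.
old_trace = {"BMC": [310, 325, 330]}
print(available_metrics(old_trace, ["BMC", "PDU", "IPC"]))
```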

Signed-off-by: Erwan Velu <[email protected]>
The previous code plotted every iteration of the benchmark, which
generated tons of nearly useless graphs.

This commit keeps the maximum values and drops the intermediate ones.

Signed-off-by: Erwan Velu <[email protected]>
time_to_next_sync() returned the number of seconds before the next
meeting point but also the "next_sync" absolute time.

As next_sync is not used at all, let's remove this useless return
value and simplify the function's return.
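
To illustrate the simplified return, here is a sketch of a function that only returns the remaining delay. It is not the project's time_to_next_sync(): the name `seconds_to_next_sync`, the `interval` parameter and the idea that meeting points fall on fixed multiples of that interval are all assumptions made for this example.

```python
def seconds_to_next_sync(now: float, interval: float = 5.0) -> float:
    """Seconds left before the next meeting point.

    Assumes meeting points fall on multiples of `interval` seconds. Only the
    delay is returned; the absolute time of the meeting point is not exposed.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return interval - (now % interval)


print(seconds_to_next_sync(12.0))
print(seconds_to_next_sync(10.0))
```

A caller then handles a single value, for example `time.sleep(seconds_to_next_sync(time.time()))`, rather than unpacking a tuple and discarding half of it.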

Signed-off-by: Erwan Velu <[email protected]>
@ErwanAliasr1
Collaborator Author

repushed sorry, forgot to fix typos you reported :/

Contributor

@anisse anisse left a comment

LGTM

@ErwanAliasr1 ErwanAliasr1 merged commit ed70b42 into main Sep 5, 2025
4 checks passed
@ErwanAliasr1 ErwanAliasr1 deleted the hwgraph-updates branch September 5, 2025 15:14