Fix L2 cache bandwidth metrics for MI350#843
Conversation
| per normalization unit. This is expected to be the sum of global/generic and | ||
| spill/stack atomics in the address processor. | ||
| Write Ack Instructions: The total number of write acknowledgements submitted by | ||
| data-return unit to SQ not SP, summed over all compute units on the accelerator, |
There was a problem hiding this comment.
Is the mention of not SP just to match rocprof-sdk's description?
There was a problem hiding this comment.
That is right, see the description for TD_WRITE_ACKT_WAVEFRONT_sum in https://raw.githubusercontent.com/ROCm/rocprofiler-sdk/refs/heads/amd-mainline/source/share/rocprofiler-sdk/counter_defs.yaml
There was a problem hiding this comment.
@feizheng10 , is it better to say SQ instead of SQ not SP ?
There was a problem hiding this comment.
Looks good to me just saying SQ
| max: MAX((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom)) | ||
| unit: (Req + $normUnit) | ||
| Read Bandwidth - PCIe: | ||
| avg: AVG(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom) |
There was a problem hiding this comment.
@feizheng10, I recall we briefly talked about bandwidth calculations should be over (End_Timestamp - Start_Timestamp) to have the unit GB/s. Should it be the case here?
There was a problem hiding this comment.
Users can set --normal-unit per_second to get Bytes per second.
Default is Bytes per kernel where denominator = 1
Currently following bandwidth metrics use normalization unit as denominator:
- L1I-L2 Bandwidth (1302)
- sL1D-L2 BW (1403)
- Cache BW (1603)
- L1-L2 BW (1603)
- Bandwidth (1703)
- Read / Write / Atomic Bandwidth (1703)
- Read / Write / Atomic Bandwidth - Infinity Fabric / HBM / PCIe (1706)
@feizheng10 , are you saying that all of the above Bandwidth metrics should be Gbps instead of Bytes per Kernel (default)? And this is for all gfx archs?
To be honest, I agree that all of these should be Gbps.
There was a problem hiding this comment.
Thanks for Xuan raising the question.
Let's look into the details. As always, we don't like it but there is always special case:)
So technically, the unit for all BW should be *B/s. And for all of them, I'd say the valid normalization unit can be applied should be: per_kernel and per_wave. per_cycle and per_second don't make sense for me. Because we are setting per_kernel as default, the only another choice for user would be per_wave, which is not that common for the public users.
Basically, we are adding another "per" customized layer on top of GB per second.
Literally, keeping *B/s makes more sense for the end user.
We need to find a proper place/way to remind them, hi, this "GB/s (per_wave)". May be later.
Thought?
There was a problem hiding this comment.
I agree, all of the above listed metrics for all gfx archs should have Unit = Gbps per normalization unit instead of Bytes per normalization unit
Denominator for these metrics should be ((Start_Timestamp - End_Timestamp) / $denom)
Description of the metrics should also be updated.
I think users after reading the Unit and Description will not use --normal-unit per-second because Gbps per second does not make any sense!
There was a problem hiding this comment.
We discussed that all bandwidth metrics should have the unit Gbps.
I will make this change in a separate PR
ec3b1db to
3a83179
Compare
3a83179 to
4297e8e
Compare
* Fix L2 cache bandwidth metrics for MI350 * Address review comments [ROCm/rocprofiler-compute commit: b349e40]
rocprofiler-compute Pull Request
Related Issue
What type of PR is this? (check all that apply)
Technical Details
Fix L2 cache bandwidth metrics for L2 cache for MI350
Have you added or updated tests to validate functionality?
Added / Updated documentation?
Have you updated CHANGELOG?