Replies: 1 comment 3 replies
I am not sure about the implementation constraints (the RTL code); @preusser is probably most qualified to comment on those. However, this looks like a fairly obvious optimization to implement: do not generate per-channel parameter code when the parameters only have per-tensor granularity (and I think this should be possible somehow? Also, the referenced PR only applies to the binary case where numSteps = 1, or am I reading the code wrong?). From a purely experimental point of view (based on some experiments we did recently), I can tell you that this broadcasting is not as bad as it seems: if there is some redundancy or structure in the distribution of threshold values that can be exploited, synthesis (i.e., Vivado) usually picks it up. Even if code generation produces numChannels × numSteps threshold values, if they all turn out to be identical, Vivado should produce an optimized implementation with resource utilization much closer to 1 × numSteps.
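For illustration, here is a minimal numpy sketch of the kind of pre-codegen check described above. It is not part of the FINN API; the function name and the idea of collapsing the threshold matrix before code generation are assumptions, but it shows how one could detect that a (numChannels, numSteps) threshold matrix is effectively per-tensor and store only numSteps values:

```python
import numpy as np

def collapse_shared_thresholds(thresholds: np.ndarray) -> np.ndarray:
    """Collapse a (numChannels, numSteps) threshold matrix to (1, numSteps)
    when every channel uses the same threshold values.

    Hypothetical pre-processing step (not a FINN transformation) that could
    run before code generation so only numSteps values need to be stored
    instead of numChannels x numSteps.
    """
    # All rows identical -> thresholds are effectively per-tensor.
    if np.all(thresholds == thresholds[0:1, :]):
        return thresholds[0:1, :]  # shape (1, numSteps)
    return thresholds  # keep per-channel granularity

# Example matching the scenario below: 16 channels, 255 steps, identical rows.
T = np.tile(np.arange(255, dtype=np.float32), (16, 1))
print(T.shape)                              # (16, 255)
print(collapse_shared_thresholds(T).shape)  # (1, 255)
```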
If I had a Thresholding layer with parameters (numChannels = 16, numSteps = 255), but all the threshold values were identical across channels for the same step number (i.e., only numSteps distinct values in total), would it be possible—either in an HLS implementation or in an RTL implementation—to synthesize only numSteps values instead of numChannels × numSteps?
From #1002, I understand that this wouldn't be possible, as it mentions that the Thresholding RTL module expects a per-channel scale. However, I'm wondering if this may have changed in more recent updates—has this kind of shared-threshold optimization been supported since then?