Analytical FIFO sizing #1185
base: dev
Conversation
…tgenerator, fmpadding, labelselect, matrixvectoractivation, pool, streamingdatawidthconverter (generalized variant, very conservative estimate), streamingmaxpool, thresholding, vectorvectoractivation
Signed-off-by: lstasytis <[email protected]>
…ing and specific cases of streamingmaxpool and slidingwindow
I did not run the tests for thresholding & vvau yet.
In general we probably want to reduce the number of tests that go into the default suite. Currently, the largest numbers of test cases are:
- ConvInpGen: 20,736
- Thresholding: 4,608
- VVAU: 2,304
Maybe we can still include an extended test suite via pytest.fixture or pytest_generate_tests functions @auphelia?
@@ -166,12 +166,17 @@ def get_verilog_top_module_intf_names(self):
         )
         return intf_names

-    def derive_characteristic_fxns(self, period):
+    def derive_characteristic_fxns(
This refactoring also needs to be applied to addstreams.py.
DeriveFIFOSizes expects outFIFODepths to be set for every output of every CustomOp. This is problematic in the current implementation of DuplicateStreams, as the default value of [2] (i.e. one output with FIFO depth 2) is inherited from HWCustomOp, even though DuplicateStreams has NumOutputStreams outputs.
How/where do we set the default value of one attribute depending on another attribute without breaking anything? In verify_node()?
DeriveFIFOSizes should only be called once derive_characteristic_fxns() has been called for each node; those functions are kept in the node definition classes. So I would say the optimal place is in that same function.
I'm not sure why this function previously had 'out1' and 'out2' defined; was this for a very specific network? I don't think the generalization should break anything.
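A minimal sketch of how the per-output default could be derived there (hypothetical helper name, not the PR's actual code; the real attribute handling lives in FINN's HWCustomOp hierarchy):

```python
def expand_out_fifo_depths(num_output_streams, out_fifo_depths, default_depth=2):
    """Expand the single inherited outFIFODepths default entry so that
    every output stream of the node gets its own FIFO depth entry."""
    if len(out_fifo_depths) == num_output_streams:
        return out_fifo_depths  # already set explicitly per output, keep as-is
    return [default_depth] * num_output_streams
```

Calling this at the top of derive_characteristic_fxns() would avoid touching verify_node() while keeping explicitly-set depths intact.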
code_gen_dir = self.get_nodeattr("code_gen_dir_ipgen")
# ensure that there is a directory
if code_gen_dir == "" or not os.path.isdir(code_gen_dir):
    code_gen_dir = make_build_dir(prefix="code_gen_ipgen_" + str(self.name) + "_")
Should be self.onnx_node.name instead of self.name.
    # call the compilation function for this node
    self.ipgen_singlenode_code()
else:
    warnings.warn("Using pre-existing IP for %s" % self.name)
Should be self.onnx_node.name instead of self.name.
self.code_generation_ipgen(model, fpga_part, clk_period)

# lazy construction of hlssynthip step
if is_hls_node(node):
Don't we need an else branch for this if to support RTL CustomOps? E.g.:
if is_hls_node(node):
    ...
else:
    self.prepare_rtlsim()
    # ensure that executable path is now set
    assert (
        self.get_nodeattr("rtlsim_so") != ""
    ), "Failed to prepare RTLSim, no rtlsim_so attribute found."
Yes, this got lost during the refactoring. It actually doesn't need a new branch; prepare_rtlsim() just needs to be moved out of the is_hls_node() branch, since it should be called in either scenario in this function. is_hls_node() only gates an HLS-specific pre-step.
Fixed in this commit now
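The fixed control flow can be sketched as follows (StubOp and the helper are hypothetical stand-ins for a FINN CustomOp instance, simplified from the surrounding diff):

```python
# Minimal stub standing in for a FINN CustomOp instance (hypothetical)
class StubOp:
    def __init__(self):
        self.calls = []
        self.attrs = {"rtlsim_so": ""}

    def ipgen_singlenode_code(self):
        self.calls.append("hls_ipgen")

    def prepare_rtlsim(self):
        self.calls.append("prepare_rtlsim")
        self.attrs["rtlsim_so"] = "/tmp/node_rtlsim.so"

    def get_nodeattr(self, key):
        return self.attrs[key]


def prepare_node_for_rtlsim(inst, node_is_hls):
    if node_is_hls:
        inst.ipgen_singlenode_code()  # HLS-specific pre-step only
    inst.prepare_rtlsim()             # runs for HLS and RTL CustomOps alike
    assert inst.get_nodeattr("rtlsim_so") != "", (
        "Failed to prepare RTLSim, no rtlsim_so attribute found."
    )


rtl_op = StubOp()
prepare_node_for_rtlsim(rtl_op, node_is_hls=False)
```

The point is that an RTL node reaches prepare_rtlsim() without any extra branch, since only the pre-step is conditional.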
if mem_mode in ["internal_decoupled", "external"]:
    n_weight_inps = self.calc_wmem()
    # num_w_reps = np.prod(self.get_nodeattr("numInputVectors"))
    io_dict["inputs"]["weights"] = [0 for i in range(1 * n_weight_inps)]
I don't quite understand why num_w_reps is no longer considered here. Could you explain?
For my experiments I switched to MVAUs with mem_mode "internal_embedded" instead of "internal_decoupled" for now to avoid potential problems.
At some point this was causing
Exception: Error in simulation! Takes too long to produce output. Consider setting the LIVENESS_THRESHOLD env.var. to a larger value.
errors during rtlsim, but this got fixed somewhere along the way and I simply forgot to reintroduce it. Fixed in this commit now.
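For reference, the reintroduced weight-stream sizing can be sketched like this (the concrete values are illustrative; in the MVAU they come from the numInputVectors node attribute and calc_wmem()):

```python
from math import prod

# Illustrative values, assumed for this sketch
num_input_vectors = [1, 28, 28]               # e.g. numInputVectors attribute
num_w_reps = prod(num_input_vectors)          # weight stream replays once per repetition
n_weight_inps = 4                             # e.g. result of calc_wmem()

# The weight input stream must provide num_w_reps * n_weight_inps beats,
# not just n_weight_inps, or rtlsim starves and hits the liveness threshold.
weights_stream = [0 for _ in range(num_w_reps * n_weight_inps)]
```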
        "largefifo_rtlsim_cpp",
        "characterize_analytic",
        "characterize_rtl",
    ],
)
@pytest.mark.parametrize("topology", ["tfc", "cnv"])
def test_fifosizing_linear(method, topology):
After incorporating some of my other comments, this test fails for me in the cases [cnv, characterize_analytic] and [cnv, characterize_rtl] during "step_set_fifo_depths" with
Exception: Error in simulation! Takes too long to produce output. Consider setting the LIVENESS_THRESHOLD env.var. to a larger value.
and
%Warning: ./MVAU_hls_6_Matrix_Vector_Activate_Stream_Batch_p_ZL7threshs_0_ROM_AUTO_1R.dat:0: $readmem file not found
I don't have an explicit LIVENESS_THRESHOLD set, so it should default to 10k.
Does this PR depend on the new auto-folding PR or other PRs?
Could the "internal_decoupled" mode be the culprit (see my other comment)?
With the mvau fix now in, the simulation error no longer shows up for me. However, I still get the $readmem file not found warning, even though all tests pass and I manually checked that each FIFO sizing strategy generates the same FIFO depths across the entire model. This warning is not present in the current /dev build, so something is breaking, even though it does not affect the FIFO sizing. I'm a bit stumped on what is going on here. If the missing mem files are replicas of the weight stream, I guess they're not really relevant for the purposes of FIFO sizing, since they are parallel streams.
@pytest.mark.fpgadataflow
@pytest.mark.slow
@pytest.mark.vivado
def test_fpgadataflow_analytical_characterization_slidingwindow(
These are 20,736 test cases. I only ran 20% of them, and over 1,000 failed.
@pytest.mark.fpgadataflow
@pytest.mark.slow
@pytest.mark.vivado
def test_fpgadataflow_analytical_characterization_fmpadding(
For me 32/256 tests failed.
@pytest.mark.fpgadataflow
@pytest.mark.slow
@pytest.mark.vivado
def test_fpgadataflow_analytical_characterization_streamingmaxpool(
For me, 20/128 tests failed.
adjusting for multiple output streams in duplicatestream
FIFO sizing is extremely time-consuming in terms of CPU cycles, because it currently requires RTL simulation of the model to determine FIFO depths by tracking the model's runtime behavior.
Currently, FINN uses two main approaches for FIFO sizing:
- The global sizing algorithm, which incrementally tightens FIFO sizes while rerunning RTLSIM until the entire model reaches a steady state (AutoFIFOSizingMethod.LARGEFIFO_RTLSIM).
- The characteristic-function based approach, which RTL-simulates individual nodes, constructs a characteristic function for each one, and then uses phase-shifting to determine an optimal FIFO size by finding how many cycles the input characteristic function must be shifted forward before it reaches a steady state with the output stream (AutoFIFOSizingMethod.CHARACTERIZE).
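As a simplified illustration of the phase-shifting idea (a sketch under assumed semantics, not FINN's actual implementation): given cumulative characteristic functions, i.e. the number of stream beats produced by the upstream node and consumed by the downstream node as of each cycle, the consumer is delayed until it can never outrun the producer, and the FIFO must absorb the worst-case surplus of produced-but-unconsumed beats.

```python
def fifo_depth_from_characteristics(prod_chr, cons_chr):
    """prod_chr[t] / cons_chr[t]: cumulative beats produced / consumed by cycle t.
    Returns (shift, depth): cycles the consumer is delayed, and the FIFO depth
    needed to buffer the worst-case backlog. Lists are assumed equal length."""
    shift = 0
    for shift in range(len(prod_chr)):
        # valid shift: the producer is never behind the delayed consumer
        if all(prod_chr[t] >= cons_chr[t - shift]
               for t in range(shift, len(prod_chr))):
            break
    # FIFO depth = maximum number of beats sitting between the two nodes
    depth = max(
        prod_chr[t] - (cons_chr[t - shift] if t >= shift else 0)
        for t in range(len(prod_chr))
    )
    return shift, depth
```

A fast producer feeding a slow consumer needs no shift but a deep FIFO; a slow producer feeding an eager consumer needs a shift instead.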
Of these two options, the latter (characteristic sizing) can be dramatically sped up by computing the characteristic functions of individual nodes analytically rather than via RTLSIM.
This can be accomplished by manually reviewing the RTL (or HLS) code of each node and constructing a model Python function that reproduces the loop behavior of only the AXI-stream reads and writes, filling up the characteristic function array.
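As a toy example of such a model function (a hypothetical node with one read per cycle and a fixed pipeline latency; this is illustrative only, not one of the PR's actual functions):

```python
def characteristic_fxn(num_beats, latency, period):
    """Analytically fill cumulative input/output characteristic arrays for a
    node that reads one beat per cycle and writes each beat `latency` cycles
    later, mirroring the AXI-stream loop structure of the RTL."""
    io_in = [0] * period   # cumulative reads as of each cycle
    io_out = [0] * period  # cumulative writes as of each cycle
    cycle = 0
    for beat in range(num_beats):
        io_in[cycle] = beat + 1            # one input beat read this cycle
        io_out[cycle + latency] = beat + 1  # corresponding output, delayed
        cycle += 1
    # carry the cumulative counts forward to the end of the period
    for t in range(1, period):
        io_in[t] = max(io_in[t], io_in[t - 1])
        io_out[t] = max(io_out[t], io_out[t - 1])
    return io_in, io_out
```

A real node's function would instead loop over the node's folded dimensions and stall patterns, but the output has the same shape: monotone cumulative read/write counts per cycle.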
The PR includes these python-based model functions as part of the custom_ops classes of the following nodes:
Additionally, it includes modifications to builder/build_dataflow_config.py and builder/build_dataflow_steps.py so that Vivado is not called in the FIFO sizing step unless a node has no characteristic function (in which case it is called only to characterize that node). This is achieved by introducing a new 'ipgen_ignore' node attribute, which is set to true for all analytically characterized nodes once FIFO sizing starts and forces the FINN compiler to skip calling Vivado. The attribute is set back to false once analytic FIFO sizing finishes, so Vivado can be called again afterwards.
Improvements to be made:
The remaining nodes in FINN should be characterized as necessary
There might exist parameter configurations for the convolutioninputgenerator and streamingmaxpool nodes where the characteristic function is inaccurate, since an exhaustive search is complicated to perform automatically.
Currently, the FIFO sizing test checks the analytical function for exact equality against the RTLSIM output. However, small latencies introduced at the start or end of the characteristic function by HLS do not change the final FIFO sizes, so the test should be relaxed to compare the characteristic functions more loosely. This would then also make an exhaustive test of all possible node configurations feasible.
The characteristic FIFO sizing algorithm lacks support for branching networks. This is irrespective of the individual characteristic functions and should be improved in the main FIFO sizing function (transformation/derive_characteristic.py)