
Conversation

@bwintermann commented Sep 1, 2025

It would be useful, especially for larger models, to have cached IP cores, so that we can mostly skip IP-generation steps which, after synthesis itself, take the longest time in the build flow. It is also helpful for certain tests that require repeated simulations with the same IP. The idea was first mentioned in Xilinx#174 and implemented in Xilinx#349 but never merged as far as I can see. We should maybe implement this as well.

Here is a proposal for some changes to the original idea, along with some implementation notes:

  • Every operator should be cacheable automatically. Since we may not know which node attributes are required, we then use (almost) all attributes as part of the hash key.
  • Which operators are cacheable, and with which attributes, can be marked via a decorator, something like @cache_ip(attrs="..., ..., ..."). To cover the case above, we would simply use @cache_ip() (see the sketch after this list).
    • There may be cases where caching could require special handling or may not be possible at all. A possibility would be to also provide something like @no_cache and @custom_cache(func=...).
  • Caching and fetching cached IPs is part of the existing generation steps (such as HLSSynthIP()).
    • Which IPs are synthesized, cached, and fetched should be logged.
  • As in Xilinx/finn#349, whether a certain node should be cached / use the cache in a certain flow should be a configurable node attribute.
  • Whether caching is used overall (for example, we likely don't want it during CI runs) is determined via an env var, a BuildFlowConfig argument, or a command-line parameter. This should probably default to True.
  • The location of the cache should be unrelated to (maybe) FINN_DEPS and (definitely) FINN_BUILD_DIR, since we want it to be usable across runs and maybe independent of the FINN executable. It should be configurable in settings.yaml, via an env var, or via a command-line argument, like any other configuration option.
  • To avoid issues with different versions of the same IP, which might have different behaviour, the commits of finn-hlslib, finn-plus itself, and any other required repository should be part of the hash key.
    • Changing which attributes are used as part of the key does not affect functionality, since changed attributes would result in a different hash (but we do have to record which attribute each value belongs to, otherwise in_width: 10 and out_width: 10 could yield the same hash).
  • The hash should be generated with a modern, collision-resistant hash function. Even if extremely unlikely, a collision would be incredibly hard to debug, or even to catch in the first place.
  • The cache itself could be organized as a tree structure of directories, using the first n characters of the hex representation as a prefix, etc.
    • Each directory would then contain the IP data and a JSON file listing the attributes used in the key in human-readable form, for debugging / checking.
  • Internal parameters are cached as well; external ones don't matter to the IP.
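
To make the decorator and key construction concrete, here is a minimal sketch. Only the @cache_ip name and CACHE_IP_DEFINITIONS appear in the PR itself; everything else (function names, signatures, the sha256 default) is an assumption:

import hashlib
import os

CACHE_IP_DEFINITIONS = {}  # op class -> dict describing which attributes enter the key

def cache_ip(attrs=None):
    """Mark an operator class as cacheable; attrs=None means "use (almost) all attributes"."""
    def decorator(op_cls):
        CACHE_IP_DEFINITIONS[op_cls] = {"use": attrs}
        return op_cls
    return decorator

def build_cache_key(node_attrs, repo_commits):
    """Build a hash key from attribute name/value pairs and pinned repository commits."""
    h = hashlib.sha256()
    # Hash "name=value" pairs, so that in_width: 10 and out_width: 10 cannot collide
    for name in sorted(node_attrs):
        h.update(f"{name}={node_attrs[name]}".encode())
    # Pin tool versions: a different finn-hlslib or finn-plus commit must miss the cache
    for repo in sorted(repo_commits):
        h.update(f"{repo}@{repo_commits[repo]}".encode())
    return h.hexdigest()

def cache_entry_dir(cache_root, key, prefix_len=2):
    """Tree-structured cache: the first n hex characters of the key form a directory prefix."""
    return os.path.join(cache_root, key[:prefix_len], key)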

@iksnagreb @LinusJungemann @fpjentzsch do you have additions, suggestions, changes, etc.? Once we have a fixed idea I'd start implementing it.

Checks until the PR is ready:

  • Automatically run verification after inserting cached IPs to ensure the correct IP was fetched
  • Cache management system implemented
  • Cache usage general settings (DataflowBuildConfig)
  • Rewrite step order to include PrepareIP for RTL nodes
  • Narrow selection of attributes needed to identify an operator to improve the cache hit rate
  • (Done, check before merge) Make sure that every custom op that has external parameters that define it (such as internal weights) has these parameters included in its hash!
  • Update Wiki with configuration options / Readme Feature List

Optional:

  • Per-node cache settings

@bwintermann (Author)

Caching should now be enabled out of the box. If no arguments are given, the cache directory is placed in the root of the finn-plus repository as FINN_IP_CACHE. You can cache any operator by adding @cache_ip(attributes=...) above the custom op definition. I will soon start adding tests and mark all operators by default.
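
For example, marking an op might look like this. The import paths follow the files touched in this PR; the op class and attribute names are illustrative, not taken from the codebase:

from finn.custom_op.fpgadataflow.hlsbackend import HLSBackend
from finn.transformation.fpgadataflow.ip_cache import cache_ip  # module added by this PR

@cache_ip(attributes=["PE", "SIMD", "inputDataType", "outputDataType"])
class MyOp_hls(HLSBackend):
    # ... the usual HLS custom op implementation ...
    pass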

There are still some open points to consider:

  • How do we organize the transformation? It would be much more organic to include PrepareIP() and then merge step_hw_codegen and step_hw_ipgen into one step_hw_ipgen_cached (or something similar). However, then PrepareIP and HLSSynthIP are always stuck together, and we also break compatibility with upstream FINN if changes are made to these steps. We should discuss this at some point.
    • Currently I have a switch in step_hw_ipgen between the normal HLSSynthIP() and the cached variant. This should only be temporary, though. Ideally, step_hw_codegen is cached too, since it can take quite some time as well, depending on the configuration.
  • Which hash functions should be usable? (see the sketch after this list)
  • For each op: which node attributes / parameters uniquely define the operator? We should all double-check the given attributes for each operator before we merge, just to be sure that nothing is missed. Getting this wrong would lead to long debugging sessions when suddenly the wrong IPs get used because we missed an attribute.
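
On the hash-function question, the standard hashlib module already gates this nicely; a sketch (make_hasher is a hypothetical helper):

import hashlib

def make_hasher(algo="sha256"):
    """Return a hasher callable, restricted to algorithms hashlib actually provides."""
    if algo not in hashlib.algorithms_available:
        raise ValueError(f"Unknown hash function: {algo}")
    return lambda data=b"": hashlib.new(algo, data)

# Usage: make_hasher()(b"some bytes").hexdigest()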

@fpjentzsch

Thanks! For operators like the MVAU, we might need to hash the content of the weight/threshold tensors in addition to the CustomOp attributes and the finn-plus commit hash.

@bwintermann (Author)

For the MVAU_hls I already implemented it somewhat, by hashing np.ascontiguousarray(model.get_initializer(op.onnx_node.input[1])).tobytes(), but reliably hashing numpy arrays seems to be generally somewhat difficult. I will try to add all tensors that need to be hashed for each component, but we should definitely check those together before merging.
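
One way to make numpy hashing reproducible is to fold dtype and shape into the digest as well; otherwise two tensors with the same raw bytes but different shapes or dtypes would collide. A sketch (hash_ndarray is a hypothetical helper; sha256 is an assumed default):

import hashlib

import numpy as np

def hash_ndarray(arr):
    """Hash a numpy array reproducibly, including dtype and shape in the digest."""
    arr = np.ascontiguousarray(arr)
    h = hashlib.sha256()
    h.update(str(arr.dtype).encode())
    h.update(str(arr.shape).encode())
    h.update(arr.tobytes())
    return h.hexdigest()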

@bwintermann (Author)

I've now added step_ip_generation, which merges PrepareIP and HLSSynthIP into one step with the cache added, and replaced step_hw_codegen and step_hw_ipgen with it in the default lookup. This way everything remains compatible with the old flow, while new flows are encouraged to use the fused step (rough shape sketched below).
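
The rough shape of the fused step, per this description. The cache wiring is elided and the config helper names follow the snippets quoted further down; treat this as a sketch rather than the actual implementation:

from finn.builder.build_dataflow_config import DataflowBuildConfig
from finn.transformation.fpgadataflow.hlssynth_ip import HLSSynthIP
from finn.transformation.fpgadataflow.prepare_ip import PrepareIP
from qonnx.core.modelwrapper import ModelWrapper

def step_ip_generation(model: ModelWrapper, cfg: DataflowBuildConfig) -> ModelWrapper:
    """Unified step that does what step_hw_codegen and step_hw_ipgen did before (with cache)."""
    model = model.transform(PrepareIP(cfg._resolve_fpga_part(), cfg._resolve_hls_clk_period()))
    model = model.transform(HLSSynthIP())  # a cache hit would short-circuit this per node
    return model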

Additionally, I covered (I think) all operators that use external parameters and added those parameters to the lookup key. Your ops should now be covered as well, @iksnagreb.

I'll add some tests and try it out with different models; then it should be ready to merge.


@github-actions bot commented Oct 5, 2025

📋 Docstring Check Report

Checked files:

  • src/finn/builder/build_dataflow.py
  • src/finn/builder/build_dataflow_config.py
  • src/finn/builder/build_dataflow_steps.py
  • src/finn/custom_op/fpgadataflow/hls/__init__.py
  • src/finn/custom_op/fpgadataflow/rtl/__init__.py
  • src/finn/interface/interface_utils.py
  • src/finn/interface/run_finn.py
  • src/finn/transformation/fpgadataflow/ip_cache.py
  • src/finn/util/deps.py
  • tests/infrastructure/test_ip_cache.py

Docstring check failed!

Missing Docstrings Details:

📄 src/finn/builder/build_dataflow_steps.py:

    • Line 525: function '_make_hls_estimate_report'

📄 src/finn/custom_op/fpgadataflow/hls/__init__.py:

    • Line 1: module '__init__.py'
    • Line 40: function 'register_custom_op'

📄 src/finn/custom_op/fpgadataflow/rtl/__init__.py:

    • Line 1: module '__init__.py'

📄 src/finn/interface/run_finn.py:

    • Line 1: module 'run_finn.py'
    • Line 38: function '_resolve_module_path'
    • Line 141: function 'main_group'
    • Line 177: function 'build'
    • Line 270: function 'run'
    • Line 310: function 'bench'
    • Line 341: function 'test'
    • Line 354: function 'deps'
    • Line 366: function 'update'
    • Line 372: function 'config'
    • Line 377: function '_command_get_settings'
    • Line 387: function 'config_list'
    • Line 395: function 'config_get'
    • Line 406: function 'config_set'
    • Line 426: function 'config_create'
    • Line 440: function 'main'

📄 src/finn/transformation/fpgadataflow/ip_cache.py:

    • Line 44: function '_ndarray_to_bytes'
    • Line 107: function 'wrapper'

📄 src/finn/util/deps.py:

    • Line 1: module 'deps.py'

Total missing docstrings: 23

How to Fix:

Please add docstrings to the missing functions, classes, and modules listed above.

Docstring Guidelines:

  • All modules should have a module-level docstring
  • All public functions and methods should have docstrings
  • All private functions should have docstrings
  • All classes should have docstrings
  • Use triple quotes (""") for docstrings
  • Follow PEP 257 conventions


def step_ip_generation(model: ModelWrapper, cfg: DataflowBuildConfig) -> ModelWrapper:
    """Unified step that does what step_hw_codegen and step_hw_ipgen did before (with cache!)."""
    if cfg.use_ip_caching:


step_hw_ipgen has more info logging here than this step. Should be consistent between both (and probably lean towards verbose logging for a critical feature like this).

@bwintermann (Author)


Do you mean generally more logging during the IP Cache transformation or during the steps?

@fpjentzsch commented Oct 6, 2025


Wherever it fits best. I think we should always log summary information, like

  • Caching enabled/disabled (along with global cache settings if enabled)
  • How many IPs in total are present in the cache
  • Restoring x out of y layers from cache
  • Placing z out of y layers in the cache that were not previously cached

And in verbose/debug mode, info like this should be printed per layer (summary logging sketched below).

I think most of this is already in place.
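
Something like this would cover the summary lines (a sketch; the function name and counters are placeholders):

import logging

log = logging.getLogger(__name__)

def log_cache_summary(cache_root, total_entries, num_hits, num_misses, num_layers):
    """Log the cache summary proposed above; per-layer details would go to debug level."""
    log.info(f"IP caching enabled (cache dir: {cache_root})")
    log.info(f"IP cache currently holds {total_entries} entries")
    log.info(f"Restoring {num_hits} out of {num_layers} layers from cache")
    log.info(f"Placing {num_misses} out of {num_layers} newly generated layers in the cache")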

@bwintermann (Author)


I'll have a look at what gets logged when, and revise it if necessary.

if cfg.use_ip_caching:
    log.info("Using IP cache to fetch generated IPs...")
    clk = cfg._resolve_hls_clk_period()
    if clk is None and cfg.cache_hls_clk_period:


I don't think we need this level of error handling if somehow no (HLS) clock is specified. It adds bloat to this already long file, and this condition shouldn't be possible, especially with the latest changes to DataflowBuildConfig, where synth_clk_period_ns is no longer Optional and has a default value. _resolve_hls_clk_period always falls back to this clock.

The same comment applies to step_ip_generation.

@bwintermann (Author)


Yes, this was before the Small Fixes PR, so the clocks could still be None, and thus I checked them. Going to remove this.

if sys.platform != "win32":
    self.max_hash_len = os.pathconf("/", "PC_NAME_MAX")
    self.max_path_len = os.pathconf("/", "PC_PATH_MAX")
else:


We don't need to support Windows at all.

@bwintermann (Author)


These lines don't impact us much, but I guess I could remove them. Do we have any checks for whether we are on Windows? If not, we should maybe explicitly add one to the CLI of FINN+ so that users don't try to run it on Windows in the first place.


I wouldn't bother with this at all. We already assume a very specific Ubuntu setup, with heavy toolchains and a long list of apt packages installed. We can assume every user has read the basic quickstart documentation.

@bwintermann (Author)


Maybe I'll still add it in the frontend rework. It's only a few lines and could potentially help users. But either way, that's another topic. I'll remove it here.

    parambytes = _ndarray_to_bytes(model.get_initializer(op.onnx_node.input[1]))
    array_hash = self.hasher(parambytes).hexdigest()
    return f"param_hash:{array_hash}\n"
elif isinstance(op, (ElementwiseBinaryOperation,)):


Do ElementwiseBinaryOperations really always have two initializers, @iksnagreb? I don't see any error handling here.


No, they can have up to two initializer inputs, though in practice that should probably not even happen.

@bwintermann (Author)


Thanks for the heads-up, I'll add handling for different cases here.
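
A possible shape for that handling, reusing _ndarray_to_bytes and self.hasher from the surrounding code; this fragment is a sketch, and the loop over all inputs is an assumption:

# Hash whichever of the (up to two) inputs actually carries an initializer
parts = []
for idx, inp_name in enumerate(op.onnx_node.input):
    init = model.get_initializer(inp_name)
    if init is not None:
        digest = self.hasher(_ndarray_to_bytes(init)).hexdigest()
        parts.append(f"param_hash_{idx}:{digest}")
return "".join(p + "\n" for p in parts)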

    op.set_nodeattr("ipgen_path", str(ip_dir))
    op.set_nodeattr("gen_top_module", new_module_name)

elif issubclass(type(op), HLSBackend):


Just out of curiosity: Do HLS nodes really not need any (ugly) renaming of module names like the RTL nodes do? What if op.onnx_node.name is different in the current model than in the cached version?

@bwintermann (Author)


This could be an issue. I noticed the RTL side because a layer was reused during testing, but I don't have an example that does this with an HLS layer yet. It might still work, though, since I am not sure whether later transformations care about the name of the IP core. I'll check this today.

from finn.transformation.fpgadataflow.specialize_layers import SpecializeLayers
from finn.util.basic import alveo_part_map
from finn.util.deps import get_cache_path
from tests.fpgadataflow.test_fpgadataflow_mvau import make_single_fclayer_modelwrapper


This import doesn't work, at least in the CI where FINN+ is launched as a pip-installed package.

@fpjentzsch

Currently test_split_large_fifos and test_fifosizing_linear fail (here), but I couldn't reproduce it locally yet, even when running multiple different test variants after each other.

Is it possible we are running into race conditions with this caching system if multiple tests or builds or NodeLocalTransformations within a graph run simultaneously?

@fpjentzsch marked this pull request as ready for review on October 6, 2025, 09:47
@bwintermann (Author)

> Currently test_split_large_fifos and test_fifosizing_linear fail (here), but I couldn't reproduce it locally yet, even when running multiple different test variants after each other.
>
> Is it possible we are running into race conditions with this caching system if multiple tests or builds or NodeLocalTransformations within a graph run simultaneously?

I ran into this before as well. It should not be a race condition as far as I am aware. I think the error originates from the fact that there simply is no tests module in the environment. In pyproject.toml we add tests to the finn package. Maybe changing it to finn.tests works. I'll look into it.

@LinusJungemann (Member)

> Currently test_split_large_fifos and test_fifosizing_linear fail (here), but I couldn't reproduce it locally yet, even when running multiple different test variants after each other.
> Is it possible we are running into race conditions with this caching system if multiple tests or builds or NodeLocalTransformations within a graph run simultaneously?
>
> I ran into this before as well. It should not be a race condition as far as I am aware. I think the error originates from the fact that there simply is no tests module in the environment. In pyproject.toml we add tests to the finn package. Maybe changing it to finn.tests works. I'll look into it.

Yes, this is an issue I also ran into in #113. Using finn.tests fixed it for me.

if attributes is not None:
    CACHE_IP_DEFINITIONS[op_cls]["use"] = attributes
else:
    # List of fields that don't define the IP core itself,


I think it should be safe to add the following attributes to this list of default ignored attributes. Most of them are optional attributes of HWCustomOp (set to "" or [] by default), but they currently still show up in the key file (collected as a code sketch below the list).

backend
preferred_impl_style
exec_mode
rtlsim_trace
slr
mem_port
partition_id
device_id
inFIFODepths
outFIFODepths
output_hook
io_chrc_in
io_chrc_out
io_chrc_period
io_chrc_pads_in
io_chrc_pads_out
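
Collected as a default ignore-set, this could look like the following sketch (the constant name is hypothetical):

DEFAULT_IGNORED_ATTRS = {
    "backend", "preferred_impl_style", "exec_mode", "rtlsim_trace",
    "slr", "mem_port", "partition_id", "device_id",
    "inFIFODepths", "outFIFODepths", "output_hook",
    "io_chrc_in", "io_chrc_out", "io_chrc_period",
    "io_chrc_pads_in", "io_chrc_pads_out",
}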

# Prepare some always needed values
# FINN Commit
self.finn_commit = subprocess.run(
    shlex.split("git rev-parse HEAD"),


This doesn't work in the CI, where FINN+ is installed as a package and not from the repo. I.e., I get the following, and the FINN+ commit does not contribute to the key, since there is no sanity check:

FINN Commit reads: (authored at: )
HLSLIB Commit reads: 5dde96382b84979c6caa6f34cdad2ac72fa28489 (authored at: 2025-07-14 11:35:11 +0200)

Can we use "poetry version" or some pip command to use the version string as the key instead? It should be PEP 440 compliant, i.e., contain the latest short commit hash.
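
For what it's worth, the standard library can read the installed version string without shelling out to git (the distribution name "finn" here is an assumption):

from importlib.metadata import PackageNotFoundError, version

try:
    # PEP 440 local version segments carry the short commit hash, e.g. "0.1.dev3+g5dde963"
    finn_version = version("finn")
except PackageNotFoundError:
    finn_version = "unknown"  # running from a checkout: fall back to git rev-parse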

@fpjentzsch

Currently, the caching does not save any time, because step_set_fifo_depths happens before step_ip_generation by default. This step needs to perform PrepareIP()/HLSSynthIP() if auto FIFO sizing is enabled, and currently does so even if it is disabled (see here, although this could be removed).

We likely need to split up step_set_fifo_depths or at least move it behind step_ip_generation again, let's discuss offline...

This issue has been previously discussed here:
#85
Xilinx#1185 (comment)

@fpjentzsch

> Currently test_split_large_fifos and test_fifosizing_linear fail (here), but I couldn't reproduce it locally yet, even when running multiple different test variants after each other.

When I ran the test suite a second time to test the effects of caching, I got 6 more failures than on the first run:

  • 1 more variant of test_split_large_fifos (now all variants affected)
  • 3 more variants of test_fifosizing_linear (now all variants affected)
  • test_end2end_build_dataflow_directory
  • the third cybersecurity notebook

Might be worth looking into these.
