
Conversation

@bwintermann commented Sep 1, 2025

It would be useful, especially for larger models, to have cached IP cores, so that we can mostly skip IP-generation steps which, after synthesis itself, take the longest time in the build flow. It is also helpful for certain tests that require repeated simulations with the same IP. The idea was first mentioned in Xilinx#174 and implemented in Xilinx#349 but never merged as far as I can see. We should maybe implement this as well.

Here is a proposal for some changes to the original idea, along with some implementation notes:

  • Every operator should be cacheable automatically. Since we may not know which node attributes are required, we then use (almost) all attributes as part of the hash key.
  • Which operators are cacheable, and with which attributes, can be marked via a decorator, something like @cache_ip(attrs="..., ..., ..."). To cover the case above, we would simply use @cache_ip() (see the sketch after this list).
    • There may be cases where caching could require special handling or may not be possible at all. A possibility would be to also provide something like @no_cache and @custom_cache(func=...).
  • Caching and fetching cached IPs is part of the existing generation steps (such as HLSSynthIP()).
    • Which IPs are synthesized, cached, and fetched should be logged.
  • As in Xilinx/finn#349, whether a certain node should be cached / use the cache in a certain flow should be a configurable node attribute.
  • Whether caching is used overall (for example, we likely don't want it during CI runs) is determined via an env var, a BuildFlowConfig argument, or a command-line parameter. This should probably default to True.
  • The location of the cache should be unrelated to (maybe) FINN_DEPS and (definitely) FINN_BUILD_DIR, since we want it to be usable across runs and maybe independent of the FINN executable. It should be configurable in settings.yaml, via an env var, or via a command-line argument, like any other configuration option.
  • To avoid issues with different versions of the same IP, which might have different behaviour, the commits of finn-hlslib, finn-plus itself, and any other required repository should be part of the hash key.
    • Changing which attributes are used as part of the key does not affect functionality, since changed attributes would result in a different hash (but we do have to record which attribute each value belongs to, otherwise in_width: 10 and out_width: 10 could yield the same hash).
  • The hash should be generated with a modern, collision-resistant hash function. Even if extremely unlikely, a collision would be incredibly hard to debug, or even to catch in the first place.
  • The cache itself could be organized as a tree structure of directories, using the first n characters of the hex representation as a prefix, etc.
    • Each directory would then contain the IP data and a JSON file listing the attributes used in the key in human-readable form, for debugging / checking.
  • Internal parameters are cached as well; external ones don't matter to the IP.
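
To make the decorator and key construction concrete, here is a minimal sketch. Only the @cache_ip name and CACHE_IP_DEFINITIONS appear in the PR itself; everything else (function names, signatures, the sha256 default) is an assumption:

import hashlib
import os

CACHE_IP_DEFINITIONS = {}  # op class -> dict describing which attributes enter the key

def cache_ip(attrs=None):
    """Mark an operator class as cacheable; attrs=None means "use (almost) all attributes"."""
    def decorator(op_cls):
        CACHE_IP_DEFINITIONS[op_cls] = {"use": attrs}
        return op_cls
    return decorator

def build_cache_key(node_attrs, repo_commits):
    """Build a hash key from attribute name/value pairs and pinned repository commits."""
    h = hashlib.sha256()
    # Hash "name=value" pairs, so that in_width: 10 and out_width: 10 cannot collide
    for name in sorted(node_attrs):
        h.update(f"{name}={node_attrs[name]}".encode())
    # Pin tool versions: a different finn-hlslib or finn-plus commit must miss the cache
    for repo in sorted(repo_commits):
        h.update(f"{repo}@{repo_commits[repo]}".encode())
    return h.hexdigest()

def cache_entry_dir(cache_root, key, prefix_len=2):
    """Tree-structured cache: the first n hex characters of the key form a directory prefix."""
    return os.path.join(cache_root, key[:prefix_len], key)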

@iksnagreb @LinusJungemann @fpjentzsch do you have additions, suggestions, changes, etc.? Once we have a fixed idea I'd start implementing it.

Checks until the PR is ready:

  • Automatically run verification after inserting cached IPs to ensure the correct IP was fetched
  • Cache management system implemented
  • Cache usage general settings (DataflowBuildConfig)
  • Rewrite step order to include PrepareIP for RTL nodes
  • Narrow selection of attributes needed to identify an operator to improve the cache hit rate
  • (Done, check before merge) Make sure that every custom op that has external parameters that define it (such as internal weights) has these parameters included in its hash!
  • Update Wiki with configuration options / Readme Feature List

Optional:

  • Per-node cache settings

@bwintermann (Author)

Caching should now be enabled out of the box. If no arguments are given, the cache directory is placed in the root of the finn-plus repository as FINN_IP_CACHE. You can cache any operator by adding @cache_ip(attributes=...) above the custom op definition. I will soon start adding tests and mark all operators by default.
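
For example, marking an op might look like this. The import paths follow the files touched in this PR; the op class and attribute names are illustrative, not taken from the codebase:

from finn.custom_op.fpgadataflow.hlsbackend import HLSBackend
from finn.transformation.fpgadataflow.ip_cache import cache_ip  # module added by this PR

@cache_ip(attributes=["PE", "SIMD", "inputDataType", "outputDataType"])
class MyOp_hls(HLSBackend):
    # ... the usual HLS custom op implementation ...
    pass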

There are still some open points to consider:

  • How do we organize the transformation? It would be much more organic to include PrepareIP() and then merge step_hw_codegen and step_hw_ipgen into one step_hw_ipgen_cached (or something similar). However, then PrepareIP and HLSSynthIP are always stuck together, and we also break compatibility with upstream FINN if changes are made to these steps. We should discuss this at some point.
    • Currently I have a switch in step_hw_ipgen between the normal HLSSynthIP() and the cached variant. This should only be temporary, though. Ideally, step_hw_codegen is cached too, since it can take quite some time as well, depending on the configuration.
  • Which hash functions should be usable? (see the sketch after this list)
  • For each op: which node attributes / parameters uniquely define the operator? We should all double-check the given attributes for each operator before we merge, just to be sure that nothing is missed. Getting this wrong would lead to long debugging sessions when suddenly the wrong IPs get used because we missed an attribute.
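
On the hash-function question, the standard hashlib module already gates this nicely; a sketch (make_hasher is a hypothetical helper):

import hashlib

def make_hasher(algo="sha256"):
    """Return a hasher callable, restricted to algorithms hashlib actually provides."""
    if algo not in hashlib.algorithms_available:
        raise ValueError(f"Unknown hash function: {algo}")
    return lambda data=b"": hashlib.new(algo, data)

# Usage: make_hasher()(b"some bytes").hexdigest()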

@fpjentzsch

Thanks! For operators like the MVAU, we might need to hash the content of the weight/threshold tensors in addition to the CustomOp attributes and the finn-plus commit hash.

@bwintermann (Author)

For the MVAU_hls I already implemented it somewhat, by hashing np.ascontiguousarray(model.get_initializer(op.onnx_node.input[1])).tobytes(), but reliably hashing numpy arrays seems to be generally somewhat difficult. I will try to add all tensors that need to be hashed for each component, but we should definitely check those together before merging.
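
One way to make numpy hashing reproducible is to fold dtype and shape into the digest as well; otherwise two tensors with the same raw bytes but different shapes or dtypes would collide. A sketch (hash_ndarray is a hypothetical helper; sha256 is an assumed default):

import hashlib

import numpy as np

def hash_ndarray(arr):
    """Hash a numpy array reproducibly, including dtype and shape in the digest."""
    arr = np.ascontiguousarray(arr)
    h = hashlib.sha256()
    h.update(str(arr.dtype).encode())
    h.update(str(arr.shape).encode())
    h.update(arr.tobytes())
    return h.hexdigest()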

@bwintermann (Author)

I've now added step_ip_generation, which merges PrepareIP and HLSSynthIP into one step with the cache added, and replaced step_hw_codegen and step_hw_ipgen with it in the default lookup. This way everything remains compatible with the old flow, while new flows are encouraged to use the fused step (rough shape sketched below).
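
The rough shape of the fused step, per this description. The cache wiring is elided and the config helper names follow the snippets quoted further down; treat this as a sketch rather than the actual implementation:

from finn.builder.build_dataflow_config import DataflowBuildConfig
from finn.transformation.fpgadataflow.hlssynth_ip import HLSSynthIP
from finn.transformation.fpgadataflow.prepare_ip import PrepareIP
from qonnx.core.modelwrapper import ModelWrapper

def step_ip_generation(model: ModelWrapper, cfg: DataflowBuildConfig) -> ModelWrapper:
    """Unified step that does what step_hw_codegen and step_hw_ipgen did before (with cache)."""
    model = model.transform(PrepareIP(cfg._resolve_fpga_part(), cfg._resolve_hls_clk_period()))
    model = model.transform(HLSSynthIP())  # a cache hit would short-circuit this per node
    return model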

Additionally, I covered (I think) all operators that use external parameters and added those parameters to the lookup key. Your ops should now be covered as well, @iksnagreb.

I'll add some tests and try it out with different models; then it should be ready to merge.


@github-actions bot commented Oct 5, 2025

📋 Docstring Check Report

Checked files:

  • src/finn/builder/build_dataflow.py
  • src/finn/builder/build_dataflow_config.py
  • src/finn/builder/build_dataflow_steps.py
  • src/finn/custom_op/fpgadataflow/hls/__init__.py
  • src/finn/custom_op/fpgadataflow/rtl/__init__.py
  • src/finn/interface/interface_utils.py
  • src/finn/interface/run_finn.py
  • src/finn/transformation/fpgadataflow/ip_cache.py
  • src/finn/util/deps.py
  • tests/infrastructure/test_ip_cache.py

Docstring check failed!

Missing Docstrings Details:

📄 src/finn/builder/build_dataflow_steps.py:

    • Line 525: function '_make_hls_estimate_report'

📄 src/finn/custom_op/fpgadataflow/hls/__init__.py:

    • Line 1: module '__init__.py'
    • Line 40: function 'register_custom_op'

📄 src/finn/custom_op/fpgadataflow/rtl/__init__.py:

    • Line 1: module '__init__.py'

📄 src/finn/interface/run_finn.py:

    • Line 1: module 'run_finn.py'
    • Line 38: function '_resolve_module_path'
    • Line 141: function 'main_group'
    • Line 177: function 'build'
    • Line 270: function 'run'
    • Line 310: function 'bench'
    • Line 341: function 'test'
    • Line 354: function 'deps'
    • Line 366: function 'update'
    • Line 372: function 'config'
    • Line 377: function '_command_get_settings'
    • Line 387: function 'config_list'
    • Line 395: function 'config_get'
    • Line 406: function 'config_set'
    • Line 426: function 'config_create'
    • Line 440: function 'main'

📄 src/finn/transformation/fpgadataflow/ip_cache.py:

    • Line 44: function '_ndarray_to_bytes'
    • Line 107: function 'wrapper'

📄 src/finn/util/deps.py:

    • Line 1: module 'deps.py'

Total missing docstrings: 23

How to Fix:

Please add docstrings to the missing functions, classes, and modules listed above.

Docstring Guidelines:

  • All modules should have a module-level docstring
  • All public functions and methods should have docstrings
  • All private functions should have docstrings
  • All classes should have docstrings
  • Use triple quotes (""") for docstrings
  • Follow PEP 257 conventions


def step_ip_generation(model: ModelWrapper, cfg: DataflowBuildConfig) -> ModelWrapper:
    """Unified step that does what step_hw_codegen and step_hw_ipgen did before (with cache!)."""
    if cfg.use_ip_caching:


step_hw_ipgen has more info logging here than this step. Should be consistent between both (and probably lean towards verbose logging for a critical feature like this).

@bwintermann (Author)


Do you mean generally more logging during the IP Cache transformation or during the steps?

@fpjentzsch commented Oct 6, 2025


Wherever it fits best. I think we should always log summary information, like

  • Caching enabled/disabled (along with global cache settings if enabled)
  • How many IPs in total are present in the cache
  • Restoring x out of y layers from cache
  • Placing z out of y layers in the cache that were not previously cached

And in verbose/debug mode, info like this should be printed per layer (summary logging sketched below).

I think most of this is already in place.
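
Something like this would cover the summary lines (a sketch; the function name and counters are placeholders):

import logging

log = logging.getLogger(__name__)

def log_cache_summary(cache_root, total_entries, num_hits, num_misses, num_layers):
    """Log the cache summary proposed above; per-layer details would go to debug level."""
    log.info(f"IP caching enabled (cache dir: {cache_root})")
    log.info(f"IP cache currently holds {total_entries} entries")
    log.info(f"Restoring {num_hits} out of {num_layers} layers from cache")
    log.info(f"Placing {num_misses} out of {num_layers} newly generated layers in the cache")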

@bwintermann (Author)


I'll have a look at what gets logged when, and revise it if necessary.

if cfg.use_ip_caching:
    log.info("Using IP cache to fetch generated IPs...")
    clk = cfg._resolve_hls_clk_period()
    if clk is None and cfg.cache_hls_clk_period:


I don't think we need this level of error handling if somehow no (HLS) clock is specified. It adds bloat to this already long file, and this condition shouldn't be possible, especially with the latest changes to DataflowBuildConfig, where synth_clk_period_ns is no longer Optional and has a default value. _resolve_hls_clk_period always falls back to this clock.

The same comment applies to step_ip_generation.

@bwintermann (Author)


Yes, this was before the Small Fixes PR, so the clocks could still be None, and thus I checked them. Going to remove this.

if sys.platform != "win32":
    self.max_hash_len = os.pathconf("/", "PC_NAME_MAX")
    self.max_path_len = os.pathconf("/", "PC_PATH_MAX")
else:


We don't need to support Windows at all.

@bwintermann (Author)


These lines don't impact us much, but I guess I could remove them. Do we have any checks for whether we are on Windows? If not, we should maybe explicitly add one to the CLI of FINN+ so that users don't try to run it on Windows in the first place.


I wouldn't bother with this at all. We already assume a very specific Ubuntu setup, with heavy toolchains and a long list of apt packages installed. We can assume every user has read the basic quickstart documentation.

@bwintermann (Author)


Maybe I'll still add it in the frontend rework. It's only a few lines and could potentially help users. But either way, that's another topic. I'll remove it here.

    parambytes = _ndarray_to_bytes(model.get_initializer(op.onnx_node.input[1]))
    array_hash = self.hasher(parambytes).hexdigest()
    return f"param_hash:{array_hash}\n"
elif isinstance(op, (ElementwiseBinaryOperation,)):


Do ElementwiseBinaryOperations really always have two initializers, @iksnagreb? I don't see any error handling here.


No, they can have up to two initializer inputs, though in practice that should probably not even happen.

@bwintermann (Author)


Thanks for the heads-up, I'll add handling for different cases here.
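
A possible shape for that handling, reusing _ndarray_to_bytes and self.hasher from the surrounding code; this fragment is a sketch, and the loop over all inputs is an assumption:

# Hash whichever of the (up to two) inputs actually carries an initializer
parts = []
for idx, inp_name in enumerate(op.onnx_node.input):
    init = model.get_initializer(inp_name)
    if init is not None:
        digest = self.hasher(_ndarray_to_bytes(init)).hexdigest()
        parts.append(f"param_hash_{idx}:{digest}")
return "".join(p + "\n" for p in parts)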

    op.set_nodeattr("ipgen_path", str(ip_dir))
    op.set_nodeattr("gen_top_module", new_module_name)

elif issubclass(type(op), HLSBackend):


Just out of curiosity: Do HLS nodes really not need any (ugly) renaming of module names like the RTL nodes do? What if op.onnx_node.name is different in the current model than in the cached version?

@bwintermann (Author)


This could be an issue. I noticed the RTL side because a layer was reused during testing, but I don't have an example that does this with an HLS layer yet. It might still work, though, since I am not sure whether later transformations care about the name of the IP core. I'll check this today.

from finn.transformation.fpgadataflow.specialize_layers import SpecializeLayers
from finn.util.basic import alveo_part_map
from finn.util.deps import get_cache_path
from tests.fpgadataflow.test_fpgadataflow_mvau import make_single_fclayer_modelwrapper


This import doesn't work, at least in the CI where FINN+ is launched as a pip-installed package.

@fpjentzsch

Currently test_split_large_fifos and test_fifosizing_linear fail (here), but I couldn't reproduce it locally yet, even when running multiple different test variants after each other.

Is it possible we are running into race conditions with this caching system if multiple tests or builds or NodeLocalTransformations within a graph run simultaneously?

@fpjentzsch marked this pull request as ready for review on October 6, 2025, 09:47
@bwintermann (Author)

> Currently test_split_large_fifos and test_fifosizing_linear fail (here), but I couldn't reproduce it locally yet, even when running multiple different test variants after each other.
>
> Is it possible we are running into race conditions with this caching system if multiple tests or builds or NodeLocalTransformations within a graph run simultaneously?

I ran into this before as well. It should not be a race condition as far as I am aware. I think the error originates from the fact that there simply is no tests module in the environment. In pyproject.toml we add tests to the finn package. Maybe changing it to finn.tests works. I'll look into it.

@LinusJungemann (Member)

> Currently test_split_large_fifos and test_fifosizing_linear fail (here), but I couldn't reproduce it locally yet, even when running multiple different test variants after each other.
> Is it possible we are running into race conditions with this caching system if multiple tests or builds or NodeLocalTransformations within a graph run simultaneously?
>
> I ran into this before as well. It should not be a race condition as far as I am aware. I think the error originates from the fact that there simply is no tests module in the environment. In pyproject.toml we add tests to the finn package. Maybe changing it to finn.tests works. I'll look into it.

Yes, this is an issue I also ran into in #113. Using finn.tests fixed it for me.

if attributes is not None:
    CACHE_IP_DEFINITIONS[op_cls]["use"] = attributes
else:
    # List of fields that don't define the IP core itself,


I think it should be safe to add the following attributes to this list of default ignored attributes. Most of them are optional attributes of HWCustomOp (set to "" or [] by default), but they currently still show up in the key file (collected as a code sketch below the list).

backend
preferred_impl_style
exec_mode
rtlsim_trace
slr
mem_port
partition_id
device_id
inFIFODepths
outFIFODepths
output_hook
io_chrc_in
io_chrc_out
io_chrc_period
io_chrc_pads_in
io_chrc_pads_out
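
Collected as a default ignore-set, this could look like the following sketch (the constant name is hypothetical):

DEFAULT_IGNORED_ATTRS = {
    "backend", "preferred_impl_style", "exec_mode", "rtlsim_trace",
    "slr", "mem_port", "partition_id", "device_id",
    "inFIFODepths", "outFIFODepths", "output_hook",
    "io_chrc_in", "io_chrc_out", "io_chrc_period",
    "io_chrc_pads_in", "io_chrc_pads_out",
}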

# Prepare some always needed values
# FINN Commit
self.finn_commit = subprocess.run(
    shlex.split("git rev-parse HEAD"),


This doesn't work in the CI, where FINN+ is installed as a package and not from the repo. I.e., I get the following, and the FINN+ commit does not contribute to the key, since there is no sanity check:

FINN Commit reads: (authored at: )
HLSLIB Commit reads: 5dde96382b84979c6caa6f34cdad2ac72fa28489 (authored at: 2025-07-14 11:35:11 +0200)

Can we use "poetry version" or some pip command to use the version string as the key instead? It should be PEP 440 compliant, i.e., contain the latest short commit hash.
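
For what it's worth, the standard library can read the installed version string without shelling out to git (the distribution name "finn" here is an assumption):

from importlib.metadata import PackageNotFoundError, version

try:
    # PEP 440 local version segments carry the short commit hash, e.g. "0.1.dev3+g5dde963"
    finn_version = version("finn")
except PackageNotFoundError:
    finn_version = "unknown"  # running from a checkout: fall back to git rev-parse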

@fpjentzsch

Currently, the caching does not save any time, because step_set_fifo_depths happens before step_ip_generation by default. This step needs to perform PrepareIP()/HLSSynthIP() if auto FIFO sizing is enabled, and currently does so even if it is disabled (see here, although this could be removed).

We likely need to split up step_set_fifo_depths or at least move it behind step_ip_generation again, let's discuss offline...

This issue has been previously discussed here:
#85
Xilinx#1185 (comment)

@fpjentzsch

> Currently test_split_large_fifos and test_fifosizing_linear fail (here), but I couldn't reproduce it locally yet, even when running multiple different test variants after each other.

When I ran the test suite a second time to test the effects of caching, I got 6 more failures than on the first run:

  • 1 more variant of test_split_large_fifos (now all variants affected)
  • 3 more variants of test_fifosizing_linear (now all variants affected)
  • test_end2end_build_dataflow_directory
  • the third cybersecurity notebook

Might be worth looking into these.
