Replies: 1 comment
Hi, you are on the right track. If you need it quickly, implementing a new HLS custom op is probably the best option. My colleague Christoph (@iksnagreb) also plans to implement something like this in a more generic way; maybe he can chime in.
Hi FINN developers and community,
I'm working on accelerating a Temporal Convolutional Network (TCN) model using FINN for deployment on a Pynq-Z1. The TCN architecture involves multiple blocks, and due to the nature of 'valid' convolutions, the temporal dimension shrinks through the network.
Goal:
After the final TCN block, I have an activation tensor (e.g., shape [N, C, 168, 1]). To reduce downstream computation (specifically, before the equivalent of the final fully connected layer), I only need a specific segment of this tensor along the temporal dimension (e.g., the central element, resulting in [N, C, 1, 1]). My goal is to implement this selection within the FPGA hardware dataflow generated by FINN, for maximum efficiency.
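For concreteness, the selection itself is trivial in PyTorch (a minimal sketch; treating index 84 as the "central" element of a length-168 axis is my assumption):

```python
import torch

# Activation after the final TCN block: [N, C, T, 1] with T = 168.
x = torch.randn(1, 16, 168, 1)

# Keep a single temporal position; slicing 84:85 keeps the dimension
# instead of squeezing it, so the result has shape [N, C, 1, 1].
center = 84  # assumed "central" index
x_sel = x[:, :, center:center + 1, :]
print(x_sel.shape)  # torch.Size([1, 16, 1, 1])
```

The hard part is getting an equivalent operation into the FINN hardware dataflow.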
Tracing this backwards, introducing slice nodes that keep only the indices relevant to the single final output of the network could save a lot of computation:
Input:
Input size [1,1000] --> Input size [1,665]
LayerName [input_tensor_size] --> [potentially_sliced_tensor_size]
Block1:
Conv1 [1,1,1000,1] --> [1,1,665,1]
BatchNorm1 [1,4,496,1] --> [1,4,329,1]
Conv2 [1,4,496,1] --> [1,4,329,1]
SLICE here could reduce 329 to 81
BatchNorm2 [1,4,488,1] --> [1,4,81,1]
ReLU [1,4,488,1] --> [1,4,81,1]
Block2:
Conv1 [1,4,488,1] --> [1,4,81,1]
BatchNorm1 [1,8,456,1] --> [1,8,73,1]
Conv2 [1,8,456,1] --> [1,8,73,1]
SLICE here could reduce 73 to 17
BatchNorm2 [1,8,424,1] --> [1,8,17,1]
ReLU [1,8,424,1] --> [1,8,17,1]
Block3:
Conv1 [1,8,424,1] --> [1,8,17,1]
BatchNorm1 [1,16,296,1] --> [1,16,9,1]
Conv2 [1,16,296,1] --> [1,16,9,1]
SLICE here could reduce 9 to 1
BatchNorm2 [1,16,168,1] --> [1,16,1,1]
ReLU [1,16,168,1] --> [1,16,1,1]
For reference, I already know which indices to keep in each layer for each tensor.
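For anyone curious, these index sets can be derived by walking backwards through the stride-1 'valid' convolutions; a minimal sketch (the kernel sizes and dilations below are placeholders, not my actual layer parameters):

```python
def required_input_indices(output_indices, kernel_size, dilation=1):
    # For a stride-1 'valid' convolution, output index o consumes
    # input indices o, o + d, ..., o + (kernel_size - 1) * d.
    needed = set()
    for o in output_indices:
        for j in range(kernel_size):
            needed.add(o + j * dilation)
    return sorted(needed)

# Trace the single final output (index 0) back through two layers;
# kernel_size/dilation values here are illustrative only.
idx = [0]
idx = required_input_indices(idx, kernel_size=9, dilation=1)
idx = required_input_indices(idx, kernel_size=9, dilation=8)
print(len(idx), idx[:5])
```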
The only thing missing is support for a Slice operation in FINN's hardware dataflow, so I have spent a few days trying different workarounds.
Attempts and Failures:
Normal slicing in PyTorch between layers
Pruning, by modifying the PruneChannels transformation from fastmachinelearning/qonnx into a PruneSamples variant
Potential paths & Questions:
Based on discussions and my understanding of FINN, I only see one potential path forward:
Make a Custom HLS Streaming Slice or Gather Layer:
Method: Develop a new custom HLS layer for streaming slice operations on activations. This would involve writing HLS code for an AXI-Stream component that selectively passes through input data based on indices or a pattern, defining a new FINN custom op (StreamingSlice), and creating a FINN transformation pass to replace the ONNX Slice/Gather with this custom op (a rough sketch of such a pass follows after this list).
Pros: Could be highly efficient at runtime (cycle-accurate selection, only processing desired data). Explicit control.
Cons/Question: Significant development effort required (HLS, AXI-Stream, FINN internals). Is this overkill? Are there existing examples or utilities within FINN that might simplify this process if deemed necessary?
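For reference, on the transformation side I imagine something roughly like the sketch below, assuming the qonnx Transformation base class. StreamingSlice is hypothetical (it does not exist in FINN), and the attribute names are placeholders:

```python
from onnx import helper
from qonnx.transformation.base import Transformation

class InferStreamingSlice(Transformation):
    """Sketch: replace ONNX Slice nodes with a hypothetical
    StreamingSlice custom op, so it can be mapped to an HLS kernel."""

    def apply(self, model):
        graph_modified = False
        for node_ind, node in enumerate(list(model.graph.node)):
            if node.op_type != "Slice":
                continue
            # In opset >= 10, Slice takes starts/ends/axes as inputs;
            # a real pass would read them via model.get_initializer().
            new_node = helper.make_node(
                "StreamingSlice",              # hypothetical op_type
                [node.input[0]],
                [node.output[0]],
                domain="finn.custom_op.fpgadataflow",
                slice_start=84,                # placeholder attributes
                slice_end=85,
            )
            # Insert at the same position to keep topological order.
            model.graph.node.insert(node_ind, new_node)
            model.graph.node.remove(node)
            graph_modified = True
        return (model, graph_modified)
```

The HLS kernel itself would then presumably just be a pipelined loop that reads the whole input stream and forwards only the selected positions, so no buffering of the full tensor should be needed.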
Given the above, my questions are:
What is the recommended approach within the FINN ecosystem for implementing this kind of activation tensor selection/slicing in hardware?
Are there any other FINN transformations or techniques I might have missed that could handle this?
Are there specific pitfalls or pointers the community could share regarding custom HLS layer development for this type of streaming data manipulation?
Thanks in advance for any insights or guidance!