Hi everyone,
I'm exploring the possibility of accelerating a basecalling model (based on the QuartzNet/Bonito architecture) using FINN. This model makes heavy use of Time-Channel Separable 1D convolutions (TCSConv1d), which are essentially 1D depthwise separable convolutions.
I understand that FINN requires a model's layers to be decomposed into supported primitives. Before attempting a significant model re-architecture and quantization-aware retraining, I wanted to ask for your guidance on the feasibility and the best approach for this type of layer.
A typical block in the model's encoder consists of repeating sub-blocks built around the TCSConv1d layer. Here is a snippet of the implementation:
```python
import torch.nn as nn
from torch.nn import Module, ModuleList, Sequential, BatchNorm1d, Dropout
import brevitas.nn as qnn


class TCSConv1d_quant(Module):
    """
    Quantized Time-Channel Separable 1D Convolution
    """
    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1, bias=False,
                 separable=False, quant=8, quant_act=8):
        super(TCSConv1d_quant, self).__init__()
        self.separable = separable
        if separable:
            self.depthwise = qnn.QuantConv1d(
                in_channels, in_channels, kernel_size=kernel_size, stride=stride,
                padding=padding, dilation=dilation, bias=bias,
                groups=in_channels,  # groups == in_channels for depthwise
                weight_bit_width=quant
            )
            self.pointwise = qnn.QuantConv1d(
                in_channels, out_channels, kernel_size=1, stride=1,
                dilation=dilation, bias=bias, padding=0,
                weight_bit_width=quant
            )
            self.quant_identity = qnn.QuantIdentity(
                bit_width=quant_act, return_quant_tensor=True
            )
        else:
            self.conv = qnn.QuantConv1d(
                in_channels, out_channels, kernel_size=kernel_size,
                stride=stride, padding=padding, dilation=dilation, bias=bias,
                weight_bit_width=quant
            )

    def forward(self, x):
        if self.separable:
            x = self.depthwise(x)
            x = self.quant_identity(x)
            x = self.pointwise(x)
        else:
            x = self.conv(x)
        return x
```
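As a quick sanity check of the depthwise -> pointwise path, this is how I exercise the layer in isolation (the channel and sequence sizes here are made up, just for illustration):

```python
import torch

# Hypothetical sizes: batch 1, 64 channels, 400 timesteps.
layer = TCSConv1d_quant(64, 128, kernel_size=9, padding=4, separable=True)
x = torch.randn(1, 64, 400)
y = layer(x)
print(y.shape)  # torch.Size([1, 128, 400])
```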
The enclosing `Block` then stacks these layers with BatchNorm, activation, and dropout:

```python
class Block(Module):
    """
    Quantized Block with TCSConv, BatchNorm, Activation
    """
    def __init__(self, in_channels, out_channels, activation, repeat=5,
                 kernel_size=1, stride=1, dilation=1, dropout=0.0,
                 residual=False, separable=False, quant=8, quant_act=8):
        super(Block, self).__init__()
        self.use_res = residual
        self.activation = activation  # final activation applied in forward()
        self.conv = ModuleList()
        # ... logic to build repeating sub-blocks ...
        # A sub-block looks like this:
        # [
        #     TCSConv1d_quant(...),
        #     BatchNorm1d(...),
        #     qnn.QuantReLU(...),
        #     Dropout(...)
        # ]
        if self.use_res:
            # The residual path also uses a TCSConv1d_quant and BatchNorm1d
            self.residual = Sequential(...)

    def forward(self, x):
        _x = x
        # Main path
        for layer in self.conv:
            _x = layer(_x)
        # Residual connection
        if self.use_res:
            _x = _x + self.residual(x)
        # Final activation
        return self.activation(_x)
```
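Regarding the residual connection (my third question below): my current understanding is that FINN's streamlining wants both inputs of an elementwise add to share the same quantization scale, so I'm experimenting with a single shared QuantIdentity on both branches. A minimal sketch of that idea, reusing the imports from the snippet above (the module and its names are my own, not from Bonito):

```python
class QuantResidualAdd(Module):
    """Quantize both add inputs with one shared QuantIdentity so their
    scales match -- my understanding of what FINN expects for adds."""

    def __init__(self, quant_act=8):
        super().__init__()
        self.shared_quant = qnn.QuantIdentity(
            bit_width=quant_act, return_quant_tensor=True
        )

    def forward(self, main, skip):
        # Reusing the same instance on both inputs ties their scales together.
        return self.shared_quant(main) + self.shared_quant(skip)
```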
My main questions are:
Is the recommended approach to manually decompose each TCSConv1d layer into a grouped (depthwise) Conv1d followed by a pointwise (1x1) Conv1d within the PyTorch model before exporting to ONNX? My current export flow is sketched after these questions.
Does FINN have optimized hardware support for this specific pattern (1D depthwise separable convolution), or would it be treated as two independent standard convolutions?
Are there any known challenges or best practices for handling the residual connections that are also present in these blocks? (The shared-QuantIdentity sketch above is my current attempt.)
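For context, here is the export flow I'm planning to use. This is a minimal sketch: I'm assuming the `export_qonnx` helper from recent Brevitas releases, and the input shape and file name are hypothetical. I'd export a single layer first to validate the flow before attempting the full encoder:

```python
import torch
from brevitas.export import export_qonnx

# Stand-in for the full encoder while validating the export path.
model = TCSConv1d_quant(1, 256, kernel_size=9, padding=4, separable=True)
model.eval()

# Hypothetical input shape: (batch, channels, timesteps).
dummy = torch.randn(1, 1, 4000)
export_qonnx(model, dummy, export_path="tcsconv_qonnx.onnx")
```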
Thank you for your time and for the great work on this framework!