
Commit f0978b2

Docs: regen notebooks and docs (#1285)
1 parent de137b9 commit f0978b2

20 files changed (+3072 −806 lines)

docs/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

docs/tutorials/anatomy_quantizer.html

Lines changed: 38 additions & 34 deletions
Large diffs are not rendered by default.

docs/tutorials/anatomy_quantizer.ipynb

Lines changed: 304 additions & 51 deletions
Large diffs are not rendered by default.

docs/tutorials/onnx_export.html

Lines changed: 8 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -564,7 +564,7 @@ <h3>Basic Example<a class="headerlink" href="#Basic-Example" title="Permalink to
564564
<h3>Complete Model<a class="headerlink" href="#Complete-Model" title="Permalink to this heading">#</a></h3>
565565
<p>A similar approach can be used with entire PyTorch models, rather than single layers.</p>
566566
<div class="nbinput nblast docutils container">
567-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[5]:
567+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[4]:
568568
</pre></div>
569569
</div>
570570
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">QuantModel</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
@@ -628,7 +628,7 @@ <h3>The C in QCDQ (Bitwidth &lt;= 8)<a class="headerlink" href="#The-C-in-QCDQ-(
628628
<p>In Brevitas, however, if a quantized layer with bit-width &lt;= 8 is exported, the Clip node will be automatically inserted, with the min/max values computed based on the particular type of quantization performed (i.e., signed vs unsigned, narrow range vs no narrow range, etc.).</p>
629629
<p>Even though the Tensor data type will still be an Int8 or UInt8, its values are restricted to the desired bit-width.</p>
630630
<div class="nbinput nblast docutils container">
631-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[7]:
631+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[5]:
632632
</pre></div>
633633
</div>
634634
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Model</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
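The quantize–clip–dequantize behavior described above can be sketched in plain Python (an illustrative sketch of the numeric semantics only, not Brevitas' actual export code; the function and argument names are hypothetical):

```python
def qcdq(x, scale, zero_point, bit_width, signed=True, narrow_range=False):
    """Simulate a QuantizeLinear -> Clip -> DequantizeLinear chain."""
    # Q: map the float value onto the integer grid (round to nearest)
    q = round(x / scale) + zero_point
    # C: clip bounds depend on signedness and narrow range, as described above
    if signed:
        lo = -(2 ** (bit_width - 1)) + (1 if narrow_range else 0)
        hi = 2 ** (bit_width - 1) - 1
    else:
        lo = 0
        hi = 2 ** bit_width - (2 if narrow_range else 1)
    q = max(lo, min(hi, q))
    # DQ: dequantize back to float
    return (q - zero_point) * scale
```

For example, with `bit_width=4`, signed, narrow range, values are clipped to [-7, 7] even though the tensor is stored as Int8.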
@@ -698,7 +698,7 @@ <h2>ONNX Runtime<a class="headerlink" href="#ONNX-Runtime" title="Permalink to t
698698
<h3>QCDQ<a class="headerlink" href="#QCDQ" title="Permalink to this heading">#</a></h3>
699699
<p>Since for QCDQ we are only using standard ONNX operations, it is possible to run the exported model using ONNX Runtime.</p>
700700
<div class="nbinput docutils container">
701-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[11]:
701+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[6]:
702702
</pre></div>
703703
</div>
704704
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">onnxruntime</span> <span class="k">as</span> <span class="nn">ort</span>
@@ -747,9 +747,7 @@ <h3>QCDQ<a class="headerlink" href="#QCDQ" title="Permalink to this heading">#</
747747
</div>
748748
<div class="output_area stderr docutils container">
749749
<div class="highlight"><pre>
750-
2024-09-12 12:18:03.405472924 [W:onnxruntime:, graph.cc:1283 Graph] Initializer linear.bias appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
751-
/proj/xlabs/users/nfraser/opt/miniforge3/envs/20231115_brv_pt1.13.1/lib/python3.10/site-packages/brevitas/nn/quant_linear.py:69: UserWarning: Defining your `__torch_function__` as a plain method is deprecated and will be an error in future, please define it as a classmethod. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525541990/work/torch/csrc/utils/python_arg_parser.cpp:350.)
752-
output_tensor = linear(x, quant_weight, quant_bias)
750+
2025-05-09 15:50:06.990808285 [W:onnxruntime:, graph.cc:1283 Graph] Initializer linear.bias appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
753751
</pre></div></div>
754752
</div>
755753
<section id="QGEMM-vs-GEMM">
@@ -759,7 +757,7 @@ <h4>QGEMM vs GEMM<a class="headerlink" href="#QGEMM-vs-GEMM" title="Permalink to
759757
<p>We did not observe a similar behavior for other operations such as <code class="docutils literal notranslate"><span class="pre">QuantConvNd</span></code>.</p>
760758
<p>An example of a layer that will match this definition is the following:</p>
761759
<div class="nbinput nblast docutils container">
762-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[12]:
760+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[7]:
763761
</pre></div>
764762
</div>
765763
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">brevitas.quant.scaled_int</span> <span class="kn">import</span> <span class="n">Int32Bias</span>
@@ -782,7 +780,7 @@ <h3>Export Dynamically Quantized Models to ONNX<a class="headerlink" href="#Expo
782780
<p>You can also export dynamically quantized models to ONNX, but there are some limitations. The ONNX DynamicQuantizeLinear operator requires the following settings: - Asymmetric quantization (and therefore <em>unsigned</em>) - Min-max scaling - Rounding to nearest - Per tensor scaling - Bit width set to 8</p>
783781
<p>This is shown in the following example:</p>
784782
<div class="nbinput nblast docutils container">
785-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[14]:
783+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[8]:
786784
</pre></div>
787785
</div>
788786
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">brevitas_examples.common.generative.quantizers</span> <span class="kn">import</span> <span class="n">ShiftedUint8DynamicActPerTensorFloat</span>
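The constraints listed above match the semantics of the ONNX DynamicQuantizeLinear operator, which can be sketched in plain Python (an illustrative reimplementation of the operator's spec, not Brevitas code):

```python
def dynamic_quantize_linear(xs):
    """Sketch of ONNX DynamicQuantizeLinear: asymmetric unsigned 8-bit,
    per-tensor min-max scaling, round-to-nearest-even (Python's round)."""
    qmin, qmax = 0, 255
    # The range always includes 0 so that 0.0 is exactly representable
    x_min = min(min(xs), 0.0)
    x_max = max(max(xs), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    scale = scale if scale != 0.0 else 1.0
    zero_point = int(min(qmax, max(qmin, round(qmin - x_min / scale))))
    quantized = [int(min(qmax, max(qmin, round(v / scale) + zero_point)))
                 for v in xs]
    return quantized, scale, zero_point
```

For `xs = [-1.0, 0.0, 1.0]` this yields `scale = 2/255`, `zero_point = 128`, and quantized values `[0, 128, 255]`.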
@@ -813,7 +811,7 @@ <h3>Export Dynamically Quantized Models to ONNX<a class="headerlink" href="#Expo
813811
</div>
814812
</div>
815813
<div class="nbinput docutils container">
816-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[15]:
814+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[9]:
817815
</pre></div>
818816
</div>
819817
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">show_netron</span><span class="p">(</span><span class="s2">&quot;dynamic_quant_model_qcdq.onnx&quot;</span><span class="p">,</span> <span class="mi">8086</span><span class="p">)</span>
@@ -829,7 +827,7 @@ <h3>Export Dynamically Quantized Models to ONNX<a class="headerlink" href="#Expo
829827
</pre></div></div>
830828
</div>
831829
<div class="nboutput nblast docutils container">
832-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[15]:
830+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[9]:
833831
</pre></div>
834832
</div>
835833
<div class="output_area rendered_html docutils container">

docs/tutorials/onnx_export.ipynb

Lines changed: 69 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -26,7 +26,14 @@
2626
{
2727
"cell_type": "code",
2828
"execution_count": 1,
29-
"metadata": {},
29+
"metadata": {
30+
"execution": {
31+
"iopub.execute_input": "2025-05-09T14:49:50.018639Z",
32+
"iopub.status.busy": "2025-05-09T14:49:50.017028Z",
33+
"iopub.status.idle": "2025-05-09T14:49:57.814054Z",
34+
"shell.execute_reply": "2025-05-09T14:49:57.811032Z"
35+
}
36+
},
3037
"outputs": [
3138
{
3239
"name": "stdout",
@@ -119,6 +126,12 @@
119126
"execution_count": 2,
120127
"metadata": {
121128
"collapsed": false,
129+
"execution": {
130+
"iopub.execute_input": "2025-05-09T14:49:57.828448Z",
131+
"iopub.status.busy": "2025-05-09T14:49:57.827713Z",
132+
"iopub.status.idle": "2025-05-09T14:49:57.855085Z",
133+
"shell.execute_reply": "2025-05-09T14:49:57.852844Z"
134+
},
122135
"jupyter": {
123136
"outputs_hidden": false
124137
},
@@ -148,6 +161,12 @@
148161
"execution_count": 3,
149162
"metadata": {
150163
"collapsed": false,
164+
"execution": {
165+
"iopub.execute_input": "2025-05-09T14:49:57.863763Z",
166+
"iopub.status.busy": "2025-05-09T14:49:57.863068Z",
167+
"iopub.status.idle": "2025-05-09T14:50:06.008930Z",
168+
"shell.execute_reply": "2025-05-09T14:50:06.006760Z"
169+
},
151170
"jupyter": {
152171
"outputs_hidden": false
153172
},
@@ -263,9 +282,15 @@
263282
},
264283
{
265284
"cell_type": "code",
266-
"execution_count": 5,
285+
"execution_count": 4,
267286
"metadata": {
268287
"collapsed": false,
288+
"execution": {
289+
"iopub.execute_input": "2025-05-09T14:50:06.018006Z",
290+
"iopub.status.busy": "2025-05-09T14:50:06.017273Z",
291+
"iopub.status.idle": "2025-05-09T14:50:06.136026Z",
292+
"shell.execute_reply": "2025-05-09T14:50:06.134548Z"
293+
},
269294
"jupyter": {
270295
"outputs_hidden": false
271296
},
@@ -390,9 +415,15 @@
390415
},
391416
{
392417
"cell_type": "code",
393-
"execution_count": 7,
418+
"execution_count": 5,
394419
"metadata": {
395420
"collapsed": false,
421+
"execution": {
422+
"iopub.execute_input": "2025-05-09T14:50:06.143034Z",
423+
"iopub.status.busy": "2025-05-09T14:50:06.142638Z",
424+
"iopub.status.idle": "2025-05-09T14:50:06.274118Z",
425+
"shell.execute_reply": "2025-05-09T14:50:06.272709Z"
426+
},
396427
"jupyter": {
397428
"outputs_hidden": false
398429
},
@@ -533,9 +564,15 @@
533564
},
534565
{
535566
"cell_type": "code",
536-
"execution_count": 11,
567+
"execution_count": 6,
537568
"metadata": {
538569
"collapsed": false,
570+
"execution": {
571+
"iopub.execute_input": "2025-05-09T14:50:06.283663Z",
572+
"iopub.status.busy": "2025-05-09T14:50:06.283199Z",
573+
"iopub.status.idle": "2025-05-09T14:50:07.086039Z",
574+
"shell.execute_reply": "2025-05-09T14:50:07.084014Z"
575+
},
539576
"jupyter": {
540577
"outputs_hidden": false
541578
},
@@ -555,9 +592,7 @@
555592
"name": "stderr",
556593
"output_type": "stream",
557594
"text": [
558-
"2024-09-12 12:18:03.405472924 [W:onnxruntime:, graph.cc:1283 Graph] Initializer linear.bias appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.\n",
559-
"/proj/xlabs/users/nfraser/opt/miniforge3/envs/20231115_brv_pt1.13.1/lib/python3.10/site-packages/brevitas/nn/quant_linear.py:69: UserWarning: Defining your `__torch_function__` as a plain method is deprecated and will be an error in future, please define it as a classmethod. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525541990/work/torch/csrc/utils/python_arg_parser.cpp:350.)\n",
560-
" output_tensor = linear(x, quant_weight, quant_bias)\n"
595+
"2025-05-09 15:50:06.990808285 [W:onnxruntime:, graph.cc:1283 Graph] Initializer linear.bias appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.\n"
561596
]
562597
}
563598
],
@@ -626,9 +661,15 @@
626661
},
627662
{
628663
"cell_type": "code",
629-
"execution_count": 12,
664+
"execution_count": 7,
630665
"metadata": {
631666
"collapsed": false,
667+
"execution": {
668+
"iopub.execute_input": "2025-05-09T14:50:07.094253Z",
669+
"iopub.status.busy": "2025-05-09T14:50:07.092574Z",
670+
"iopub.status.idle": "2025-05-09T14:50:07.134212Z",
671+
"shell.execute_reply": "2025-05-09T14:50:07.132360Z"
672+
},
632673
"jupyter": {
633674
"outputs_hidden": false
634675
},
@@ -687,8 +728,15 @@
687728
},
688729
{
689730
"cell_type": "code",
690-
"execution_count": 14,
691-
"metadata": {},
731+
"execution_count": 8,
732+
"metadata": {
733+
"execution": {
734+
"iopub.execute_input": "2025-05-09T14:50:07.141447Z",
735+
"iopub.status.busy": "2025-05-09T14:50:07.140768Z",
736+
"iopub.status.idle": "2025-05-09T14:50:07.913782Z",
737+
"shell.execute_reply": "2025-05-09T14:50:07.911672Z"
738+
}
739+
},
692740
"outputs": [],
693741
"source": [
694742
"from brevitas_examples.common.generative.quantizers import ShiftedUint8DynamicActPerTensorFloat\n",
@@ -719,8 +767,15 @@
719767
},
720768
{
721769
"cell_type": "code",
722-
"execution_count": 15,
723-
"metadata": {},
770+
"execution_count": 9,
771+
"metadata": {
772+
"execution": {
773+
"iopub.execute_input": "2025-05-09T14:50:07.921255Z",
774+
"iopub.status.busy": "2025-05-09T14:50:07.920534Z",
775+
"iopub.status.idle": "2025-05-09T14:50:10.952025Z",
776+
"shell.execute_reply": "2025-05-09T14:50:10.949423Z"
777+
}
778+
},
724779
"outputs": [
725780
{
726781
"name": "stdout",
@@ -744,10 +799,10 @@
744799
" "
745800
],
746801
"text/plain": [
747-
"<IPython.lib.display.IFrame at 0x7fb62856ccd0>"
802+
"<IPython.lib.display.IFrame at 0x7f4a5336ee00>"
748803
]
749804
},
750-
"execution_count": 15,
805+
"execution_count": 9,
751806
"metadata": {},
752807
"output_type": "execute_result"
753808
}
