
Commit f0978b2

Docs: regen notebooks and docs (#1285)
1 parent de137b9 commit f0978b2

20 files changed (+3072 −806 lines)

docs/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

docs/tutorials/anatomy_quantizer.html

Lines changed: 38 additions & 34 deletions
Large diffs are not rendered by default.

docs/tutorials/anatomy_quantizer.ipynb

Lines changed: 304 additions & 51 deletions
Large diffs are not rendered by default.

docs/tutorials/onnx_export.html

Lines changed: 8 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -564,7 +564,7 @@ <h3>Basic Example<a class="headerlink" href="#Basic-Example" title="Permalink to
564564
<h3>Complete Model<a class="headerlink" href="#Complete-Model" title="Permalink to this heading">#</a></h3>
565565
<p>A similar approach can be used with entire PyTorch models, rather than single layers.</p>
566566
<div class="nbinput nblast docutils container">
567-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[5]:
567+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[4]:
568568
</pre></div>
569569
</div>
570570
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">QuantModel</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
@@ -628,7 +628,7 @@ <h3>The C in QCDQ (Bitwidth &lt;= 8)<a class="headerlink" href="#The-C-in-QCDQ-(
628628
<p>In Brevitas, however, if a quantized layer with bit-width &lt;= 8 is exported, the Clip node will be automatically inserted, with the min/max values computed based on the particular type of quantization performed (i.e., signed vs unsigned, narrow range vs no narrow range, etc.).</p>
629629
<p>Even though the Tensor data type will still be an Int8 or UInt8, its values are restricted to the desired bit-width.</p>
630630
<div class="nbinput nblast docutils container">
631-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[7]:
631+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[5]:
632632
</pre></div>
633633
</div>
634634
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Model</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">Module</span><span class="p">):</span>
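The quantize–clip–dequantize behavior described above can be sketched in plain Python (an illustrative sketch of the numeric semantics only, not Brevitas' actual export code; the function and argument names are hypothetical):

```python
def qcdq(x, scale, zero_point, bit_width, signed=True, narrow_range=False):
    """Simulate a QuantizeLinear -> Clip -> DequantizeLinear chain."""
    # Q: map the float value onto the integer grid (round to nearest)
    q = round(x / scale) + zero_point
    # C: clip bounds depend on signedness and narrow range, as described above
    if signed:
        lo = -(2 ** (bit_width - 1)) + (1 if narrow_range else 0)
        hi = 2 ** (bit_width - 1) - 1
    else:
        lo = 0
        hi = 2 ** bit_width - (2 if narrow_range else 1)
    q = max(lo, min(hi, q))
    # DQ: dequantize back to float
    return (q - zero_point) * scale
```

For example, with `bit_width=4`, signed, narrow range, values are clipped to [-7, 7] even though the tensor is stored as Int8.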
@@ -698,7 +698,7 @@ <h2>ONNX Runtime<a class="headerlink" href="#ONNX-Runtime" title="Permalink to t
698698
<h3>QCDQ<a class="headerlink" href="#QCDQ" title="Permalink to this heading">#</a></h3>
699699
<p>Since for QCDQ we are only using standard ONNX operations, it is possible to run the exported model using ONNX Runtime.</p>
700700
<div class="nbinput docutils container">
701-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[11]:
701+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[6]:
702702
</pre></div>
703703
</div>
704704
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">onnxruntime</span> <span class="k">as</span> <span class="nn">ort</span>
@@ -747,9 +747,7 @@ <h3>QCDQ<a class="headerlink" href="#QCDQ" title="Permalink to this heading">#</
747747
</div>
748748
<div class="output_area stderr docutils container">
749749
<div class="highlight"><pre>
750-
2024-09-12 12:18:03.405472924 [W:onnxruntime:, graph.cc:1283 Graph] Initializer linear.bias appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
751-
/proj/xlabs/users/nfraser/opt/miniforge3/envs/20231115_brv_pt1.13.1/lib/python3.10/site-packages/brevitas/nn/quant_linear.py:69: UserWarning: Defining your `__torch_function__` as a plain method is deprecated and will be an error in future, please define it as a classmethod. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525541990/work/torch/csrc/utils/python_arg_parser.cpp:350.)
752-
output_tensor = linear(x, quant_weight, quant_bias)
750+
2025-05-09 15:50:06.990808285 [W:onnxruntime:, graph.cc:1283 Graph] Initializer linear.bias appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
753751
</pre></div></div>
754752
</div>
755753
<section id="QGEMM-vs-GEMM">
@@ -759,7 +757,7 @@ <h4>QGEMM vs GEMM<a class="headerlink" href="#QGEMM-vs-GEMM" title="Permalink to
759757
<p>We did not observe a similar behavior for other operations such as <code class="docutils literal notranslate"><span class="pre">QuantConvNd</span></code>.</p>
760758
<p>An example of a layer that will match this definition is the following:</p>
761759
<div class="nbinput nblast docutils container">
762-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[12]:
760+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[7]:
763761
</pre></div>
764762
</div>
765763
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">brevitas.quant.scaled_int</span> <span class="kn">import</span> <span class="n">Int32Bias</span>
@@ -782,7 +780,7 @@ <h3>Export Dynamically Quantized Models to ONNX<a class="headerlink" href="#Expo
782780
<p>You can also export dynamically quantized models to ONNX, but there are some limitations. The ONNX DynamicQuantizeLinear operator requires the following settings: - Asymmetric quantization (and therefore <em>unsigned</em>) - Min-max scaling - Rounding to nearest - Per tensor scaling - Bit width set to 8</p>
783781
<p>This is shown in the following example:</p>
784782
<div class="nbinput nblast docutils container">
785-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[14]:
783+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[8]:
786784
</pre></div>
787785
</div>
788786
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">brevitas_examples.common.generative.quantizers</span> <span class="kn">import</span> <span class="n">ShiftedUint8DynamicActPerTensorFloat</span>
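The constraints listed above match the semantics of the ONNX DynamicQuantizeLinear operator, which can be sketched in plain Python (an illustrative reimplementation of the operator's spec, not Brevitas code):

```python
def dynamic_quantize_linear(xs):
    """Sketch of ONNX DynamicQuantizeLinear: asymmetric unsigned 8-bit,
    per-tensor min-max scaling, round-to-nearest-even (Python's round)."""
    qmin, qmax = 0, 255
    # The range always includes 0 so that 0.0 is exactly representable
    x_min = min(min(xs), 0.0)
    x_max = max(max(xs), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    scale = scale if scale != 0.0 else 1.0
    zero_point = int(min(qmax, max(qmin, round(qmin - x_min / scale))))
    quantized = [int(min(qmax, max(qmin, round(v / scale) + zero_point)))
                 for v in xs]
    return quantized, scale, zero_point
```

For `xs = [-1.0, 0.0, 1.0]` this yields `scale = 2/255`, `zero_point = 128`, and quantized values `[0, 128, 255]`.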
@@ -813,7 +811,7 @@ <h3>Export Dynamically Quantized Models to ONNX<a class="headerlink" href="#Expo
813811
</div>
814812
</div>
815813
<div class="nbinput docutils container">
816-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[15]:
814+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[9]:
817815
</pre></div>
818816
</div>
819817
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">show_netron</span><span class="p">(</span><span class="s2">&quot;dynamic_quant_model_qcdq.onnx&quot;</span><span class="p">,</span> <span class="mi">8086</span><span class="p">)</span>
@@ -829,7 +827,7 @@ <h3>Export Dynamically Quantized Models to ONNX<a class="headerlink" href="#Expo
829827
</pre></div></div>
830828
</div>
831829
<div class="nboutput nblast docutils container">
832-
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[15]:
830+
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[9]:
833831
</pre></div>
834832
</div>
835833
<div class="output_area rendered_html docutils container">

docs/tutorials/onnx_export.ipynb

Lines changed: 69 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -26,7 +26,14 @@
2626
{
2727
"cell_type": "code",
2828
"execution_count": 1,
29-
"metadata": {},
29+
"metadata": {
30+
"execution": {
31+
"iopub.execute_input": "2025-05-09T14:49:50.018639Z",
32+
"iopub.status.busy": "2025-05-09T14:49:50.017028Z",
33+
"iopub.status.idle": "2025-05-09T14:49:57.814054Z",
34+
"shell.execute_reply": "2025-05-09T14:49:57.811032Z"
35+
}
36+
},
3037
"outputs": [
3138
{
3239
"name": "stdout",
@@ -119,6 +126,12 @@
119126
"execution_count": 2,
120127
"metadata": {
121128
"collapsed": false,
129+
"execution": {
130+
"iopub.execute_input": "2025-05-09T14:49:57.828448Z",
131+
"iopub.status.busy": "2025-05-09T14:49:57.827713Z",
132+
"iopub.status.idle": "2025-05-09T14:49:57.855085Z",
133+
"shell.execute_reply": "2025-05-09T14:49:57.852844Z"
134+
},
122135
"jupyter": {
123136
"outputs_hidden": false
124137
},
@@ -148,6 +161,12 @@
148161
"execution_count": 3,
149162
"metadata": {
150163
"collapsed": false,
164+
"execution": {
165+
"iopub.execute_input": "2025-05-09T14:49:57.863763Z",
166+
"iopub.status.busy": "2025-05-09T14:49:57.863068Z",
167+
"iopub.status.idle": "2025-05-09T14:50:06.008930Z",
168+
"shell.execute_reply": "2025-05-09T14:50:06.006760Z"
169+
},
151170
"jupyter": {
152171
"outputs_hidden": false
153172
},
@@ -263,9 +282,15 @@
263282
},
264283
{
265284
"cell_type": "code",
266-
"execution_count": 5,
285+
"execution_count": 4,
267286
"metadata": {
268287
"collapsed": false,
288+
"execution": {
289+
"iopub.execute_input": "2025-05-09T14:50:06.018006Z",
290+
"iopub.status.busy": "2025-05-09T14:50:06.017273Z",
291+
"iopub.status.idle": "2025-05-09T14:50:06.136026Z",
292+
"shell.execute_reply": "2025-05-09T14:50:06.134548Z"
293+
},
269294
"jupyter": {
270295
"outputs_hidden": false
271296
},
@@ -390,9 +415,15 @@
390415
},
391416
{
392417
"cell_type": "code",
393-
"execution_count": 7,
418+
"execution_count": 5,
394419
"metadata": {
395420
"collapsed": false,
421+
"execution": {
422+
"iopub.execute_input": "2025-05-09T14:50:06.143034Z",
423+
"iopub.status.busy": "2025-05-09T14:50:06.142638Z",
424+
"iopub.status.idle": "2025-05-09T14:50:06.274118Z",
425+
"shell.execute_reply": "2025-05-09T14:50:06.272709Z"
426+
},
396427
"jupyter": {
397428
"outputs_hidden": false
398429
},
@@ -533,9 +564,15 @@
533564
},
534565
{
535566
"cell_type": "code",
536-
"execution_count": 11,
567+
"execution_count": 6,
537568
"metadata": {
538569
"collapsed": false,
570+
"execution": {
571+
"iopub.execute_input": "2025-05-09T14:50:06.283663Z",
572+
"iopub.status.busy": "2025-05-09T14:50:06.283199Z",
573+
"iopub.status.idle": "2025-05-09T14:50:07.086039Z",
574+
"shell.execute_reply": "2025-05-09T14:50:07.084014Z"
575+
},
539576
"jupyter": {
540577
"outputs_hidden": false
541578
},
@@ -555,9 +592,7 @@
555592
"name": "stderr",
556593
"output_type": "stream",
557594
"text": [
558-
"2024-09-12 12:18:03.405472924 [W:onnxruntime:, graph.cc:1283 Graph] Initializer linear.bias appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.\n",
559-
"/proj/xlabs/users/nfraser/opt/miniforge3/envs/20231115_brv_pt1.13.1/lib/python3.10/site-packages/brevitas/nn/quant_linear.py:69: UserWarning: Defining your `__torch_function__` as a plain method is deprecated and will be an error in future, please define it as a classmethod. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525541990/work/torch/csrc/utils/python_arg_parser.cpp:350.)\n",
560-
" output_tensor = linear(x, quant_weight, quant_bias)\n"
595+
"2025-05-09 15:50:06.990808285 [W:onnxruntime:, graph.cc:1283 Graph] Initializer linear.bias appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.\n"
561596
]
562597
}
563598
],
@@ -626,9 +661,15 @@
626661
},
627662
{
628663
"cell_type": "code",
629-
"execution_count": 12,
664+
"execution_count": 7,
630665
"metadata": {
631666
"collapsed": false,
667+
"execution": {
668+
"iopub.execute_input": "2025-05-09T14:50:07.094253Z",
669+
"iopub.status.busy": "2025-05-09T14:50:07.092574Z",
670+
"iopub.status.idle": "2025-05-09T14:50:07.134212Z",
671+
"shell.execute_reply": "2025-05-09T14:50:07.132360Z"
672+
},
632673
"jupyter": {
633674
"outputs_hidden": false
634675
},
@@ -687,8 +728,15 @@
687728
},
688729
{
689730
"cell_type": "code",
690-
"execution_count": 14,
691-
"metadata": {},
731+
"execution_count": 8,
732+
"metadata": {
733+
"execution": {
734+
"iopub.execute_input": "2025-05-09T14:50:07.141447Z",
735+
"iopub.status.busy": "2025-05-09T14:50:07.140768Z",
736+
"iopub.status.idle": "2025-05-09T14:50:07.913782Z",
737+
"shell.execute_reply": "2025-05-09T14:50:07.911672Z"
738+
}
739+
},
692740
"outputs": [],
693741
"source": [
694742
"from brevitas_examples.common.generative.quantizers import ShiftedUint8DynamicActPerTensorFloat\n",
@@ -719,8 +767,15 @@
719767
},
720768
{
721769
"cell_type": "code",
722-
"execution_count": 15,
723-
"metadata": {},
770+
"execution_count": 9,
771+
"metadata": {
772+
"execution": {
773+
"iopub.execute_input": "2025-05-09T14:50:07.921255Z",
774+
"iopub.status.busy": "2025-05-09T14:50:07.920534Z",
775+
"iopub.status.idle": "2025-05-09T14:50:10.952025Z",
776+
"shell.execute_reply": "2025-05-09T14:50:10.949423Z"
777+
}
778+
},
724779
"outputs": [
725780
{
726781
"name": "stdout",
@@ -744,10 +799,10 @@
744799
" "
745800
],
746801
"text/plain": [
747-
"<IPython.lib.display.IFrame at 0x7fb62856ccd0>"
802+
"<IPython.lib.display.IFrame at 0x7f4a5336ee00>"
748803
]
749804
},
750-
"execution_count": 15,
805+
"execution_count": 9,
751806
"metadata": {},
752807
"output_type": "execute_result"
753808
}
