Web Documentation Cleanup #65

Merged
merged 1 commit on May 19, 2025
24 changes: 12 additions & 12 deletions docs/configuration-guide.rst
@@ -6,7 +6,7 @@ The full list of parameters can be found in the `Configuration Reference <config

You can find example values files in the `SuperSONIC GitHub repository <https://github.com/fastmachinelearning/SuperSONIC/tree/main/values>`_.

1. Select a Triton Inference Server version
1. Select a Triton Inference Server Version
=============================================

- Official versions can be found at `NVIDIA NGC <https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver>`_.
@@ -90,7 +90,7 @@ Triton version must be specified in the ``triton.image`` parameter in the values
<br><br>
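
For orientation, a minimal values-file sketch of this setting is shown below; the image tag used here is an assumption and should be replaced with a release from NVIDIA NGC.

.. code-block:: yaml

   # Hypothetical example: pin the Triton server image in the values file.
   # The tag is an assumption; pick a supported release from NVIDIA NGC.
   triton:
     image: nvcr.io/nvidia/tritonserver:24.04-py3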


3. Select resources for Triton pods
3. Select Resources for Triton Pods
=============================================

- You can configure CPU, memory, and GPU resources for Triton pods via the ``triton.resources`` parameter in the values file:
@@ -125,7 +125,7 @@ Triton version must be specified in the ``triton.image`` parameter in the values
- NVIDIA-L4
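
As a rough sketch only, the ``triton.resources`` block could be filled in as below; the quantities are placeholders rather than recommendations, and the GPU selection shown in the snippet above is configured separately.

.. code-block:: yaml

   # Hypothetical example: CPU, memory, and GPU requests/limits for Triton pods.
   # The quantities are placeholders; size them for your cluster and models.
   triton:
     resources:
       requests:
         cpu: "2"
         memory: 4Gi
         nvidia.com/gpu: 1
       limits:
         cpu: "4"
         memory: 8Gi
         nvidia.com/gpu: 1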


4. Configure Envoy Proxy
4. Configure Envoy Proxy
================================================

By default, Envoy proxy is enabled and configured to provide per-request
@@ -164,7 +164,7 @@ There are two options:
In this case, the client connections should be established to ``<load_balancer_url>:8001`` and NOT use SSL.
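
A minimal sketch of the second option is shown below; the key names are assumptions for illustration, so check the Configuration Reference for the exact schema.

.. code-block:: yaml

   # Hypothetical sketch: expose Envoy through a LoadBalancer service so that
   # clients connect to <load_balancer_url>:8001 without SSL.
   # Key names are assumptions; see the Configuration Reference for the real ones.
   envoy:
     enabled: true
     service:
       type: LoadBalancer
       port: 8001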


5. (optional) Configure rate limiting in Envoy Proxy
5. (Optional) Configure Rate Limiting in Envoy Proxy
======================================================

There are two types of rate limiting available in Envoy Proxy: *listener-level*, and *prometheus-based*.
@@ -202,7 +202,7 @@ There are two types of rate limiting available in Envoy Proxy: *listener-level*,

The metric and threshold for the Prometheus-based rate limiter are the same as those used for the autoscaler (see Prometheus Configuration).
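
For illustration only, the two limiter types might be toggled along the lines below; the key names are assumptions rather than the chart's documented schema.

.. code-block:: yaml

   # Hypothetical sketch: enable both rate limiter types.
   # Key names are assumptions; consult the Configuration Reference for exact keys.
   envoy:
     rateLimiter:
       listenerLevel:
         enabled: true
         maxActiveRequests: 100   # per-listener cap, placeholder value
       prometheusBased:
         enabled: true            # reuses the autoscaler metric and threshold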

6. (optional) Configure authentication in Envoy Proxy
6. (Optional) Configure Authentication in Envoy Proxy
======================================================

At the moment, the only supported authentication method is JWT. Example configuration for IceCube:
@@ -219,7 +219,7 @@ At the moment, the only supported authentication method is JWT. Example configur
port: 443
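
For orientation, a JWT provider entry generally needs an issuer and a JWKS endpoint; the field names and URLs below are assumptions modeled on Envoy's JWT filter, and the IceCube values file shows the actual layout.

.. code-block:: yaml

   # Hypothetical sketch of a JWT authentication block.
   # Field names and URLs are assumptions; see the IceCube example values file.
   envoy:
     auth:
       enabled: true
       jwt_issuer: https://auth.example.org/realms/example
       jwks_uri: https://auth.example.org/realms/example/protocol/openid-connect/certs
       port: 443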


7. Deploy a Prometheus server or connect to an existing one
7. Deploy a Prometheus Server or Connect to an Existing One
============================================================

Prometheus is needed to scrape metrics for monitoring, as well as for the rate limiter and autoscaler.
@@ -272,7 +272,7 @@ Prometheus is needed to scrape metrics for monitoring, as well as for the rate l
port: <prometheus_port>
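
A hedged sketch of connecting to an existing Prometheus server is shown below; apart from the URL and port placeholders already used above, the key names are assumptions.

.. code-block:: yaml

   # Hypothetical sketch: point SuperSONIC at an existing Prometheus server
   # instead of deploying a new one. Key names are assumptions; the example
   # values files in the repository show the real layout.
   prometheus:
     external: true
     url: <prometheus_url>
     port: <prometheus_port>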


8. (optional) Configure metrics for scaling and rate limiting
8. (Optional) Configure Metrics for Scaling and Rate Limiting
===============================================================

Both the rate limiter and the autoscaler are currently configured to use the same Prometheus metric and threshold.
@@ -290,7 +290,7 @@ The Prometheus query for the graph is automatically inferred from the value of `
The graph also displays the threshold value defined by the ``serverLoadThreshold`` parameter.
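
Since both components read the same metric and threshold, a single pair of values-file entries controls them. ``serverLoadThreshold`` is referenced in the documentation above, while the metric key and query below are assumptions for illustration.

.. code-block:: yaml

   # Hypothetical sketch: one metric and threshold drive both the rate limiter
   # and the autoscaler. The metric key name and PromQL query are assumptions;
   # serverLoadThreshold is referenced in the documentation above.
   serverLoadMetric: avg(nv_inference_queue_duration_us)
   serverLoadThreshold: 100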


9. (optional) Deploy Grafana dashboard
9. (Optional) Deploy Grafana Dashboard
==========================================

Grafana is used to visualize metrics collected by Prometheus.
@@ -318,9 +318,9 @@ Grafana Ingress for web access, and datasources to connect to Prometheus,
.. figure:: img/grafana.png
:align: center
:height: 200
:alt: Supersonic Grafana dashboard
:alt: SuperSONIC Grafana Dashboard
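
A hedged sketch of enabling the bundled Grafana with web access is shown below; the key names are assumptions, and the example values files show the exact schema.

.. code-block:: yaml

   # Hypothetical sketch: deploy Grafana with an Ingress for web access.
   # Key names and the hostname are assumptions; see the example values files.
   grafana:
     enabled: true
     ingress:
       enabled: true
       hostName: grafana.example.org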

10. (optional) Enable KEDA autoscaler
10. (Optional) Enable KEDA Autoscaler
==========================================

Autoscaling is implemented via `KEDA (Kubernetes Event-Driven Autoscaler) <https://keda.sh/>`_ and
Expand Down Expand Up @@ -353,7 +353,7 @@ Additional optional parameters can control how quickly the autoscaler reacts to
periodSeconds: 30
stepsize: 1
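
Taken together with the snippet above, a scaling block could look roughly like the sketch below; ``periodSeconds`` and ``stepsize`` appear above, while the remaining key names and replica bounds are assumptions.

.. code-block:: yaml

   # Hypothetical sketch of a KEDA autoscaler block. The replica bounds and
   # surrounding key names are assumptions; periodSeconds and stepsize appear
   # in the snippet above.
   keda:
     enabled: true
     minReplicaCount: 1
     maxReplicaCount: 8
     scaleUp:
       periodSeconds: 30
       stepsize: 1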

11. (optional) Configure Metrics Collector for running ``perf_analyzer``
11. (Optional) Configure Metrics Collector for Running ``perf_analyzer``
=========================================================================

To collect Prometheus metrics when using ``perf_analyzer`` for testing,
@@ -384,7 +384,7 @@ Running with ``perf_analyzer`` is then done with:
If ingress is not desired, port-forward the metrics collector service and call
``--metrics-url localhost:8003/metrics`` to access the metrics.
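
As a sketch only (key names are assumptions), the metrics collector and its optional ingress might be toggled like this; port 8003 matches the port-forward example above.

.. code-block:: yaml

   # Hypothetical sketch: enable the metrics collector used with perf_analyzer.
   # Key names are assumptions; port 8003 matches the port-forward example above.
   metricsCollector:
     enabled: true
     port: 8003
     ingress:
       enabled: false   # port-forward the service instead, as described above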

12. (optional) Configure advanced monitoring
12. (Optional) Configure Advanced Monitoring
=============================================

Refer to the `advanced monitoring guide <advanced-monitoring>`_.
2 changes: 1 addition & 1 deletion docs/getting-started.rst
@@ -60,7 +60,7 @@ Installation
This value will be used as a prefix for all resources created by the chart,
unless ``nameOverride`` is specified in the values file.
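
For instance, a minimal override might look like the sketch below; the value is an illustration only.

.. code-block:: yaml

   # Hypothetical example: override the resource-name prefix that is otherwise
   # derived from the Helm release name. The value shown is an illustration.
   nameOverride: my-supersonic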

1. Successfully executed ``helm install`` command will print a link to auto-generated Grafana dashboard
A successfully executed ``helm install`` command will print a link to the auto-generated Grafana dashboard
and other useful information.

.. figure:: img/grafana.png
22 changes: 11 additions & 11 deletions docs/index.rst
@@ -20,17 +20,17 @@ SuperSONIC GitHub repository: `fastmachinelearning/SuperSONIC <https://github.co

-----

Why "inference-as-a-service"?
Why Inference-as-a-Service?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. container:: twocol

.. container:: leftside

The computing demands of modern scientific experiments are growing at a faster rate than the performance improvements
of traditional processors (CPUs). This trend is driven by increasing data collection rates, tightening latency requirements,
of traditional general-purpose processors (CPUs). This trend is driven by increasing data collection rates, tightening latency requirements,
and rising complexity of algorithms, particularly those based on machine learning.
Such a computing landscape strongly motivates the adoption of specialized coprocessors, such as FPGAs, GPUs, and TPUs.
Such a computing landscape strongly motivates the adoption of specialized coprocessors, such as FPGAs, GPUs, and TPUs. However, this introduces new resource allocation and scaling issues.

.. container:: rightside

@@ -41,8 +41,8 @@ Why "inference-as-a-service"?
`Image source: A3D3 <https://a3d3.ai/about/>`_


In "inference-as-a-service" model, the data processing workflows ("clients") off-load computationally intensive steps,
such as neural network inference, to a remote "server" equipped with coprocessors. This design allows to optimize both
In the inference-as-a-service model, the data processing workflows ("clients") off-load computationally intensive steps,
such as neural network inference, to a remote "server" equipped with coprocessors. This design allows for optimization of both
data processing throughput and coprocessor utilization by dynamically balancing the ratio of CPUs to coprocessors.
Numerous R&D efforts implementing this paradigm in HEP and MMA experiments are grouped under the name
**SONIC (Services for Optimized Network Inference on Coprocessors)**.
@@ -54,16 +54,16 @@ Numerous R&D efforts implementing this paradigm in HEP and MMA experiments are g

-----

SuperSONIC: a case for shared server infrastructure
SuperSONIC: A Case for Shared Server Infrastructure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A key feature of the SONIC approach is the decoupling of clients from servers and the standardization
Two of the key features of the SONIC approach are the decoupling of clients from servers and the standardization
of communication between them.
While client-side implementations may vary across applications, the server-side infrastructure can remain
largely the same, since the server functionality requirements (load balancing, autoscaling, etc.) are not
experiment-specific.

The purpose of SuperSONIC project is to develop server infrastructure that could be reused by scientific
The purpose of the SuperSONIC project is to develop server infrastructure that could be reused by multiple scientific
experiments with only small differences in configuration.

-----
@@ -81,7 +81,7 @@ We are open for collaboration and encourage other experiments to try SuperSONIC
<td style="width:65%; vertical-align: center; padding-right: 1em;">
<p><a href="https://home.cern/science/experiments/cms">CMS Experiment</a> at the Large Hadron Collider (CERN).</p>
<p>
CMS is testing inference-as-a-service approach in Run 3 offline processing workflows, off-loading inferences to GPUs for
CMS is testing the inference-as-a-service approach in Run 3 offline processing workflows, off-loading inferences to GPUs for
machine learning models such as <strong>ParticleNet</strong>, <strong>DeepMET</strong>, <strong>DeepTau</strong>, <strong>ParT</strong>.
In addition, non-ML tracking algorithms such as <strong>LST</strong> and <strong>Patatrack</strong> are being adapted for deployment
as-a-service.
@@ -120,7 +120,7 @@ We are open for collaboration and encourage other experiments to try SuperSONIC
<td style="width:65%; vertical-align: center; padding-right: 1em;">
<p><a href="https://icecube.wisc.edu/">IceCube Neutrino Observatory</a> at the South Pole.</p>
<p>
IceCube uses SONIC approach to accelerate event classifier algorithms based on convolutional neural networks (CNNs).
IceCube uses the SONIC approach to accelerate event classifier algorithms based on convolutional neural networks (CNNs).
</p>
</td>
<td style="width:35%; vertical-align: center;">
@@ -129,7 +129,7 @@ We are open for collaboration and encourage other experiments to try SuperSONIC
</tr>
</table>

Deployment sites
Deployment Sites
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SuperSONIC has been successfully tested at the computing clusters listed below.