fix docs related to cudss (#334)

bodono · web-flow · commit 7249b2d2dc95 · 2025-11-18T22:06:31.000Z
diff --git a/docs/src/api/python.rst b/docs/src/api/python.rst
@@ -45,9 +45,10 @@ of the proper format, SCS will attempt to convert them. The
 :code:`use_indirect` setting switches between the sparse direct
 :ref:`linear_solver` (the default) or the sparse indirect solver. If the MKL
 Pardiso direct solver for SCS is :ref:`installed <python_install>` then it can
-be used by setting :code:`mkl=True`. If the GPU indirect solver for SCS is
-:ref:`installed <python_install>` and a GPU is available then it can be used by
-setting :code:`gpu=True`.  The remaining fields are explained in
+be used by setting :code:`mkl=True`. If a GPU solver for SCS is :ref:`installed
+<python_install>` and a GPU is available then it can be used by setting
+:code:`gpu=True`. For the direct GPU solver based on cuDSS, also set
+:code:`use_indirect=False`. The remaining fields are explained in
 :ref:`settings`.
 
 Then to solve the problem call:
diff --git a/docs/src/install/c.rst b/docs/src/install/c.rst
@@ -56,6 +56,9 @@ The CMake build-system exports two CMake targets called :code:`scs::scsdir` and
 :code:`scs::scsindir` as well as a header file :code:`scs.h` that defines the
 API.
 
+MKL
+"""
+
 If `MKL
 <https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html>`_
 is installed in your system and the :code:`MKLROOT` environment variable is
@@ -66,6 +69,9 @@ MKL compiler flags might not be right for your system and may need to be
 <https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html>`_).
 
 
+GPU
+"""
+
 If you have a GPU and CUDA toolkit installed, along with the
 `cuDSS <https://developer.nvidia.com/cudss>`_ library, you can compile SCS
 with cuDSS support using CMake. First, ensure that the :code:`CUDA_PATH` and
@@ -78,8 +84,11 @@ with cuDSS support using CMake. First, ensure that the :code:`CUDA_PATH` and
 
 Currently cuDSS only supports 32-bit integers (for sparse matrix indices) so
 :code:`DDLONG=OFF` is mandatory.
-This will build and install the cuDSS linear solver with target :code:`scs::scscudss`.
+This will build and install the cuDSS linear solver with target
+:code:`scs::scscudss`.
 
+Importing
+"""""""""
 The libraries can be imported using the find_package CMake command and used
 by calling target_link_libraries as in the following example:
 
@@ -154,10 +163,10 @@ binaries in the out folder corresponding to the GPU version.  Note that the GPU
   make gpu DLONG=0
   out/run_tests_gpu_indirect
 
-Finally, to compile and test the :ref:`cuDSS solver <cudss>` you need to have
-CUDA toolkit, the :code:`nvcc` compiler, and
-`cuDSS <https://developer.nvidia.com/cudss>`_ library installed.
-Then set :code:`CUDA_PATH` and :code:`CUDSS_PATH` and execute
+Finally, to compile and test the :ref:`cuDSS solver <cudss_solver>` you need
+the CUDA toolkit, the :code:`nvcc` compiler, and the `cuDSS
+<https://developer.nvidia.com/cudss>`_ library installed. Then set
+:code:`CUDA_PATH` and :code:`CUDSS_PATH` and execute
 
 .. code:: bash
 
diff --git a/docs/src/install/python.rst b/docs/src/install/python.rst
@@ -15,16 +15,38 @@ You can also install directly from source
 
   git clone --recursive https://github.com/bodono/scs-python.git
   cd scs-python
-  python -m pip install --verbose .
+  python -m pip install .
+
+MKL
+"""
 
 If you have MKL, you can install the MKL Pardiso interface using
 
 .. code:: bash
 
-  python -m pip install --verbose -Csetup-args=-Dlink_mkl=true .
+  python -m pip install -Csetup-args=-Dlink_mkl=true .
+
+See :ref:`here <python_interface>` for how to enable MKL when solving. MKL is
+typically faster than the built-in linear system solver.
+
+GPU
+"""
+
+If you have a GPU and cuDSS installed you can install the GPU direct sparse
+solver using
+
+.. code:: bash
+
+  python -m pip install -Csetup-args=-Dlink_cudss=true -Csetup-args=-Dint32=true .
+
+See :ref:`here <python_interface>` for how to enable the GPU when solving. The
+sparse direct GPU solver is typically very fast.
+
+See `here <https://colab.research.google.com/drive/1POCgDNFg8fycHMI9T9N6V3iHFhXRthjn?usp=sharing>`_ for an example Colab notebook where the cuDSS version of SCS,
+along with its required dependencies, is installed and used.
 
-See :ref:`here <python_interface>` for how to enable MKL when solving. MKL is typically
-faster than the built-in linear system solver.
+Testing
+"""""""
 
 To test that SCS installed correctly (assuming you have pytest installed), run
 
@@ -43,7 +65,7 @@ You can install with OpenMP parallelization support using
 
   python legacy_setup.py install --scs --openmp
 
-You can install the GPU interface using (the GPU solver is no longer recommended)
+You can install the GPU indirect solver using
 
 .. code:: bash
 
diff --git a/docs/src/linear_solver/index.rst b/docs/src/linear_solver/index.rst
@@ -102,24 +102,21 @@ Sparse GPU indirect method
 
 The above linear solvers all run on CPU. We also have support for a GPU version
 of the indirect solver, where the matrix multiplies are all performed on the
-GPU.
+GPU. Usually the direct GPU solver will be faster than this solver.
 
-.. _cudss:
+.. _cudss_solver:
 
 Sparse GPU direct method
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
-This is a linear solver that uses the `cuDSS<https://developer.nvidia.com/cudss>`_
-library to solve the linear system on the GPU. It is similar to the direct
-solver, but uses the calls to cuDSS library to perform the analysis, numerical
-(re-)factorization and subsequent solves. According to its documentation
-
-> reordering (a major part of the analysis phase) is executed on the host,
-> while symbolic factorization (another part of the analysis phase),
-> numerical factorization and solve are executed on the GPU.
-
-As the newest addition to SCS this solver is still under development and not
-as well battle-tested as the other solvers.
+This is a linear solver that uses the `cuDSS
+<https://developer.nvidia.com/cudss>`_ library to solve the linear system on
+the GPU. It is similar to the direct solver, but uses calls to the cuDSS
+library to perform the analysis, numerical (re-)factorization, and subsequent
+solves. According to the cuDSS documentation, reordering (a major part of the
+analysis phase) is executed on the host, while symbolic factorization (another
+part of the analysis phase), numerical factorization, and solve are executed
+on the GPU.
 
 .. _new_linear_solver: