Implement HyKKT Ruiz Scaling #317

adhamsi · 2025-06-23T21:34:58Z

Description

Adds a ruiz module that implements the RuizScaler class for use in HyKKT.

Closes #330.

Proposed changes

Implements RuizScaler and RuizScalerKernelImpl for the CPU, HIP, and CUDA backends, and recreates the original test. RuizScaler methods take matrix::Csr and vector::Vector objects to set the matrix and vector data, and the scale method performs the scaling in place.
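For readers unfamiliar with the technique, the following is a minimal sketch of symmetric Ruiz scaling on a small dense matrix, not the implementation in this PR. It assumes the textbook iteration in which each pass rescales A to D A D with D_ii = 1 / sqrt(max_j |A_ij|). The names Dense, ruizPass, and ruizScale are hypothetical; the actual RuizScaler works on the sparse HyKKT blocks through backend kernels.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense row-major symmetric matrix; a stand-in used for illustration only.
using Dense = std::vector<std::vector<double>>;

// One pass of symmetric Ruiz scaling: A <- D * A * D with
// D_ii = 1 / sqrt(max_j |A_ij|). The pass also accumulates D into total_scaling.
void ruizPass(Dense& a, std::vector<double>& total_scaling)
{
  const std::size_t n = a.size();
  std::vector<double> d(n, 1.0);
  for (std::size_t i = 0; i < n; ++i)
  {
    double row_max = 0.0;
    for (std::size_t j = 0; j < n; ++j)
    {
      row_max = std::max(row_max, std::fabs(a[i][j]));
    }
    // Leave all-zero rows unscaled to avoid dividing by zero.
    if (row_max > 0.0)
    {
      d[i] = 1.0 / std::sqrt(row_max);
    }
  }
  for (std::size_t i = 0; i < n; ++i)
  {
    for (std::size_t j = 0; j < n; ++j)
    {
      a[i][j] *= d[i] * d[j];
    }
    total_scaling[i] *= d[i];
  }
}

// Runs a fixed number of passes in place and returns the accumulated scaling.
std::vector<double> ruizScale(Dense& a, int iterations)
{
  std::vector<double> total(a.size(), 1.0);
  for (int k = 0; k < iterations; ++k)
  {
    ruizPass(a, total);
  }
  return total;
}
```

For example, a single pass on diag(4, 16) yields the identity matrix with accumulated scaling (0.5, 0.25).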

Checklist

All tests pass. Code tested on
- CPU backend
- CUDA backend
- HIP backend
Code compiles cleanly with flags -Wall -Wpedantic -Wconversion -Wextra.
The new code follows Re::Solve style guidelines.
There are unit tests for the new code.
The new code is documented.
The feature branch is rebased with respect to the target branch.

Further comments

shakedregev

This is getting a segfault for CUDA. Also, fix merge conflicts.

shakedregev · 2025-06-27T14:42:44Z

Ruiz tests are working, but let's fix the perm tests that made it in here.

pelesh

It would be good to design more intuitive and easy to verify unit tests.
More extensive documentation is needed for the class methods.
Matrix/vector objects probably do not need to be unpacked except for passing input to GPU kernels.

pelesh · 2025-06-27T21:26:11Z

resolve/hykkt/ruiz/RuizScalingKernelsCUDA.cu

+    void RuizScalingKernelsCUDA::adaptDiagScale(index_type n_hes, index_type n_total, index_type* hes_i, index_type* hes_j, real_type* hes_v, index_type* jac_i, index_type* jac_j, real_type* jac_v, index_type* jac_tr_i, index_type* jac_tr_j, real_type* jac_tr_v, real_type* rhs1, real_type* rhs2, real_type* aggregate_scaling_vector, real_type* scaling_vector)
+    {


The arguments of this function should be Re::Solve vector and matrix objects. I believe they need to be unpacked into raw data arrays only right before they are passed to the GPU kernels.

adhamsi · 2025-07-01

This would mean that each wrapper would begin with something like

    const index_type* hes_i = hes->getRowData();
    const index_type* hes_j = hes->getColData();
    const real_type* hes_v = hes->getValues();
    const index_type* jac_i = jac->getRowData();
    const index_type* jac_j = jac->getColData();
    const real_type* jac_v = jac->getValues();
    const index_type* jac_tr_i = jac_tr->getRowData();
    const index_type* jac_tr_j = jac_tr->getColData();
    const real_type* jac_tr_v = jac_tr->getValues();

so repeating this in every kernel implementation would be a bit verbose. Currently, the unpacking happens once, as soon as the data is passed in to the top-level RuizScaler object. Implementing it this way would require the unpacking to occur again in every scaling iteration.

pelesh · 2025-07-01

Yes, the raw pointers should be accessed only in the kernel call. So calling the kernel should look something like this:

    void RuizScalingKernelsHIP::adaptDiagScale(matrix::Sparse* hes,
                                               // more arguments ...
                                               )
    {
      int block_size = 256;
      int num_blocks = (n_total + block_size - 1) / block_size;
      kernels::adaptDiagScale<<<num_blocks, block_size>>>(hes->getNumRows(),
                                                          hes->getRowData(),
                                                          hes->getColData(),
                                                          // more arguments ...
                                                          );
    }

Ideally, matrix/vector objects would be passed into the kernel itself, but CUDA/HIP kernels support only fundamental types and pointers to them.
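The pattern discussed in this thread can be sketched in plain C++ as a host-side analogue. This is an illustration under stated assumptions, not Re::Solve code: CsrMatrix is a hypothetical stand-in whose accessor names merely mirror the ones quoted above, and kernels::scaleRows plays the role of a GPU kernel that accepts only raw pointers and scalars.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CSR matrix class; not the Re::Solve API.
// row_ptr is assumed to hold at least one entry (n + 1 entries for n rows).
class CsrMatrix
{
public:
  CsrMatrix(std::vector<int> row_ptr, std::vector<int> col_idx, std::vector<double> values)
    : row_ptr_(std::move(row_ptr)), col_idx_(std::move(col_idx)), values_(std::move(values))
  {
  }

  int getNumRows() const { return static_cast<int>(row_ptr_.size()) - 1; }
  const int* getRowData() const { return row_ptr_.data(); }
  const int* getColData() const { return col_idx_.data(); }
  double* getValues() { return values_.data(); }

private:
  std::vector<int>    row_ptr_;
  std::vector<int>    col_idx_;
  std::vector<double> values_;
};

namespace kernels
{
  // Stand-in for a device kernel: raw pointers and scalars only.
  void scaleRows(int n, const int* row_ptr, double* values, const double* scaling)
  {
    for (int i = 0; i < n; ++i)
    {
      for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
      {
        values[k] *= scaling[i];
      }
    }
  }
} // namespace kernels

// Wrapper: takes objects and unpacks them only at the kernel call site.
void scaleRows(CsrMatrix* a, const std::vector<double>& scaling)
{
  kernels::scaleRows(a->getNumRows(), a->getRowData(), a->getValues(), scaling.data());
}
```

With this layout the accessors are called once per wrapper invocation, so the cost of "unpacking" is a handful of pointer reads rather than any data movement.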

resolve/hykkt/ruiz/RuizScalingKernelsHIP.hip

resolve/hykkt/ruiz/RuizScalingKernelImpl.hpp

resolve/hykkt/ruiz/RuizScalingKernelsCUDA.cu

pelesh · 2025-06-27T21:38:42Z

tests/unit/hykkt/HykktRuizScalingTests.hpp

+        if (fabs(H->getValues(memory::HOST)[n / 2 - 1] - 0.062378167641326) > tol)
+        {
+          test_passed = false;
+          std::cout << "Test failed: H[n/2-1][n/2-1] = " << H->getValues(memory::HOST)[n / 2 - 1]
+                    << ", expected " << 0.062378167641326 << "\n";
+        }


Tests like this are typically fragile. Consider designing tests that are more intuitive and easier to verify.

This test essentially checks if the behavior of Ruiz scaling has changed. It says little about Ruiz scaling correctness.
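One property that is easy to verify by hand is that Ruiz scaling drives the infinity norm of every nonempty row toward 1. A minimal sketch of such a check on raw CSR arrays follows. maxRowNormDeviation and isEquilibrated are hypothetical helpers, not part of Re::Solve, and for HyKKT the rows would need to span the full KKT block row (Hessian and Jacobian-transpose entries together), which this sketch does not assemble.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest deviation of any row's infinity norm from 1 in a CSR matrix.
// row_ptr has n + 1 entries; values holds the nonzeros row by row.
// Rows without nonzeros are skipped, since scaling cannot change them.
double maxRowNormDeviation(const std::vector<int>& row_ptr, const std::vector<double>& values)
{
  double worst = 0.0;
  for (std::size_t i = 0; i + 1 < row_ptr.size(); ++i)
  {
    if (row_ptr[i] == row_ptr[i + 1])
    {
      continue;
    }
    double row_max = 0.0;
    for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
    {
      row_max = std::max(row_max, std::fabs(values[static_cast<std::size_t>(k)]));
    }
    worst = std::max(worst, std::fabs(row_max - 1.0));
  }
  return worst;
}

// True if every nonempty row has infinity norm within tol of 1.
bool isEquilibrated(const std::vector<int>& row_ptr, const std::vector<double>& values, double tol)
{
  return maxRowNormDeviation(row_ptr, values) <= tol;
}
```

A test could run the scaler for a fixed number of iterations and then assert isEquilibrated with a tolerance matched to that iteration count, since the row norms are only approximately 1 after finitely many passes.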

pelesh · 2025-06-27T21:40:14Z

tests/unit/hykkt/HykktRuizScalingTests.hpp

+
+      TestOutcome ruizTest()
+      {


Some documentation of what is being tested here, and how, would be helpful.
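As an illustration of the kind of documentation meant here, the sketch below shows a Doxygen-style comment on a small test whose expected values can be derived by hand. It is a hypothetical example: diagonalRuizTest is not a function in this PR, it returns a plain bool rather than the TestOutcome type used above, and the scaling is computed inline as a stand-in for the scaler under test.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

/**
 * @brief Checks one symmetric Ruiz pass on a diagonal matrix.
 *
 * For A = diag(a_1, ..., a_n) with nonzero a_i, one pass uses
 * d_i = 1 / sqrt(|a_i|), so each scaled entry is a_i * d_i^2 = sign(a_i).
 * The test builds diag(4, -16, 0.25) and expects (1, -1, 1).
 *
 * @param[in] tol Absolute tolerance for comparing scaled entries.
 * @return true if every scaled entry matches its expected value.
 *
 * @note The scaling is computed inline as a stand-in; a real test would
 *       call the scaler under test instead.
 */
bool diagonalRuizTest(double tol)
{
  std::vector<double>       diag     = {4.0, -16.0, 0.25};
  const std::vector<double> expected = {1.0, -1.0, 1.0};

  // Apply D * A * D entrywise; for a diagonal matrix this is a_i * d_i^2.
  for (double& a : diag)
  {
    const double d = 1.0 / std::sqrt(std::fabs(a));
    a *= d * d;
  }

  for (std::size_t i = 0; i < diag.size(); ++i)
  {
    if (std::fabs(diag[i] - expected[i]) > tol)
    {
      return false;
    }
  }
  return true;
}
```

The comment states the input, the expected output, and why that output is correct, so a reviewer can confirm the test without running the code.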

resolve/hykkt/ruiz/RuizScalingKernelsCUDA.cu

resolve/hykkt/ruiz/RuizScalingKernelsCPU.cpp

resolve/hykkt/ruiz/RuizScaler.cpp

shakedregev · 2025-07-02T20:26:13Z

Still need to fix the merge conflicts and compile with -D RESOLVE_USE_ASAN=ON to catch memory leaks. Then run the tests normally. If all of them pass, there are no leaks.

resolve/hykkt/ruiz/RuizScaler.cpp

pelesh

Adding a couple of comments related to my earlier review.

resolve/hykkt/Permutation.hpp

resolve/hykkt/ruiz/RuizScaler.hpp

adhamsi · 2025-07-03T15:58:01Z

Leaks have now been fixed.

adhamsi and others added 13 commits June 20, 2025 15:19

add ruiz scaler and handler

38bac88

refactor the interfaces and use existing MemoryHandler for reset

1e44d03

update interface

316415f

allocate/deallocate scaling vectors

fa567fa

remove handler as middleman

cc2daad

add empty cpu implementation and cmakelists

7dbac4c

fix cmakelists

b4e8e08

cpu implementation

2c64a69

hip implementation

b2064fa

Apply pre-commmit fixes

1f30af0

use resolve matrix and vector types

acb844c

cuda implementation

02ce6e7

ruiz scaling test

a1c95b7

adhamsi marked this pull request as ready for review June 26, 2025 15:50

adhamsi force-pushed the adham/hykkt-ruiz branch from e98f39a to f8257d8 Compare June 26, 2025 20:08

shakedregev requested changes Jun 26, 2025

pelesh requested changes Jun 27, 2025

adhamsi added 4 commits June 30, 2025 09:08

fix bugs in kernel implementations

56807d1

comments in implementation files

8e32c56

fix imports and leak

e7df58a

fix memory error in ruiz test

a8e4039

adhamsi force-pushed the adham/hykkt-ruiz branch from fde1d6f to a8e4039 Compare June 30, 2025 14:05

update comments

afe8c73

pelesh reviewed Jul 1, 2025

resolve/hykkt/ruiz/RuizScalingKernelsCUDA.cu (outdated)

pelesh reviewed Jul 1, 2025

resolve/hykkt/ruiz/RuizScalingKernelsCPU.cpp (outdated)

pelesh reviewed Jul 1, 2025

resolve/hykkt/ruiz/RuizScaler.cpp (outdated)

shakedregev marked this pull request as draft July 2, 2025 20:22

shakedregev force-pushed the hykkt-dev branch from 82c0843 to 2d49ed5 Compare July 2, 2025 20:44

pelesh reviewed Jul 2, 2025

resolve/hykkt/ruiz/RuizScaler.cpp (outdated)

pelesh requested changes Jul 2, 2025

resolve/hykkt/Permutation.hpp (outdated)

resolve/hykkt/ruiz/RuizScaler.hpp (outdated)

resolve/hykkt/ruiz/RuizScaler.hpp (outdated)

adhamsi added 3 commits July 3, 2025 09:29

Doxygen reference and rename class

bb6b7a9

use object pointers and update all usage

c0cbaf6

fix memory leaks

7716c33

adhamsi force-pushed the hykkt-dev branch from 2d49ed5 to 744822f Compare July 3, 2025 19:09
