
some bug patches + torch no_grad forward pass #157

Open · wants to merge 3 commits into master

Conversation

healeyq3 (Contributor):

This PR continues the work that @PTNobel and I started in the diffcp repository to address cvxpy/cvxpy#2485.

Specifically, it uses the new diffcp solve_only_batch call chain in the torch CvxpyLayer when gradients are not needed for reverse-mode autodiff. This functionality is accessed by

  1. Setting all parameter tensors that will be passed into the CvxpyLayer to not require a gradient.
  2. Making the layer(param1, param2, ...) call inside a with torch.no_grad() block.

For full examples, see the last two tests in torch/test_cvxpylayer.py; a condensed sketch follows.
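A minimal sketch of the pattern (the nonnegative least-squares problem and sizes here are placeholders for illustration, not taken from the tests):

```python
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

# Placeholder problem: nonnegative least squares with parameters A, b.
n, m = 5, 10
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
problem = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= 0])
layer = CvxpyLayer(problem, parameters=[A, b], variables=[x])

# (1) Parameter tensors do not require gradients (torch.randn default).
A_t = torch.randn(m, n, dtype=torch.float64)
b_t = torch.randn(m, dtype=torch.float64)

# (2) Call the layer inside torch.no_grad() to hit the solve-only path.
with torch.no_grad():
    x_star, = layer(A_t, b_t)
```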

Additionally, I patched two errors in the test file. Note that the following tests failed prior to the CvxpyLayer additions that I made:

  1. test_basic_gp
  2. test_lml
  3. test_simple_batch_socp

The GP failure appears to be due to a difference between the solutions obtained by cvxpylayers and pure cvxpy. The latter two failures are due to small(ish) Jacobian mismatches.

The next steps (I think) to complete issue cvxpy/cvxpy#2485 are

  1. Release the new version of diffcp with the solve_only functionality. (I used my local copy to make these cvxpylayers changes.)
  2. See if this update provides any meaningful computational enhancements.
  3. Implement this no_grad functionality for JAX and TensorFlow layers.

In the meantime, please let me know if there are other suggestions and/or edits I should make to this PR. Thanks!

```python
        As, bs, cs, cone_dicts, **solver_args)
    )
else:
    xs, _, _ = diffcp.cone_program.solve_only_batch(
```
Member:
Can you open a PR on diffcp to re-export this function from __init__.py so we don't need to access it from the cone_program namespace?
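For concreteness, the re-export could be a one-line addition along these lines (a sketch of diffcp/__init__.py, assuming the function lives in diffcp/cone_program.py):

```python
# diffcp/__init__.py -- sketch of the requested re-export
from diffcp.cone_program import solve_only_batch
```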

Contributor Author:

Yup - will do now!

Contributor Author:

Just opened the PR

Member:

Suggested change:

```diff
- xs, _, _ = diffcp.cone_program.solve_only_batch(
+ xs, _, _ = diffcp.solve_only_batch(
```

Let's use the soon-to-be-exported name.

@PTNobel (Member) commented Jul 29, 2024:

Alright; once I have time to fix the diffcp release pipeline and we pull that off, I'll merge this and do the CVXPYlayers release.

@michaelamir2151:

Just wanted to say thanks, this PR is very helpful. I am interested in CVXPYLayers for batch computation, but computing the gradients heavily slows things down.

Contributor Author:

Only the first half of the notebook (Background), plus the Huber reformulation appendix, is ready for review.

The implementation half is almost ready: I just need to

  1. Update the code to match the learning section explanation.
  2. Update some notation in the function docstrings.
  3. (Most importantly) Fix the differentiation error for the robust smoother. I fixed it in a JAX implementation by putting the problem in epigraph form, but that doesn't seem to be working here. I also made those changes in this PyTorch implementation fairly late in the day, so I might have made a small typo. (A sketch of the epigraph reformulation follows this list.)
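For reference, here is a minimal sketch of the epigraph trick in CVXPY. The Huber penalty, the threshold M=1.0, and the difference-based smoothing term are placeholders for illustration, not the notebook's actual formulation:

```python
import cvxpy as cp

# Hypothetical robust smoother in epigraph form (not the notebook's code).
m = 20
y = cp.Parameter(m)   # noisy measurements (placeholder)
x = cp.Variable(m)    # smoothed estimate
t = cp.Variable(m)    # epigraph variables, one per residual

# Instead of minimizing sum(huber(x - y)) directly, bound each Huber
# term by an epigraph variable and minimize the sum of the bounds.
constraints = [cp.huber(x - y, M=1.0) <= t]
objective = cp.Minimize(cp.sum(t) + cp.sum_squares(cp.diff(x)))
problem = cp.Problem(objective, constraints)
```

At an optimum the constraint holds with equality, so the two formulations agree; the epigraph form just moves the nonsmooth penalty out of the objective.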

With respect to the write-up, I completely understand if we need to cut a good chunk of it. I wrote a lot of it for my own understanding, and I was also enjoying the process.

In the write-up, the one bit I feel less sure about from a mathematical standpoint is the auto-tuning (learning) problem section, specifically the part about extended-value extensions. I would definitely appreciate a sanity check there.

Thank you for looking through this example (well, the first half, that is)! I'm planning to return to the code this weekend and am targeting completion of the full example by the end of the coming week.

Oh, I also included the HTML file in this commit just to make it easier to review the write-up. Obviously we'll discard it before merging.

Contributor Author:

Almost finished with the code fixes/updates for the learned_state_estimation.ipynb example. Aiming to make another push (ideally) by EOD 08/13, though it may slip to EOD 08/14.

Member:

Sounds great! I've started reading this PR...

@PTNobel (Member) left a comment:

Reviewed the code changes. Looks good! Thanks for the cleanups of old code too...

Looking forward to reviewing the notebook soon.

```python
        As, bs, cs, cone_dicts, **solver_args)
    )
else:
    xs, _, _ = diffcp.cone_program.solve_only_batch(
```
Member:

Suggested change:

```diff
- xs, _, _ = diffcp.cone_program.solve_only_batch(
+ xs, _, _ = diffcp.solve_only_batch(
```

Let's use the soon-to-be-newly exported name

```python
            A_th.t() @ A_th +
            torch.eye(n).double())[0]
    b): return torch.linalg.solve(
        A.t() @ A + torch.eye(n).double(),
```
Member:

Suggested change:

```diff
- A.t() @ A + torch.eye(n).double(),
+ A.t() @ A + torch.eye(n, dtype=torch.float64),
```

Contributor Author:

Thanks, Parth! I'll make these changes locally and push them up with the updated example right now.

@healeyq3 (Contributor Author):

(Regarding the most recent commit, minus the requested changes.)
The new learned_state_estimation example is basically complete.

This push includes a good number of code updates, including some basic tuning instantiation at the bottom of the Implementation section (which comes with some new graphs!).

Remaining todos: finish the cross-validation function and related data objects, and write the appendix on creating the selector matrix. I'm hoping that using the cross-validation function with the robust smoother will yield more interesting tuning results than what I'm currently seeing.

I’m definitely open to any suggestions on tweaking the learning function to make the robust smoother tuning more effective.

Finally, as a note: the error I was hitting when differentiating through the robust problem was resolved by switching conda environments. I'll look into this more before we officially merge this example into cvxpylayers/examples.

I have some obligations tomorrow (Aug 15) that will prevent me from working on this example, but I'll be back on these final todos on Friday. (And, non-example related: I'll also work on creating some bigger problems to test the new no_grad functionality on.)

Also, let me just say: I'm totally open to any and all comments/criticisms! I really want this example to match the style/quality expected of cvxpylayers examples.

@PTNobel (Member) left a comment:

All the CVXPYlayers changes look great!

Tests look fine, would be great to test the performance. I'll read the example soon.

@healeyq3 (Contributor Author):

> Tests look fine, would be great to test the performance. I'll read the example soon.

Ah, my bad on the wording of my last comment - that was very sloppy. I should've said "I'll create some bigger problems to test the new no_grad performance on." To do this, unless otherwise directed, I'll write a .py script that solves batches of large problems from a few different classes (LP, QP, SOCP, GP), then run it with the released CVXPYlayers and with my local versions of the to-be-released CVXPYlayers and diffcp; a rough sketch is below.
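Something along these lines, with a single QP class and placeholder sizes for illustration (a sketch, not the final script):

```python
import time

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

# Placeholder batched QP; sizes and batch count are arbitrary.
n, m, batch = 50, 100, 32
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
problem = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= 0])
layer = CvxpyLayer(problem, parameters=[A, b], variables=[x])

A_batch = torch.randn(batch, m, n, dtype=torch.float64)
b_batch = torch.randn(batch, m, dtype=torch.float64)

# Forward pass only: exercises the solve-only path with the new diffcp.
start = time.perf_counter()
with torch.no_grad():
    layer(A_batch, b_batch)
print("no_grad forward:", time.perf_counter() - start)

# Forward pass with autodiff bookkeeping, for comparison.
A_batch.requires_grad_(True)
b_batch.requires_grad_(True)
start = time.perf_counter()
layer(A_batch, b_batch)
print("grad-enabled forward:", time.perf_counter() - start)
```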

Thanks for checking those CVXPYlayers changes so quickly!
