
Conversation

jan-schuchardt commented Nov 13, 2024

Hi,

this is a draft pull request related to issue #274.

I have implemented instances of AdditiveNoisePrivacyLoss whose privacy loss distribution is dominated by a pair of mixture distributions. The class hierarchy looks as follows:

DoubleMixturePrivacyLoss
├── DoubleMixtureLaplacePrivacyLoss
└── DoubleMixtureGaussianPrivacyLoss
    └── MixtureGaussianPrivacyLoss

I have kept the existing MixtureGaussianPrivacyLoss as a special case of DoubleMixtureGaussianPrivacyLoss, both for backwards compatibility and because it has certain optimizations that are not implemented in its superclasses.
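For intuition, here is a minimal, self-contained sketch of the privacy loss for a pair of Gaussian mixtures (toy helper functions for illustration only, not the actual class API):

```python
import numpy as np
from scipy.stats import norm


def log_mixture_pdf(x, means, weights, sigma):
  """Log-density of a Gaussian mixture with shared standard deviation."""
  # Log-sum-exp over the components for numerical stability.
  log_terms = [np.log(w) + norm.logpdf(x, loc=m, scale=sigma)
               for m, w in zip(means, weights)]
  return np.logaddexp.reduce(log_terms)


def privacy_loss(x, means_p, weights_p, means_q, weights_q, sigma):
  """Privacy loss ln(P(x) / Q(x)) for a pair of Gaussian mixtures P, Q."""
  return (log_mixture_pdf(x, means_p, weights_p, sigma)
          - log_mixture_pdf(x, means_q, weights_q, sigma))


# A mixture vs. a single zero-mean component: the special case covered by
# the existing MixtureGaussianPrivacyLoss.
print(privacy_loss(0.5,
                   means_p=[0.0, 1.0], weights_p=[0.9, 0.1],
                   means_q=[0.0], weights_q=[1.0],
                   sigma=1.0))
```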

Specifically, the DoubleMixtureXYZ classes are less optimized in the following sense:

  • No caching of constant terms in the privacy loss
  • No specialized heuristic for finding binary search bounds

These optimizations cannot be trivially generalized to the superclasses, because they rely on the (inverse) likelihood ratio decomposing into a sum of single-distribution ratios. This is not the case when we have two mixtures, as sketched below.
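To spell this out for the Gaussian case (a sketch assuming shared variance, with the Laplace case analogous): when the lower distribution is a single zero-mean Gaussian, the likelihood ratio decomposes as

```math
\frac{P(x)}{Q(x)} = \frac{\sum_i w_i\,\mathcal{N}(x;\mu_i,\sigma^2)}{\mathcal{N}(x;0,\sigma^2)} = \sum_i w_i \exp\left(\frac{2\mu_i x - \mu_i^2}{2\sigma^2}\right),
```

so each summand has a constant term $-\mu_i^2/(2\sigma^2)$ that can be cached, and per-component bounds give cheap binary search brackets. With two mixtures we instead get

```math
\frac{P(x)}{Q(x)} = \frac{\sum_i w_i \exp\left(-\frac{(x-\mu_i)^2}{2\sigma^2}\right)}{\sum_j v_j \exp\left(-\frac{(x-\nu_j)^2}{2\sigma^2}\right)},
```

which is not a sum of single-distribution ratios, and neither is its inverse.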

Before I create a final pull request, it would be great if we could discuss the following questions:

  • Can we live with the above optimization-related issues? (The implementation was still sufficiently fast for use with DP-SGD in my last project).
  • The DoubleMixtureXYZ classes are currently only being tested via tests for MixtureGaussianPrivacyLoss. Do we need separate tests for each class?

arung54 (Contributor) commented Nov 20, 2024

Hi Jan, sorry for the delayed response. Wanted to follow up on this:

> Can we live with the above optimization-related issues? (The implementation was still sufficiently fast for use with DP-SGD in my last project).

I think it is okay for now, and if someone requests we can work on optimizing later (either internally, or with your help).

> The DoubleMixtureXYZ classes are currently only being tested via tests for MixtureGaussianPrivacyLoss. Do we need separate tests for each class?

It would be good to make sure any behavior specific to these classes is being tested. If you feel this is a lot of added work, we can potentially have you submit the current version, and I can make a follow-up change that adds more robust tests.

In addition, there are some upcoming changes to the structure of privacy_loss_mechanism that might affect the PR (the amount of work needed to correct the PR for these changes should be pretty small). So it may be best to wait until those changes are out to make a final pull request :)

jan-schuchardt (Author) commented Nov 22, 2024

> So it may be best to wait until those changes are out to make a final pull request :)

All right, let's do that.

The existing tests do not look too complicated, so I can also take care of those.

pritkamath (Contributor) commented Nov 29, 2024

Hi Jan.

The changes alluded to by Arun have been pushed now (see changes to privacy_loss_mechanism.py in this commit).

Basically, your PR made us realize that it would be more natural to introduce a class MonotonePrivacyLoss as a common abstraction of AdditiveNoisePrivacyLoss and MixtureGaussianPrivacyLoss.
Please sync your PR with those changes. Thank you, and sorry about the extra trouble.
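To illustrate the idea behind the abstraction (a hypothetical sketch; the method names and signatures here are assumptions for illustration, not the actual library code): the shared property is that the privacy loss is monotone in the output value, which is all that binary-search-based inversion needs.

```python
import abc


class MonotonePrivacyLoss(abc.ABC):
  """A privacy loss ln(P(x)/Q(x)) assumed to be non-increasing in x."""

  @abc.abstractmethod
  def privacy_loss(self, x: float) -> float:
    """Returns the privacy loss at outcome x."""

  def inverse_privacy_loss(self, loss: float, lower: float, upper: float,
                           tolerance: float = 1e-9) -> float:
    """Binary search for an x whose privacy loss is close to `loss`.

    This works for any subclass purely because the loss is monotone,
    which is what AdditiveNoisePrivacyLoss, MixtureGaussianPrivacyLoss
    and the new DoubleMixture classes have in common.
    """
    while upper - lower > tolerance:
      mid = 0.5 * (lower + upper)
      if self.privacy_loss(mid) > loss:
        lower = mid  # Loss is non-increasing, so the target lies to the right.
      else:
        upper = mid
    return 0.5 * (lower + upper)
```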

jan-schuchardt (Author) commented
Ok, great. I'm on vacation / conference travel for the next couple of weeks, but I'll look into doing the rebase after that.

jan-schuchardt marked this pull request as ready for review on February 20, 2025, 13:21
jan-schuchardt (Author) commented Feb 20, 2025

Hi!

I've been somewhat busy, but I've now gotten around to integrating the proposed changes into the new class hierarchy:

MonotonePrivacyLoss
└─ DoubleMixturePrivacyLoss
   ├─ DoubleMixtureGaussianPrivacyLoss
   │  └─ MixtureGaussianPrivacyLoss
   └─ DoubleMixtureLaplacePrivacyLoss

I've also added unit tests to ensure that

  • We recover the behavior of MixtureGaussianPrivacyLoss or LaplacePrivacyLoss when one of the two mixtures has a single component with mean 0
  • We get the correct privacy_loss and delta_for_epsilon when both mixtures have multiple components

The reference values for delta_for_epsilon are computed via scipy.integrate.quad; a sketch of the approach is below.
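That is, by numerically integrating the hockey-stick divergence. A minimal sketch of such a reference computation (with illustrative parameters, not the ones used in the actual tests):

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm


def mixture_pdf(x, means, weights, sigma):
  """Density of a Gaussian mixture with shared standard deviation."""
  return sum(w * norm.pdf(x, loc=m, scale=sigma)
             for m, w in zip(means, weights))


def reference_delta(epsilon, means_p, weights_p, means_q, weights_q, sigma):
  """Hockey-stick divergence: the integral of max(P(x) - e^eps * Q(x), 0)."""
  def integrand(x):
    p = mixture_pdf(x, means_p, weights_p, sigma)
    q = mixture_pdf(x, means_q, weights_q, sigma)
    return max(p - np.exp(epsilon) * q, 0.0)
  value, _ = integrate.quad(integrand, -np.inf, np.inf, limit=200)
  return value


# Both dominating distributions have multiple components.
print(reference_delta(1.0,
                      means_p=[0.0, 1.0], weights_p=[0.7, 0.3],
                      means_q=[0.0, -1.0], weights_q=[0.5, 0.5],
                      sigma=1.0))
```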

It would be great if you could have another look at my PR.

arung54 (Contributor) commented Feb 21, 2025

Thanks Jan for making the changes! I am on vacation right now but will review the PR once I'm back.

arung54 (Contributor) left a comment

Hi Jan, overall the changes look great! Most of these comments catch typos or offer suggestions on style.

jan-schuchardt (Author) commented Mar 11, 2025

Thank you! I've set all comments with trivial fixes (typos etc.) to resolved.

There are five open comments. It would be nice if you could have another look at those and let me know if everything looks ok to you, or if we should make any additional changes.

arung54 (Contributor) commented Mar 17, 2025

Hi Jan,

Thanks for your responses! At this point everything LGTM. I'll take a second pass to double-check, and I need to do a bit of work on my end (some internal permissions, nothing you need to worry about) before I can merge the PR, but it should be ready to submit.

