Side channels for measureConversion operation #153


Open
martinthomson opened this issue May 1, 2025 · 5 comments · May be fixed by #188
Assignees
Labels
editor-ready Decisions have been made, can make a proposal

Comments

@martinthomson
Member

The discussion topic here is the nature of the leakage we are willing to tolerate for measureConversion(). Ideally, this is an operation that runs in constant time with respect to information that the site doesn't have. However, the main thing that the site doesn't have is the number of impressions that are searched. That's a fundamental problem here.

The best idea we presently have is to run two parallel processes. The first does the complex attribution logic that searches impressions and the real work of the API. The second runs a timer for a fixed amount of time, plus maybe some randomness. The first thread does not report its results; it only drops the results in a pre-arranged spot. When the timer pops for the second thread, it takes whatever value exists (which might be nothing if the first isn't finished), packages that up, and resolves the promise.

This isn't ideal, because it means that there will always be some amount of delay. Tuning the timer could even be difficult if we need to account for different performance profiles and loading. However, it means that we open up more options for what the attribution logic can do. We don't need to focus on being purely constant time in the sense that we would for cryptographic operations. At least not entirely: there will still be some leakage through cache timing and all of the Spectre- and Meltdown-style attacks, but we can require that the processing be done out-of-process to help mitigate that. At a minimum, this would allow us to attain the same sorts of isolation protections that sites get from each other.

Alternative ideas are very much welcome here. The idea above is not ideal, because it will fail in some circumstances in ways that are inscrutable and it leads to delays that will be awkward to handle in some cases. This is not a topic that we have a clear answer on so far and improvements here might help make other parts of the work more robust, flexible, or performant.

@martinthomson martinthomson added the discuss Needs working group discussion label May 1, 2025
@bmayd

bmayd commented May 12, 2025

It seems like it would be very difficult to get the duration of the timer on the second process right; maybe it could be set in rough proportion to the number of records and/or the capabilities of the host, but that seems likely to serve most users poorly. Did folks consider tactics like adding random delays to the processing or, if there's concern about compute-pressure monitoring, randomly choosing some number of records to reprocess and ignoring those results?

@martinthomson
Member Author

Discussed briefly on 2025-05-13: No real conclusion from that discussion.

(My own sense is that we'll probably need to see a proposal before we can be confident that what we've got makes sense. It might even need to be implemented.)

@bmayd

bmayd commented May 13, 2025

> When the timer pops for the second thread, it takes whatever value exists (which might be nothing if the first isn't finished), packages that up, and resolves the promise.

I read this to mean that when the timer runs out prior to the first thread finishing its work, the measureConversion() operation essentially fails and a null report is generated. If that's correct, it means browsers with the fullest impression stores and highest likelihood of recording a conversion are also going to be most likely to time out and report nothing. Clearly that would discourage adoption, so if we push forward with this approach we'll want to set the timer value as high as we can stand to make it.

@csharrison
Collaborator

> > When the timer pops for the second thread, it takes whatever value exists (which might be nothing if the first isn't finished), packages that up, and resolves the promise.
>
> I read this to mean that when the timer runs out prior to the first thread finishing its work, the measureConversion() operation essentially fails and a null report is generated. If that's correct, it means browsers with the fullest impression stores and highest likelihood of recording a conversion are also going to be most likely to time out and report nothing. Clearly that would discourage adoption, so if we push forward with this approach we'll want to set the timer value as high as we can stand to make it.

I think we're all aligned that we want to minimize loss with respect to side channel mitigations. It should be straightforward to track this in aggregate and ensure that the timer is set to have low loss.
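As a rough illustration of what "track this in aggregate" could look like (a hypothetical sketch; the percentile rule and names below are not from this discussion), the cutoff could be chosen from aggregate latency measurements so that the timeout loss stays at or below a target:

```python
import math

def loss_rate(latencies_ms, cutoff_ms):
    """Fraction of calls that would time out (and report nothing) at this cutoff."""
    return sum(1 for t in latencies_ms if t > cutoff_ms) / len(latencies_ms)

def choose_cutoff(latencies_ms, target_loss=0.01):
    """Smallest observed latency whose loss rate is at or below the target."""
    ordered = sorted(latencies_ms)
    k = math.ceil((1.0 - target_loss) * len(ordered))
    return ordered[min(k, len(ordered)) - 1]
```

With aggregate latencies in hand, the cutoff is just a high percentile: at most a `target_loss` fraction of calls fall above it and would resolve with nothing.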

> It seems like it would be very difficult to get the duration of the timer on the second process right; maybe it could be set in rough proportion to the number of records and/or the capabilities of the host, but that seems likely to serve most users poorly. Did folks consider tactics like adding random delays to the processing or, if there's concern about compute-pressure monitoring, randomly choosing some number of records to reprocess and ignoring those results?

  1. If it is set in proportion to the # of records, the elapsed time will leak the # of records, so this will introduce a new side channel. The same holds if it is based on the capabilities of the host, although we could argue, e.g., that basing it on existing "public" information already available to the web platform does not regress things here.

  2. Regarding random delays, this unfortunately does not guarantee the differential privacy we'd want in the system unless we're very careful about it. This is because we cannot add negative delay. If a measureConversion call takes X ms, we'd know for sure that the non-noisy processing time was at most X ms, with 100% certainty.
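A toy simulation of point 2 (numbers are arbitrary): because the added delay is non-negative, the observed time never undercuts the true processing time, so any observation places a hard bound on it, which no differentially private mechanism can allow:

```python
import random

def observed_time(true_ms, max_jitter_ms=50.0):
    # We can only add delay, never subtract it.
    return true_ms + random.uniform(0.0, max_jitter_ms)

# Every sample satisfies observed >= true, so observing t ms proves with
# certainty that the real processing time was at most t ms. Two browsers
# whose true processing times differ can therefore be distinguished with
# certainty on some observations, which no finite privacy budget permits.
samples = [observed_time(10.0) for _ in range(1000)]
assert min(samples) >= 10.0
```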

@martinthomson
Member Author

Action on @apasel422: check on ARA latency figures and propose a cutoff for the timer.

@martinthomson martinthomson added editor-ready Decisions have been made, can make a proposal and removed discuss Needs working group discussion labels May 15, 2025
@apasel422 apasel422 self-assigned this May 29, 2025
4 participants