Side channels for measureConversion operation #153


Open
martinthomson opened this issue May 1, 2025 · 5 comments · May be fixed by #188
Assignees
Labels
editor-ready Decisions have been made, can make a proposal

Comments

@martinthomson
Member

The discussion topic here is the nature of the leakage we are willing to tolerate for measureConversion(). Ideally, this is an operation that runs in constant time with respect to information that the site doesn't have. However, the main thing that the site doesn't have is the number of impressions that are searched. That's a fundamental problem here.

The best idea we presently have is to run two parallel processes. The first does the complex attribution logic that searches impressions and the real work of the API. The second runs a timer for a fixed amount of time, plus maybe some randomness. The first thread does not report its results; it only drops the results in a pre-arranged spot. When the timer pops for the second thread, it takes whatever value exists (which might be nothing if the first isn't finished), packages that up, and resolves the promise.

This isn't ideal, because it means that there will always be some amount of delay. Tuning the timer could even be difficult if we need to account for different performance profiles and loading. However, it means that we open up more options for what the attribution logic can do. We don't need to focus on being purely constant time in the sense that we would for cryptographic operations. At least not entirely: there will still be some leakage through cache timing and all of the Spectre- and Meltdown-style attacks, but we can require that the processing be done out-of-process to help mitigate that. At a minimum, this would allow us to attain the same sorts of isolation protections that sites get from each other.

Alternative ideas are very much welcome here. The idea above is not ideal, because it will fail in some circumstances in ways that are inscrutable and it leads to delays that will be awkward to handle in some cases. This is not a topic that we have a clear answer on so far and improvements here might help make other parts of the work more robust, flexible, or performant.

@martinthomson martinthomson added the discuss Needs working group discussion label May 1, 2025
@bmayd

bmayd commented May 12, 2025

It seems like it would be very difficult to get the duration of the timer on the second process right; maybe it could be set in rough proportion to the number of records and/or the capabilities of the host, but that seems likely to serve most users poorly. Did folks consider tactics like adding random delays to the processing or, if there's concern about compute-pressure monitoring, randomly choosing some number of records to reprocess and ignoring those results?

@martinthomson
Member Author

Discussed briefly on 2025-05-13: No real conclusion from that discussion.

(My own sense is that we'll probably need to see a proposal before we can be confident that what we've got makes sense. It might even need to be implemented.)

@bmayd

bmayd commented May 13, 2025

> When the timer pops for the second thread, it takes whatever value exists (which might be nothing if the first isn't finished), packages that up, and resolves the promise.

I read this to mean that when the timer runs out prior to the first thread finishing its work, the measureConversion() operation essentially fails and a null report is generated. If that's correct, it means browsers with the fullest impression stores and highest likelihood of recording a conversion are also going to be most likely to time out and report nothing. Clearly that would discourage adoption, so if we push forward with this approach we'll want to set the timer value as high as we can stand to make it.

@csharrison
Collaborator

> > When the timer pops for the second thread, it takes whatever value exists (which might be nothing if the first isn't finished), packages that up, and resolves the promise.
>
> I read this to mean that when the timer runs out prior to the first thread finishing its work, the measureConversion() operation essentially fails and a null report is generated. If that's correct, it means browsers with the fullest impression stores and highest likelihood of recording a conversion are also going to be most likely to time out and report nothing. Clearly that would discourage adoption, so if we push forward with this approach we'll want to set the timer value as high as we can stand to make it.

I think we're all aligned that we want to minimize loss with respect to side channel mitigations. It should be straightforward to track this in aggregate and ensure that the timer is set to have low loss.
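As a rough illustration of what "track this in aggregate" could look like (a hypothetical sketch; the percentile rule and names below are not from this discussion), the cutoff could be chosen from aggregate latency measurements so that the timeout loss stays at or below a target:

```python
import math

def loss_rate(latencies_ms, cutoff_ms):
    """Fraction of calls that would time out (and report nothing) at this cutoff."""
    return sum(1 for t in latencies_ms if t > cutoff_ms) / len(latencies_ms)

def choose_cutoff(latencies_ms, target_loss=0.01):
    """Smallest observed latency whose loss rate is at or below the target."""
    ordered = sorted(latencies_ms)
    k = math.ceil((1.0 - target_loss) * len(ordered))
    return ordered[min(k, len(ordered)) - 1]
```

With aggregate latencies in hand, the cutoff is just a high percentile: at most a `target_loss` fraction of calls fall above it and would resolve with nothing.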

> It seems like it would be very difficult to get the duration of the timer on the second process right; maybe it could be set in rough proportion to the number of records and/or the capabilities of the host, but that seems likely to serve most users poorly. Did folks consider tactics like adding random delays to the processing or, if there's concern about compute-pressure monitoring, randomly choosing some number of records to reprocess and ignoring those results?

  1. If it is set in proportion to the # of records, the elapsed time will leak the # of records, so this will introduce a new side channel. The same holds if it is based on the capabilities of the host, although we could argue, e.g., that basing it on existing "public" information already available to the web platform does not regress things here.

  2. Regarding random delays, this unfortunately does not guarantee the differential privacy we'd want in the system unless we're very careful about it. This is because we cannot add negative delay. If a measureConversion call takes X ms, we'd know for sure that the non-noisy processing time was at most X ms, with 100% certainty.
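A toy simulation of point 2 (numbers are arbitrary): because the added delay is non-negative, the observed time never undercuts the true processing time, so any observation places a hard bound on it, which no differentially private mechanism can allow:

```python
import random

def observed_time(true_ms, max_jitter_ms=50.0):
    # We can only add delay, never subtract it.
    return true_ms + random.uniform(0.0, max_jitter_ms)

# Every sample satisfies observed >= true, so observing t ms proves with
# certainty that the real processing time was at most t ms. Two browsers
# whose true processing times differ can therefore be distinguished with
# certainty on some observations, which no finite privacy budget permits.
samples = [observed_time(10.0) for _ in range(1000)]
assert min(samples) >= 10.0
```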

@martinthomson
Member Author

Action on @apasel422: check on ARA latency figures and propose a cutoff for the timer.

@martinthomson martinthomson added editor-ready Decisions have been made, can make a proposal and removed discuss Needs working group discussion labels May 15, 2025
@apasel422 apasel422 self-assigned this May 29, 2025
4 participants