@cpz2024 cpz2024 commented Dec 27, 2025

Pull Request

What problem does this PR solve?

Issue Number: Fixed #

Possible side effects?

  • Performance:

  • Backward compatibility:

@gemini-code-assist

Summary of Changes

Hello @cpz2024, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new secure join primitive, "join_uu", designed for two-party computation within the SPU framework. The primary goal is to enable two distinct parties to perform a join operation on their respective private datasets while maintaining data confidentiality. The implementation utilizes a Cuckoo Hash-based approach, incorporating secure pseudo-random function (SoPrf) outputs and cryptographic hashing (Blake3) to generate permutations that facilitate the secure comparison and merging of records. This enhancement significantly expands SPU's capabilities for privacy-preserving data analysis.

Highlights

  • New Secure Join Functionality: Introduced "join_uu", a secure two-party join operation based on the Cuckoo Hash algorithm, allowing private datasets to be joined without revealing individual data.
  • Cuckoo Hash Integration: Implemented the "_cuckoo_hash_to_perm_v" kernel and its underlying logic in "pv2k.cc", leveraging "yacl::CuckooIndex" and "Blake3" hashing for efficient and secure permutation generation.
  • Comprehensive Testing: Added dedicated unit tests ("join_test.cc") for the "join_uu" function, covering various field types, protocol kinds, and both single and multi-key join scenarios.
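
To make the Cuckoo Hash highlight concrete, here is a minimal plaintext sketch of cuckoo insertion in Python. It is not the PR's "_cuckoo_hash_to_perm_v" kernel and does not use "yacl::CuckooIndex". As assumptions for illustration only, `hashlib.blake2b` stands in for Blake3 (which is not in the Python standard library), and the names `candidate_bin`, `cuckoo_assign`, `num_hashes`, `scale_factor`, and `max_kicks` are hypothetical. How the PR turns such a table into a permutation is not shown in this conversation.

```python
import hashlib
import random


def candidate_bin(key: int, hash_idx: int, num_bins: int) -> int:
    """Bin chosen for `key` by hash function number `hash_idx`."""
    digest = hashlib.blake2b(
        key.to_bytes(8, "little"),
        digest_size=8,
        salt=bytes([hash_idx]),  # a different salt acts as a different hash function
    ).digest()
    return int.from_bytes(digest, "little") % num_bins


def cuckoo_assign(keys, num_hashes=3, scale_factor=1.5, max_kicks=500, seed=0):
    """Return a table mapping each bin to the index of the key stored there (or None)."""
    num_bins = int(len(keys) * scale_factor) + 1
    bins = [None] * num_bins
    rng = random.Random(seed)
    for idx in range(len(keys)):
        cur = idx  # index of the key we are currently trying to place
        for _ in range(max_kicks):
            candidates = [candidate_bin(keys[cur], h, num_bins) for h in range(num_hashes)]
            free = [b for b in candidates if bins[b] is None]
            if free:
                bins[free[0]] = cur
                break
            # All candidate bins are taken: evict one occupant and re-insert it.
            victim = rng.choice(candidates)
            cur, bins[victim] = bins[victim], cur
        else:
            raise RuntimeError("insertion failed; try a larger scale_factor")
    return bins


keys = [7, 42, 1001, 2024, 65537]
table = cuckoo_assign(keys, num_hashes=3, scale_factor=2.0)
placed = sorted(i for i in table if i is not None)
print(placed)  # on success, every key index appears exactly once
```

In this sketch, fewer hash functions mean fewer candidate bins per key, so the table needs more slack (a larger `scale_factor`) for insertion to succeed.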

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a secure two-party join implementation (join_uu) based on the "Private Join and Compute from PIR with Default" paper. The changes are comprehensive, including the core join logic, a new MPC kernel for Cuckoo Hashing, build file updates, and corresponding tests.

I've identified a critical correctness issue in the join logic where it selects matching rows, which could lead to incorrect results. Additionally, there are several opportunities for improvement in code maintainability, such as removing test dependencies from production code, improving comments, and reducing code duplication. I've also noted some minor issues in the test files that should be addressed.

@deadlywing

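I made some optimizations, and the running time now looks more reasonable. Please address the review comments above first.

In addition, once I have confirmed that the overall logic of the code is OK, a few settings still need to be analyzed quantitatively against SPU's implementation and verified by experiment:

  1. Intuitively, fewer hash functions should be better (at least in terms of the number of rounds).
  2. Which table should be cuckoo hashed.

You need to run some experiments over the following variables (the concrete values below are rough picks of mine; adjust them as appropriate):

  1. Sizes of the two tables (100,000; 500,000; 1,000,000 rows)
  2. Number of payloads in the two tables (0, 1, 10, 20)
  3. Number of hash functions (2 or 3; tune the specific scale_factor as you see fit. On my side, with 1,000,000 rows, factor=2.1 works with 2 hash functions and factor=1.2 works with 3 hash functions)

PS: You could write a script that runs all of the experiments above, redirects each run's output to a file, and then analyze the results separately. (For convenience, you can simply collect the total communication volume and number of communication rounds reported by link, plus the time reported by the HAL layer.)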
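
The experiment variables listed above can be enumerated mechanically. The following Python sketch builds the grid from the sample values in this comment. It assumes both tables have the same size in each run and reuses the scale factors reported for 1,000,000 rows, which may need tuning at other sizes. The names `experiment_grid`, `log_name`, and the config keys are hypothetical, and how each configuration reaches the test binary is left open because this conversation does not specify it.

```python
from itertools import product

TABLE_SIZES = [100_000, 500_000, 1_000_000]  # rows per table (assumed equal for both tables)
PAYLOAD_COUNTS = [0, 1, 10, 20]              # payload columns per table
# Number of hash functions -> scale factor reported to work at 1,000,000 rows.
HASH_SETTINGS = {2: 2.1, 3: 1.2}


def experiment_grid():
    """Yield one config dict per combination of the variables."""
    for size, payloads, (num_hashes, scale) in product(
        TABLE_SIZES, PAYLOAD_COUNTS, HASH_SETTINGS.items()
    ):
        yield {
            "table_size": size,
            "num_payloads": payloads,
            "num_hashes": num_hashes,
            "scale_factor": scale,
        }


def log_name(cfg):
    """Hypothetical file name for the redirected output of one run."""
    return "join_n{table_size}_p{num_payloads}_h{num_hashes}_f{scale_factor}.log".format(**cfg)


configs = list(experiment_grid())
print(len(configs))          # 24
print(log_name(configs[0]))  # join_n100000_p0_h2_f2.1.log
```

Encoding every variable in the file name keeps each redirected log self-describing, so the analysis step does not need a separate index.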

  • HAL profiling: total time 32.635025206
  • Join send bytes: 2010400030
  • Join send actions: 82
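
The three figures above can be pulled out of a redirected log with a few regular expressions. This is a sketch that assumes the line formats seen in this thread ("HAL profiling: total time ...", "Join send bytes: ...", "Join send actions: ..."). The function name `summarize` and the result keys are hypothetical, and reading send actions as the round count is an assumption.

```python
import re

PATTERNS = {
    "hal_seconds": (re.compile(r"HAL profiling: total time ([0-9.eE+-]+)"), float),
    "send_bytes": (re.compile(r"^Join send bytes: (\d+)", re.M), int),
    "send_actions": (re.compile(r"^Join send actions: (\d+)", re.M), int),
}


def summarize(log_text):
    """Extract the summary metrics from one run's log; missing ones become None."""
    result = {}
    for name, (pattern, cast) in PATTERNS.items():
        match = pattern.search(log_text)
        result[name] = cast(match.group(1)) if match else None
    return result


SAMPLE = """\
[2026-01-08 02:57:30.326] [info] [test_util.cc:130] HAL profiling: total time 32.635025206
Join send bytes: 2010400030
Join recv bytes: 2063200015
Join send actions: 82
"""

print(summarize(SAMPLE))
# {'hal_seconds': 32.635025206, 'send_bytes': 2010400030, 'send_actions': 82}
```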
bazelisk run //libspu/kernel/hal:join_test -- --gtest_filter="*BigDataJoinTest*"
INFO: Analyzed target //libspu/kernel/hal:join_test (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //libspu/kernel/hal:join_test up-to-date:
  bazel-bin/libspu/kernel/hal/join_test
INFO: Elapsed time: 10.847s, Critical Path: 10.68s
INFO: 3 processes: 1 internal, 2 processwrapper-sandbox.
INFO: Build completed successfully, 3 total actions
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh libspu/kernel/hal/join_test '--gtest_filter=*BigDataJoinTest*'
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //libspu/kernel/hal:join_test
-----------------------------------------------------------------------------
Running main() from gmock_main.cc
Note: Google Test filter = *BigDataJoinTest*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from BigDataJoinTest
[ RUN      ] BigDataJoinTest.Work
[2026-01-08 02:56:57.596] [info] [thread_pool.cc:30] Create a fixed thread pool with size 63
[2026-01-08 02:57:30.326] [info] [test_util.cc:130] HLO profiling: total time 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:130] HAL profiling: total time 32.635025206
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - soprf, executed 1 times, duration 16.252644108s, send bytes 1040000000 recv bytes 1040000000, send actions 19, recv actions 19
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - i_mul, executed 6 times, duration 9.238196789s, send bytes 144000000 recv bytes 144000000, send actions 12, recv actions 12
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - reveal_to, executed 2 times, duration 2.967646466s, send bytes 24000000 recv bytes 24000000, send actions 3, recv actions 3
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - _and, executed 7 times, duration 2.635390364s, send bytes 496000000 recv bytes 496000000, send actions 19, recv actions 19
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - _perm2_sv, executed 9 times, duration 0.887404965s, send bytes 98400030 recv bytes 151200015, send actions 15, recv actions 12
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - i_equal, executed 2 times, duration 0.471000155s, send bytes 208000000 recv bytes 208000000, send actions 14, recv actions 14
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - _cuckoo_hash_to_perm_v, executed 1 times, duration 0.165842127s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - _not, executed 4 times, duration 0.009093997s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - i_add, executed 4 times, duration 0.005032646s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - _xor, executed 2 times, duration 0.002712303s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - seal, executed 2 times, duration 6.1286e-05s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:130] MPC profiling: total time 21.028531686999994
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - b2a, executed 8 times, duration 11.729486609s, send bytes 64000000 recv bytes 64000000, send actions 8, recv actions 8
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - a2b, executed 3 times, duration 4.608958895s, send bytes 832000000 recv bytes 832000000, send actions 21, recv actions 21
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - and_bb, executed 17 times, duration 1.557097513s, send bytes 704000000 recv bytes 704000000, send actions 17, recv actions 17
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - xor_bb, executed 75 times, duration 1.221434395s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - perm2_sv, executed 9 times, duration 0.887346036s, send bytes 98400030 recv bytes 151200015, send actions 15, recv actions 12
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - equal_ss, executed 2 times, duration 0.470972333s, send bytes 208000000 recv bytes 208000000, send actions 14, recv actions 14
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - mul_aa, executed 6 times, duration 0.469724159s, send bytes 96000000 recv bytes 96000000, send actions 6, recv actions 6
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - xor_bp, executed 15 times, duration 0.022715458s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - concatenate, executed 1 times, duration 0.020547212s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - pad, executed 2 times, duration 0.012355138s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - p2a, executed 3 times, duration 0.00766053s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - a2v, executed 2 times, duration 0.006513844s, send bytes 8000000 recv bytes 8000000, send actions 1, recv actions 1
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - negate_p, executed 4 times, duration 0.003025485s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - add_pp, executed 2 times, duration 0.003003714s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - add_aa, executed 2 times, duration 0.00274726s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - and_bp, executed 2 times, duration 0.002624968s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - add_ap, executed 2 times, duration 0.002256703s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - make_p, executed 5 times, duration 3.0058e-05s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - extract_slice, executed 6 times, duration 2.5677e-05s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - reshape, executed 2 times, duration 5.7e-06s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
Join send bytes: 2010400030
Join recv bytes: 2063200015
Join send actions: 82
Join recv actions: 79
[       OK ] BigDataJoinTest.Work (32798 ms)
[----------] 1 test from BigDataJoinTest (32798 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (32798 ms total)
[  PASSED  ] 1 test.
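
To see which kernels dominate a run like the one above, the per-kernel lines can be parsed and ranked by duration. This Python sketch assumes the "- name, executed N times, duration Xs, send bytes B ..." line format shown in this log and uses the "HAL profiling" and "MPC profiling" header lines to tell the two sections apart. The name `kernel_breakdown` is hypothetical.

```python
import re

SECTION = re.compile(r"(HLO|HAL|MPC) profiling: total time")
KERNEL = re.compile(
    r"- (\w+), executed (\d+) times, duration ([0-9.eE+-]+)s, send bytes (\d+)"
)


def kernel_breakdown(log_text, section="HAL"):
    """Return (name, calls, seconds, send_bytes) rows for one section, slowest first."""
    current = None
    rows = []
    for line in log_text.splitlines():
        header = SECTION.search(line)
        if header:
            current = header.group(1)  # remember which profiling section we are in
            continue
        kernel = KERNEL.search(line)
        if kernel and current == section:
            name, calls, seconds, sent = kernel.groups()
            rows.append((name, int(calls), float(seconds), int(sent)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


SAMPLE = """\
[info] [test_util.cc:130] HAL profiling: total time 32.635025206
[info] [test_util.cc:133] - i_mul, executed 6 times, duration 9.238196789s, send bytes 144000000 recv bytes 144000000, send actions 12, recv actions 12
[info] [test_util.cc:133] - soprf, executed 1 times, duration 16.252644108s, send bytes 1040000000 recv bytes 1040000000, send actions 19, recv actions 19
[info] [test_util.cc:130] MPC profiling: total time 21.028531686999994
[info] [test_util.cc:133] - b2a, executed 8 times, duration 11.729486609s, send bytes 64000000 recv bytes 64000000, send actions 8, recv actions 8
"""

for row in kernel_breakdown(SAMPLE):
    print(row)
# ('soprf', 1, 16.252644108, 1040000000)
# ('i_mul', 6, 9.238196789, 144000000)
```

For the run shown above, soprf alone accounts for about 16.25 s of the 32.64 s HAL total and 1,040,000,000 of the 2,010,400,030 bytes sent, roughly half of each.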
