Zjj/wk25 merge join——join_uu #1368
base: zjj/wk25_merge_join
Summary of Changes
Hello @cpz2024, I'm Gemini Code Assist! Here's a summary to help you and other reviewers quickly get up to speed.
This pull request introduces a new secure join primitive, "join_uu", designed for two-party computation within the SPU framework. The goal is to let two parties join their respective private datasets while keeping the data confidential. The implementation uses a Cuckoo Hash-based approach: secure pseudo-random function (SoPrf) outputs and cryptographic hashing (Blake3) are used to generate permutations that allow records to be compared and merged securely. This extends SPU's support for privacy-preserving data analysis.
Code Review
This pull request introduces a secure two-party join implementation (join_uu) based on the "Private Join and Compute from PIR with Default" paper. The changes are comprehensive, including the core join logic, a new MPC kernel for Cuckoo Hashing, build file updates, and corresponding tests.
I've identified a critical correctness issue in the join logic where it selects matching rows, which could lead to incorrect results. Additionally, there are several opportunities for improvement in code maintainability, such as removing test dependencies from production code, improving comments, and reducing code duplication. I've also noted some minor issues in the test files that should be addressed.
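For readers unfamiliar with the cuckoo hashing that the review mentions, here is a minimal plaintext sketch. It is not the SPU kernel and not the PR's code: the helper names (`candidate_bin`, `cuckoo_insert`), the choice of three hash functions, the eviction limit, and the use of BLAKE2b from the Python standard library in place of Blake3 are all assumptions made for illustration.

```python
import hashlib
import random


def candidate_bin(key: int, hash_index: int, num_bins: int) -> int:
    """Map a key to a bin using the hash function selected by hash_index.

    BLAKE2b stands in for Blake3 here (assumption: stdlib only).
    """
    data = hash_index.to_bytes(1, "little") + key.to_bytes(8, "little")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_bins


def cuckoo_insert(keys, num_bins, num_hashes=3, max_evictions=500, rng_seed=0):
    """Place every key in one of its candidate bins, evicting on collision."""
    table = [None] * num_bins
    rng = random.Random(rng_seed)
    for key in keys:
        current = key
        for _ in range(max_evictions):
            bins = [candidate_bin(current, h, num_bins) for h in range(num_hashes)]
            free = next((b for b in bins if table[b] is None), None)
            if free is not None:
                table[free] = current
                break
            # All candidate bins are occupied: evict a random occupant
            # and try to re-insert the evicted key on the next iteration.
            victim = rng.choice(bins)
            table[victim], current = current, table[victim]
        else:
            raise RuntimeError("cuckoo insertion failed; use more bins")
    return table


if __name__ == "__main__":
    keys = list(range(100))
    table = cuckoo_insert(keys, num_bins=200)
    occupied = sum(1 for slot in table if slot is not None)
    print(f"placed {occupied} keys into {len(table)} bins")
```

The property that matters for a join is that each key ends up in exactly one of a small, fixed set of candidate bins, so the other party only needs to probe those bins for a match.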
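I did some optimization, and the running time now looks more reasonable. Please address the comments above first. In addition, once I have confirmed that the overall code logic is OK, there are some settings that need to be analyzed quantitatively against SPU's implementation and validated through experiments:
You need to run some experiments. The variables are as follows (the specific values below are numbers I picked arbitrarily, so adjust them as appropriate):
PS: You can write a script that runs all of the experiments above and redirects the output to files, then analyze the results separately. (For convenience, you can simply collect the total communication volume and number of communication rounds reported by link, plus the timings reported by the HAL layer.)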
bazelisk run //libspu/kernel/hal:join_test -- --gtest_filter="*BigDataJoinTest*"
INFO: Analyzed target //libspu/kernel/hal:join_test (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //libspu/kernel/hal:join_test up-to-date:
bazel-bin/libspu/kernel/hal/join_test
INFO: Elapsed time: 10.847s, Critical Path: 10.68s
INFO: 3 processes: 1 internal, 2 processwrapper-sandbox.
INFO: Build completed successfully, 3 total actions
INFO: Running command line: external/bazel_tools/tools/test/test-setup.sh libspu/kernel/hal/join_test '--gtest_filter=*BigDataJoinTest*'
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //libspu/kernel/hal:join_test
-----------------------------------------------------------------------------
Running main() from gmock_main.cc
Note: Google Test filter = *BigDataJoinTest*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from BigDataJoinTest
[ RUN ] BigDataJoinTest.Work
[2026-01-08 02:56:57.596] [info] [thread_pool.cc:30] Create a fixed thread pool with size 63
[2026-01-08 02:57:30.326] [info] [test_util.cc:130] HLO profiling: total time 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:130] HAL profiling: total time 32.635025206
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - soprf, executed 1 times, duration 16.252644108s, send bytes 1040000000 recv bytes 1040000000, send actions 19, recv actions 19
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - i_mul, executed 6 times, duration 9.238196789s, send bytes 144000000 recv bytes 144000000, send actions 12, recv actions 12
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - reveal_to, executed 2 times, duration 2.967646466s, send bytes 24000000 recv bytes 24000000, send actions 3, recv actions 3
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - _and, executed 7 times, duration 2.635390364s, send bytes 496000000 recv bytes 496000000, send actions 19, recv actions 19
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - _perm2_sv, executed 9 times, duration 0.887404965s, send bytes 98400030 recv bytes 151200015, send actions 15, recv actions 12
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - i_equal, executed 2 times, duration 0.471000155s, send bytes 208000000 recv bytes 208000000, send actions 14, recv actions 14
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - _cuckoo_hash_to_perm_v, executed 1 times, duration 0.165842127s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - _not, executed 4 times, duration 0.009093997s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - i_add, executed 4 times, duration 0.005032646s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - _xor, executed 2 times, duration 0.002712303s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - seal, executed 2 times, duration 6.1286e-05s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:130] MPC profiling: total time 21.028531686999994
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - b2a, executed 8 times, duration 11.729486609s, send bytes 64000000 recv bytes 64000000, send actions 8, recv actions 8
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - a2b, executed 3 times, duration 4.608958895s, send bytes 832000000 recv bytes 832000000, send actions 21, recv actions 21
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - and_bb, executed 17 times, duration 1.557097513s, send bytes 704000000 recv bytes 704000000, send actions 17, recv actions 17
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - xor_bb, executed 75 times, duration 1.221434395s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - perm2_sv, executed 9 times, duration 0.887346036s, send bytes 98400030 recv bytes 151200015, send actions 15, recv actions 12
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - equal_ss, executed 2 times, duration 0.470972333s, send bytes 208000000 recv bytes 208000000, send actions 14, recv actions 14
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - mul_aa, executed 6 times, duration 0.469724159s, send bytes 96000000 recv bytes 96000000, send actions 6, recv actions 6
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - xor_bp, executed 15 times, duration 0.022715458s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - concatenate, executed 1 times, duration 0.020547212s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - pad, executed 2 times, duration 0.012355138s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - p2a, executed 3 times, duration 0.00766053s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - a2v, executed 2 times, duration 0.006513844s, send bytes 8000000 recv bytes 8000000, send actions 1, recv actions 1
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - negate_p, executed 4 times, duration 0.003025485s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - add_pp, executed 2 times, duration 0.003003714s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - add_aa, executed 2 times, duration 0.00274726s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - and_bp, executed 2 times, duration 0.002624968s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - add_ap, executed 2 times, duration 0.002256703s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - make_p, executed 5 times, duration 3.0058e-05s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - extract_slice, executed 6 times, duration 2.5677e-05s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
[2026-01-08 02:57:30.326] [info] [test_util.cc:133] - reshape, executed 2 times, duration 5.7e-06s, send bytes 0 recv bytes 0, send actions 0, recv actions 0
Join send bytes: 2010400030
Join recv bytes: 2063200015
Join send actions: 82
Join recv actions: 79
[ OK ] BigDataJoinTest.Work (32798 ms)
[----------] 1 test from BigDataJoinTest (32798 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (32798 ms total)
[ PASSED ] 1 test.
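A minimal sketch of the kind of analysis script suggested above, assuming each run's output has been redirected to a file and follows the profiling format shown in this log. The function names (`parse_profile`, `totals`) and the file name in the usage example are hypothetical, and the regular expressions are derived only from the lines printed here, so they may need adjusting for other SPU versions.

```python
import re
from collections import defaultdict

# Section headers such as "HAL profiling: total time 32.635025206".
SECTION_RE = re.compile(r"\b(HLO|HAL|MPC) profiling: total time (\S+)")

# Per-op lines such as
# "- soprf, executed 1 times, duration 16.25s, send bytes 1 recv bytes 1, ..."
OP_RE = re.compile(
    r"- (\w+), executed (\d+) times, duration (\S+?)s, "
    r"send bytes (\d+) recv bytes (\d+), "
    r"send actions (\d+), recv actions (\d+)"
)


def parse_profile(log_text):
    """Return {section: [op record, ...]} for the HLO/HAL/MPC sections."""
    sections = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        header = SECTION_RE.search(line)
        if header:
            current = header.group(1)
            continue
        op = OP_RE.search(line)
        if op and current is not None:
            name, times, dur, sent, recv, send_act, recv_act = op.groups()
            sections[current].append({
                "name": name,
                "executed": int(times),
                "duration_s": float(dur),
                "send_bytes": int(sent),
                "recv_bytes": int(recv),
                "send_actions": int(send_act),
                "recv_actions": int(recv_act),
            })
    return dict(sections)


def totals(ops):
    """Sum duration, bytes, and actions over a list of op records."""
    fields = ("duration_s", "send_bytes", "recv_bytes",
              "send_actions", "recv_actions")
    return {field: sum(op[field] for op in ops) for field in fields}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python parse_profile.py run_output.log
    with open(sys.argv[1], encoding="utf-8") as handle:
        parsed = parse_profile(handle.read())
    for section, ops in parsed.items():
        print(section, totals(ops))
```

For the run above, summing the HAL-level send bytes gives 2010400030 and the send actions give 82, which matches the "Join send bytes" and "Join send actions" lines. Treating send actions as a stand-in for the number of communication rounds is an assumption.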
Pull Request
What problem does this PR solve?
Issue Number: Fixed #
Possible side effects?
Performance:
Backward compatibility: