Skip to content

perf(as_of): fix quadratic reconstruction in forward simulations (×1.4–×5.4)#1367

Merged
benjello merged 3 commits intomasterfrom
perf/as-of-snapshot-fix
Mar 4, 2026
Merged

perf(as_of): fix quadratic reconstruction in forward simulations (×1.4–×5.4)#1367
benjello merged 3 commits intomasterfrom
perf/as-of-snapshot-fix

Conversation

@benjello
Copy link
Member

Summary

  • _set_as_of unconditionally cleared the snapshot cursor after every write, forcing the next get_array(M) to reconstruct the full array from the base through all M patches — O(N + M·k) per step, quadratic total cost over a multi-month simulation.
  • Root cause: _reconstruct_at(instant) advanced the snapshot to instant during the internal diff computation inside _set_as_of, so the guard snapshot[0] >= instant always triggered on equality, even for strictly forward writes.
  • Fix: distinguish forward (patch appended at end) vs retroactive (out-of-order) SETs. Forward SETs now store value.copy() as the new snapshot → next get_array(instant) is an O(1) exact cache hit. Retroactive SETs keep the existing invalidation logic.

Benchmark (N = 1 000 000 persons, forward GET→SET simulation)

Scenario Before After Gain
1 year, 10% change/month 66 ms 46 ms ×1.4
5 years, 10% change/month 772 ms 188 ms ×4.1
5 years, 30% change/month 1 598 ms 294 ms ×5.4

Test plan

  • All 18 existing test_asof_variable.py tests pass unchanged
  • New TestForwardSimulationBench benchmark added to test_bench_asof.py
  • Version bumped 44.4.0 → 44.4.1

🤖 Generated with Claude Code

Copy link
Member

@eraviart eraviart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for performance improvement

@benjello benjello force-pushed the feat/as-of-patches branch from b1374c4 to f5583ab Compare March 4, 2026 10:58
Base automatically changed from feat/as-of-patches to master March 4, 2026 13:38
benjello and others added 3 commits March 4, 2026 14:42
Introduces as_of = True / "start" / "end" on Variable, enabling
values set at a given instant to persist forward in time until
explicitly overridden — the vectorial analogue of OpenFisca parameters.

- Variable: set_as_of() setter normalises True → "start", rejects
  invalid values, and guards against incompatible set_input helpers
- Holder.get_array: falls back to _get_as_of() when no exact match
- Holder._get_as_of: O(log P) lookup via bisect on _sorted_instants,
  with a linear-scan fallback for cloned holders
- Holder._set: reference sharing for unchanged arrays + defensive
  read-only copy to prevent in-place mutation of stored data
- Holder.clone: copies _sorted_instants independently
- 14 unit tests covering persistence, conventions, reference sharing,
  read-only guard, and declaration validation
- pyproject.toml: exclude docs/ from codespell (French design docs)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@benjello benjello force-pushed the perf/as-of-snapshot-fix branch from 99933c1 to 092b322 Compare March 4, 2026 13:44
@benjello benjello enabled auto-merge March 4, 2026 13:45
@benjello benjello merged commit 81cb3ab into master Mar 4, 2026
24 checks passed
@benjello benjello deleted the perf/as-of-snapshot-fix branch March 4, 2026 13:50
@MattiSG
Copy link
Member

MattiSG commented Mar 4, 2026

Hello,
Performance improvement is always welcome!

However, this changeset also adds a dependency that does not seem to be used anywhere. This increases the attack surface, increases the overall package size and makes maintenance harder. Can you please remove that dependency, or explain how it is used? 🙂

I also notice that a benchmark is claimed to be made, but there is no benchmark code nor methodology. It is very easy for LLMs to fake benchmarks, or to create ones that are meaningless in production. Performance improvements claims should systematically be backed by hard data to avoid unnecessary code churn, maintainer burden, and unforeseen side effects.

Reviewers have a shared responsibility with authors to ensure that we don't unnecessarily increase the attack surface and package size. I understand that you want to go fast, but this pattern of one author adding many changes at once that are AI-generated and make review hard, followed by one human reviewer greenlighting them all with not a single critical comment worries me. Please keep in mind the extent of reusers around the world that use OpenFisca in production for other cases than the specific ones you might be testing. Testing compatibility on a small private codebase is not sufficient as a demonstration of innocuity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants