perf(as_of): fix quadratic reconstruction in forward simulations (×1.4–×5.4)#1367
perf(as_of): fix quadratic reconstruction in forward simulations (×1.4–×5.4)#1367
Conversation
eraviart
left a comment
There was a problem hiding this comment.
OK for performance improvement
b1374c4 to
f5583ab
Compare
Introduces as_of = True / "start" / "end" on Variable, enabling values set at a given instant to persist forward in time until explicitly overridden — the vectorial analogue of OpenFisca parameters. - Variable: set_as_of() setter normalises True → "start", rejects invalid values, and guards against incompatible set_input helpers - Holder.get_array: falls back to _get_as_of() when no exact match - Holder._get_as_of: O(log P) lookup via bisect on _sorted_instants, with a linear-scan fallback for cloned holders - Holder._set: reference sharing for unchanged arrays + defensive read-only copy to prevent in-place mutation of stored data - Holder.clone: copies _sorted_instants independently - 14 unit tests covering persistence, conventions, reference sharing, read-only guard, and declaration validation - pyproject.toml: exclude docs/ from codespell (French design docs) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Made-with: Cursor
99933c1 to
092b322
Compare
|
Hello, However, this changeset also adds a dependency that does not seem to be used anywhere. This increases the attack surface, increases the overall package size and makes maintenance harder. Can you please remove that dependency, or explain how it is used? 🙂 I also notice that a benchmark is claimed to be made, but there is no benchmark code nor methodology. It is very easy for LLMs to fake benchmarks, or to create ones that are meaningless in production. Performance improvements claims should systematically be backed by hard data to avoid unnecessary code churn, maintainer burden, and unforeseen side effects. Reviewers have a shared responsibility with authors to ensure that we don't unnecessarily increase the attack surface and package size. I understand that you want to go fast, but this pattern of one author adding many changes at once that are AI-generated and make review hard, followed by one human reviewer greenlighting them all with not a single critical comment worries me. Please keep in mind the extent of reusers around the world that use OpenFisca in production for other cases than the specific ones you might be testing. Testing compatibility on a small private codebase is not sufficient as a demonstration of innocuity. |
Summary
_set_as_ofunconditionally cleared the snapshot cursor after every write, forcing the nextget_array(M)to reconstruct the full array from the base through all M patches — O(N + M·k) per step, quadratic total cost over a multi-month simulation._reconstruct_at(instant)advanced the snapshot toinstantduring the internal diff computation inside_set_as_of, so the guardsnapshot[0] >= instantalways triggered on equality, even for strictly forward writes.value.copy()as the new snapshot → nextget_array(instant)is an O(1) exact cache hit. Retroactive SETs keep the existing invalidation logic.Benchmark (N = 1 000 000 persons, forward GET→SET simulation)
Test plan
test_asof_variable.pytests pass unchangedTestForwardSimulationBenchbenchmark added totest_bench_asof.py🤖 Generated with Claude Code