GEP 7 and updates to GEPs 1-5 necessitated by GEP 6 #855

hmgaudecker · 2025-04-07T09:57:28Z

What problem do you want to solve?

- Add a GEP for the revamped interface.
- Update earlier GEPs to reflect the changes that have become necessary after GEP 6 (since our documentation is small, it does not make sense to keep outdated things around).
- Add the finalised schema from Validate params files #880 as an appendix to GEP 3.

codecov · 2025-04-07T10:39:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.57%. Comparing base (8267dbd) to head (3525917).

Additional details and impacted files

@@                        Coverage Diff                        @@
##           collect-components-of-namespaces     #855   +/-   ##
=================================================================
  Coverage                             77.57%   77.57%           
=================================================================
  Files                                   175      175           
  Lines                                  7563     7563           
=================================================================
  Hits                                   5867     5867           
  Misses                                 1696     1696



Next set of to-dos from #897.

- Rename the parameters in GETTSIM's yaml files
- Restructure where useful (often, moving from scalars to dicts does wonders for readability)
- Add now-required unit, reference_period, type keywords

---------

Co-authored-by: Marvin Immesberger <[email protected]>


ChristianZimpelmann

Overall, it looks cool, but I feel like the entry barrier is still quite high for beginners. I made some comments (especially on aspects that beginners might find complicated).

docs/geps/gep-07.md

ChristianZimpelmann · 2025-07-09T08:47:39Z

docs/geps/gep-07.md

+
+   ```{raw} html
+   ---
+   file: ./interface_dag.html


The dag does not fit on the page and it is unclear how to scroll to the right.

I know I know... You can scroll by clicking at the bottom of the graph and dragging the pointer.

docs/geps/gep-07.md

interface-playground.ipynb

hmgaudecker

> Overall, it looks cool, but I feel like the entry barrier is still quite high for beginners. I made some comments (especially on aspects that beginners might find complicated).

Thanks! Implemented all of those except for one (no default for main_target(s), it is too important to understand conceptually that you have to request a particular target). Any further concrete suggestions of lowering entry barriers are very welcome!

docs/geps/gep-07.md

interface-playground.ipynb

ChristianZimpelmann · 2025-07-09T13:23:21Z

Some other observations from playing around in the notebook:

result = main(
    date_str="2025-01-01",
    main_target=MainTarget.results.df_with_mapper,
)

Leads to

ValueError: The following arguments to `main` are missing for computing the desired output:

[
    "('input_data', 'flat')",
]

"flat" seems wrong here.

result = main(
    date_str="2025-01-01",
    main_target=MainTarget.specialized_environment.tax_transfer_dag,
)

Leads to

ValueError: The following data columns are missing.

[ ....

Probably better to respond that the argument input_data is missing completely

Why does the original_policy_environment obtained from MainTarget.policy_environment contain keys like anzahl_erwachsene_hh or p_id? It is not clear to me why these are part of the policy environment.

hmgaudecker · 2025-07-09T13:47:41Z

> Some other observations from playing around in the notebook:

Thanks!!!

> result = main(
>     date_str="2025-01-01",
>     main_target=MainTarget.results.df_with_mapper,
> )
>
> Leads to
>
> ValueError: The following arguments to `main` are missing for computing the desired output:
>
> [
>     "('input_data', 'flat')",
> ]
>
> "flat" seems wrong here.

It is not wrong, but the message should be improved -- see #1005.

> result = main(
>     date_str="2025-01-01",
>     main_target=MainTarget.specialized_environment.tax_transfer_dag,
> )
>
> Leads to
>
> ValueError: The following data columns are missing.
>
> [ ....
>
> Probably better to respond that the argument input_data is missing completely

Agreed, see #1006

> Why does the original_policy_environment obtained from MainTarget.policy_environment contain keys like anzahl_erwachsene_hh or p_id? It is not clear to me why these are part of the policy environment.

As has always been the case, it includes all functions operating on data that are around, like anzahl_erwachsene_hh. In addition, we now have possible input columns, too (essentially a dynamic version of TYPES_INPUT_VARIABLES). Ofc, "policy environment" is too narrow a term for some of these elements, but that has always been the case and I don't have a good term in store to improve upon it. Suggestions welcome, ofc!

### What problem do you want to solve?

`processed_data` uses an $O(n^2)$ approach to link original and internal IDs. This PR implements an $O(n\cdot \log(n))$ approach.

## Benchmarks

### On `gep-07` (3525917):

```cmd
====================================================================
SUMMARY TABLE
====================================================================
Dataset              numpy_time   numpy_hash   jax_time   jax_hash
--------------------------------------------------------------------
df_5000.parquet      1.2681       13106402     15.5897    bf85cb3d
df_10000.parquet     4.6791       308ca129     30.7932    57ba7579
df_20000.parquet     15.7451      51e8d0b4     62.4070    21636ea4
df_40000.parquet     54.0340      6ae704d8     137.1975   30bbf3ea
```

### This PR:

**[EDIT: updated results after cf37b75]**

```cmd
====================================================================
SUMMARY TABLE
====================================================================
Dataset              numpy_time   numpy_hash   jax_time   jax_hash
--------------------------------------------------------------------
df_5000.parquet      0.0378       13106402     0.8950     bf85cb3d
df_10000.parquet     0.0402       308ca129     0.8108     57ba7579
df_20000.parquet     0.1107       51e8d0b4     1.1354     21636ea4
df_40000.parquet     0.0853       6ae704d8     1.8208     30bbf3ea
```

The benchmark essentially runs

```python
result = main(
    date_str=None,
    input_data=InputData.df_and_mapper(
        df=data,
        mapper=MAPPER,
    ),
    main_targets=[MainTarget.processed_data],
    tt_targets=TTTargets(tree=TT_TARGETS),
    backend=backend,
)
```

on the targets defined in `interface_playground.ipynb` with differently sized datasets that replicate the example household from the same notebook `N` times (i.e., `N*3` persons in each dataset). The hashes demonstrate that this PR creates `result` objects that are identical to the ones created with the $O(n^2)$ approach.

To reproduce the benchmarks:

- Run `make_data.py` (see attached .zip) to create example datasets
- Run `benchmark_comparison.py` to create tables above

[benchmark.zip](https://github.com/user-attachments/files/21327575/benchmark.zip)

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
Co-authored-by: mj023 <[email protected]>
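For readers who want the complexity argument in concrete terms, here is a minimal sketch of the two ways of linking original IDs to internal (positional) IDs. It is not the code from this PR: the function names `link_ids_quadratic` and `link_ids_sorted` are hypothetical, the sketch uses only the standard library rather than NumPy or JAX, and it assumes that original IDs are unique and that a reference without a match maps to `-1`.

```python
from bisect import bisect_left


def link_ids_quadratic(original_ids, references):
    """O(n^2) baseline: scan all original IDs for every reference."""
    internal = []
    for ref in references:
        for pos, oid in enumerate(original_ids):
            if oid == ref:
                internal.append(pos)
                break
        else:
            # No original ID matched this reference.
            internal.append(-1)
    return internal


def link_ids_sorted(original_ids, references):
    """O(n log n): sort the IDs once, then binary-search each reference."""
    # Positions of the original IDs, ordered by ID value.
    order = sorted(range(len(original_ids)), key=original_ids.__getitem__)
    sorted_ids = [original_ids[i] for i in order]
    internal = []
    for ref in references:
        k = bisect_left(sorted_ids, ref)
        if k < len(sorted_ids) and sorted_ids[k] == ref:
            internal.append(order[k])
        else:
            internal.append(-1)
    return internal


# Hypothetical data: three persons with arbitrary original IDs and a
# pointer column (e.g. a parent's ID) where -1 means "no pointer".
ids = [103, 7, 42]
refs = [42, -1, 103]
print(link_ids_quadratic(ids, refs))  # [2, -1, 0]
print(link_ids_sorted(ids, refs))     # [2, -1, 0]
```

Both functions return the same mapping, which mirrors the role of the identical hashes in the tables above; only the sorted variant avoids the pairwise comparison of every reference with every ID.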

hmgaudecker changed the base branch from main to collect-components-of-namespaces April 7, 2025 09:57

hmgaudecker force-pushed the gep-07 branch from 82d7f58 to 979299e Compare April 7, 2025 10:01

MImmesberger and others added 27 commits May 12, 2025 17:35

UV.

f674987

Unterhalt.

d0b5e34

Merge branch 'rename-gettsim-params-fix-yaml-validation' of https://g…

d3423e9

…ithub.com/iza-institute-of-labor-economics/gettsim into rename-gettsim-params-fix-yaml-validation

Typos.

f394912

Add parameters for aRW calculation back in.

677e8df

Fix reference.

e4986e4

Make unit and reference period required (#904)

dbb956a

Title says it all. Better be explicit in the structure and allow for nulls than leaving things out accidentally. --------- Co-authored-by: Marvin Immesberger <[email protected]>

Went through changes, fixed inconsistencies and typos.

449873f

Merge branch 'rename-gettsim-params-fix-yaml-validation' of github.co…

9e1fec0

…m:iza-institute-of-labor-economics/gettsim into rename-gettsim-params-fix-yaml-validation

Remaining files.

9d8ae7a

Remove 'scalar' as a possible key in the params files, use 'value' in…

a881e94

…stead.

Abgeltungssteuer.

91615e0

Merge branch 'collect-unify-parsing-of-params' into move-gettsim-para…

5be2f4e

…ms-files

Add a few safety checks and modifications to behavior.

a131227

Kinderfreibetrag.

355d4d0

Kindergeld.

76a06e6

A bit of Einkommensteuer / Abzüge. Not working, but switching machine…

930ba79

…s now.

Make tests pass by adding a somewhat ad-hoc Evaluationsjahr.

3715950

Moved on with Abzügen von Einkünften/Einnahmen.

c7a077b

Altersfreibetrag.

7197b6e

Alleinerziehendenfreibetrag.

20eebcb

Behindertenpauschbetrag.

76ee21b

Finish converting eink_st_abzuege.yaml.

4af3204

Simplify calculation of Lohnsteuer / Vorsorgeaufwendungen.

734720a

Be explicit in name.

4a13dbd

Einkommensteuer parameters.

2dbdc72

MImmesberger and others added 4 commits July 8, 2025 19:59

Update notebook.

42c3695

Merge branch 'collect-components-of-namespaces' into gep-07

6ffeac9

Move and rename example used in GEP 7.

f4734e1

Update the playground notebook.

23c218d

MImmesberger added this to the v1.0 milestone Jul 9, 2025

ChristianZimpelmann reviewed Jul 9, 2025


Incorporate review suggestions.

57f39f3

hmgaudecker commented Jul 9, 2025


This was referenced Jul 9, 2025

ENH: Better error message for missing input data #1005

Closed

ENH: ("fail_if", "input_data_are_missing") #1006

Closed

Do not make a copy of the policy environment in the example.

074ea18

hmgaudecker mentioned this pull request Jul 14, 2025

ENH: Better handling of evaluation_date (etc.) #1019

Closed

hmgaudecker and others added 14 commits July 15, 2025 14:08

Merge branch 'collect-components-of-namespaces' into gep-07

ed11a86

Update GEP 7 with new model for policy/evaluation dates.

cdcdb2c

Use markdown tables instead of html.

90a9487

Merge branch 'collect-components-of-namespaces' into gep-07

985b6fc

Modify examples so that they should work after #1026.

bc3de05

Merge branch 'collect-components-of-namespaces' into gep-07

022bec9

Add new interface dag html.

3525917

Merge branch 'collect-components-of-namespaces' into gep-07

3eada4c

Merge branch 'collect-components-of-namespaces' into gep-07

98e133e

Merge branch 'ocollect-components-of-namespaces' into gep-07

adf3b56

Add dates and link to resolution.

18c403c

Add pixi task to build docs.

4a93e5b

Move interface playground into sandbox directory.

6d12d31

hmgaudecker merged commit 5fe956f into collect-components-of-namespaces Jul 23, 2025
14 of 15 checks passed

hmgaudecker deleted the gep-07 branch July 23, 2025 16:42
