Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Namespaces: Infrastructure #780

Open
wants to merge 100 commits into
base: collect-components-of-namespaces
Choose a base branch
from

Conversation

lars-reimann
Copy link
Collaborator

@lars-reimann lars-reimann commented Jul 17, 2024

What problem do you want to solve?

This PR introduces namespaces to GETTSIM's infrastructure.

@lars-reimann
Copy link
Collaborator Author

lars-reimann commented Jul 17, 2024

@MImmesberger The nested function dictionary is currently structured like this:

image

Which levels of nesting should be removed? Based on our previous discussion, it's probably the first two, i.e. _gettsim and social_insurance_contributions/transfers/taxes/demographic_vars, right?

Copy link

codecov bot commented Jul 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@31bf89f). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #780   +/-   ##
=======================================
  Coverage        ?   87.82%           
=======================================
  Files           ?       56           
  Lines           ?     3975           
  Branches        ?        0           
=======================================
  Hits            ?     3491           
  Misses          ?      484           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@MImmesberger
Copy link
Collaborator

I think that, for example, transfers.arbeitsl_geld.betrag is perfect, so I would vote for keeping the social_insurance_contributions/transfers/taxes/demographic_vars level. The _gettsim should not be part of the target call.

@hmgaudecker
Copy link
Collaborator

hmgaudecker commented Jul 17, 2024

Looking at that, I'd almost think that transfers and taxes are superfluous, but not demographics (I'd suggest renaming this) and social_insurance_contributions, although we might be able to come up with a shorter name.

What do you think, @MImmesberger ?

In any case, not super-important for the moment, the good thing is that it will be fairly easy to do bulk-renamings / removals (granted it seems more difficult to insert a level back than to remove it, so I'd be fine with an approach applying a bit more caution at the moment).

@MImmesberger
Copy link
Collaborator

I agree, there should be no overlap between the elements one level below the tax/transfer level, so no need to explicitly distinguish. In that case, however, we should make sure that the module docstring what the module is about (e.g. it's not obvious whether the "Kinderbonus" is a tax deduction or a transfer).

Regarding the naming of social_insurance_contributions: In the google doc, I called this "sozialversicherungsbeitraege". But I just realized that I put this in the taxes namespace which is wrong. Should be its own (as it is the case currently in Lars' proposal).

@hmgaudecker
Copy link
Collaborator

Agreed, let's use sozialversicherungsbeitraege. At some point we might rename demographics yet again, but that really should be a simple Search & Replace.

I'm all for good docstrings, but any tax deduction should go into the tax component it is deducted from, right?

@MImmesberger
Copy link
Collaborator

I'm all for good docstrings, but any tax deduction should go into the tax component it is deducted from, right?

Yes definitely. Was just thinking about making it as obvious as possible. In our case "Kinderbonus" is a transfer.

@ChristianZimpelmann
Copy link
Member

sozialversicherungsbeitraege

More in line with the naming of modules might be sozialv_beitraege

@hmgaudecker
Copy link
Collaborator

Good catch! Those are scheduled to be changed to

  • arbeitslosenversicherung
  • einkommensgrenzen
  • krankenversicherung
  • pflegeversicherung
  • rentenversicherung

But that was not impossible to know, ofc!

@lars-reimann
Copy link
Collaborator Author

lars-reimann commented Jul 17, 2024

Should the renamings/removals be done programmatically in the new function or by changing the directory structure under _gettsim?

Edit: For now, remove transfers/taxes in the new function. Later, change the directory structure, so it matches the dictionary.

@hmgaudecker
Copy link
Collaborator

Just came across #533 -- might that be fixed in passing here?

lars-reimann and others added 16 commits August 9, 2024 09:08
- Add testing dependencies to default environment
- Make sure the correct kaleido dependency is installed on Windows/Unix
- Add task `tests`, so that `pixi run tests` gives one the option to run
the tests on Python 3.11 and 3.12
- Set 3.12 as the upper bound for the default environment Python version
(as long as we don't test 3.13, we should probably not use it in the
development environment?).
- Use pdbp in pytest
- Remove artifacts from previous packaging workflow

---------

Co-authored-by: Hans-Martin von Gaudecker <[email protected]>
@MImmesberger MImmesberger changed the title Namespaces Namespaces: Infrastructure Dec 12, 2024
@hmgaudecker hmgaudecker mentioned this pull request Dec 12, 2024
@hmgaudecker hmgaudecker changed the base branch from main to collect-components-of-namespaces December 12, 2024 10:36
@hmgaudecker hmgaudecker marked this pull request as ready for review December 12, 2024 10:37
Copy link
Collaborator

@hmgaudecker hmgaudecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very nice -- but far too long for the amount of time I have today. Will try to continue asap.

Thanks a lot!

src/_gettsim/functions/loader.py Show resolved Hide resolved
src/_gettsim/functions/loader.py Outdated Show resolved Hide resolved
src/_gettsim/functions/loader.py Outdated Show resolved Hide resolved
src/_gettsim/gettsim_typing.py Outdated Show resolved Hide resolved
@MImmesberger
Copy link
Collaborator

So that we don't forget it: We should think about changing the name_in_dag keyword to something else. Maybe simple_name or simple_name_in_dag makes clearer that this isn't the qualified name.

@hmgaudecker
Copy link
Collaborator

So that we don't forget it: We should think about changing the name_in_dag keyword to something else. Maybe simple_name or simple_name_in_dag makes clearer that this isn't the qualified name.

Good one. But now that we switch, should not the full path be part of policy_info?

@hmgaudecker
Copy link
Collaborator

I looked a little deeper now. Let's set aside a day in January where we'll look at this more closely. I do believe there is huge room for simplification in the whole function loader with the tree structure. That is, policy_info can just use the path to the module if no path is supplied. Maybe the policy_info decorator can even become a policy_function decorator returning the whole function so that _create_policy_function_from_decorated_callable can disappear completely.

@@ -186,10 +218,12 @@ def _create_policy_function_from_decorated_callable(
"""

# Only needed until the directory structure is cleaned up
# TODO(@MImmesberger): Remove the removeprefix calls once the directory
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think we should do that right now. We will not merge this one into main, so we want to have the full machinery in place before tackling the renaming. Or am I missing something? Same for the other occurrence.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this a renaming step, but yes, we better fix it here already.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And that was the right thought at the time we wanted to merge into main!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't settled on a specific directory structure yet. I propose:

|-- src
|     |-- gettsim
|     |       |-- ...
|     |-- _gettsim
|     |       |-- infrastructure (alternatively: ttsim?)
|     |       |        |-- aggregation_jax.py
|     |       |        |-- aggregation_numpy.py
|     |       |        |-- aggregation.py
|     |       |        |-- gettsim_typing.py
|     |       |        |-- piecewise_functiony.py
|     |       |        |-- policy_environment.py
|     |       |        |-- policy_environment_postprocessor.py
|     |       |        |-- interface.py
|     |       |        |-- shared.py
|     |       |        |-- time_conversion.py
|     |       |        |-- vectorization.py
|     |       |-- parameters
|     |       |-- taxes_and_transfers
|     |       |        |-- demographics
|     |       |        |-- einkommensteuer
|     |       |        |-- arbeitslosengeld_2
|     |       |        |-- wohngeld
|     |       |        |-- ...
|     |       |        |-- groupings.py
|     |       |-- config.py
|     |       |-- synthetic.py
|     |       |-- visualization.py
|     |-- _gettsim_tests
|     |       |-- ...

Copy link
Collaborator

@hmgaudecker hmgaudecker Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Again, I think I would be even more radical, now that we are at it:

|-- src
|     |-- gettsim
|     |       |-- ...
|     |-- _gettsim
|     |       |-- ttsim
|     |       |        |-- config.py
|     |       |        |-- aggregation_jax.py
|     |       |        |-- aggregation_numpy.py
|     |       |        |-- aggregation.py
|     |       |        |-- typing.py
|     |       |        |-- piecewise_functiony.py
|     |       |        |-- policy_environment.py
|     |       |        |-- policy_environment_postprocessor.py
|     |       |        |-- interface.py
|     |       |        |-- shared.py
|     |       |        |-- time_conversion.py
|     |       |        |-- vectorization.py
|     |       |        |-- visualize_dag.py
|     |       |-- de
|     |       |        |-- config.py
|     |       |        |-- parameters
|     |       |        |-- demographics
|     |       |        |-- einkommensteuer
|     |       |        |-- arbeitslosengeld_2
|     |       |        |-- wohngeld
|     |       |        |-- ...
|     |       |        |-- groupings.py
|     |       |        |-- synthetic.py
|     |-- _gettsim_tests
|     |       |-- ttsim
|     |       |        |-- ...
|     |       |-- de
|     |       |        |-- ...

That is,

  • ttsim suggestion is cool, change taxes_and_transfers accordingly
  • config and synthetic need to be split up among these two
  • just typing, not gettsim_typing
  • visualize_dag (we should probably rename the function, too?)
  • make parameters part of de

Probably easiest to inject this as step 3 in the grand PR given the work you put in so far?

Copy link
Collaborator

@hmgaudecker hmgaudecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of small things while familiarising myself with it. Looks great in general!

return result


def _filter_tree_by_name_list(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you draw up an overview where we need

  1. qualified names
  2. the branch of the nested dicts?

Just to get an idea of that, hard to see by scanning the codebase.

Reason: I think I'd prefer working with 2. wherever possible (i.e., wherever we do not need to write it down as an argument to a Python function). IIUC, that would be optree.PyTreeAccessors?).

functions_not_overridden, dag.nodes
)
processed_functions = _round_and_partial_parameters_to_functions(
necessary_functions,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
necessary_functions,
_filter_tree_by_name_list(...)[1],

and remove the above line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants