DEP 1: Namespaces #14
Co-authored-by: Lars Reimann <[email protected]>
What should the signature of Concatenate functions currently has the following arguments:
Based on the tests it seems that these new arguments are needed:
Questions:
Other considerations:
Problem 1: It's easy to accidentally pass e.g. Problem 2: The type of an argument like This tight coupling between the arguments indicates that a class for the function registry (Possible Containers) and maybe another for a target specification would make sense. These could then have some static/class methods like The function registry would have the additional benefit that it can immediately check for name clashes as functions get added.
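To make the name-clash point concrete, here is a minimal sketch of such a registry. The class name `FunctionRegistry`, its `add` method, and the double-underscore name are hypothetical and only serve as illustration; nothing here is existing dags API.

```python
from typing import Callable, Dict, List


class FunctionRegistry:
    """Hypothetical container that rejects duplicate names on insertion."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable] = {}

    def add(self, name: str, func: Callable) -> None:
        # The clash is reported when the function is added, not later
        # when the functions are concatenated.
        if name in self._functions:
            raise ValueError(f"Function name {name!r} is already registered.")
        self._functions[name] = func

    def names(self) -> List[str]:
        return sorted(self._functions)


def eligible(age):
    return age >= 65


registry = FunctionRegistry()
registry.add("pensions__eligible", eligible)

try:
    registry.add("pensions__eligible", eligible)
    clash_detected = False
except ValueError:
    clash_detected = True
```

With plain dictionaries, a duplicate key would silently overwrite the earlier entry, which is the failure mode such a class would guard against.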
Thanks!
I would think so. Maybe we'll need to restrict the possible combinations of
No, but if
I think that might just be a better name than
We would keep the current set and add dictionaries with arbitrary nesting, I believe. Question is whether we allow a module as syntactic sugar to add all functions contained in there as candidates. In the example:
{
    "pensions": pensions,
}
instead of:
{
    "pensions": {
        "eligible": pensions.eligible,
        "benefit": pensions.benefit,
    },
}
But this might be opening a can of worms (e.g., what do you do with functions imported into
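One way the module shorthand could be expanded is sketched below. The helper `functions_from_module` is hypothetical, and the rule it applies (keep only public functions whose `__module__` matches the module, so imported functions are skipped) is just one possible answer to the import question, not an agreed design. The `pensions` module is built in place so the sketch is self-contained.

```python
import inspect
import types


def functions_from_module(module):
    """Collect public functions defined in `module` itself, skipping imports."""
    return {
        name: obj
        for name, obj in vars(module).items()
        if inspect.isfunction(obj)
        and obj.__module__ == module.__name__
        and not name.startswith("_")
    }


# Stand-in for a real `pensions` module (assumption for illustration).
pensions = types.ModuleType("pensions")
exec(
    "def eligible(age):\n"
    "    return age >= 65\n"
    "\n"
    "def benefit(eligible):\n"
    "    return 1000.0 if eligible else 0.0\n",
    pensions.__dict__,
)


def helper(x):
    return x


# Mimics `from elsewhere import helper` inside the pensions module.
pensions.helper = helper

functions = {"pensions": functions_from_module(pensions)}
```

Here `helper` is dropped because it was defined elsewhere; whether that is the desired behaviour is exactly the open question raised above.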
Exactly, see 2. I am not sure we want the
I am generally sympathetic to the idea of those classes, your argument makes sense to me. Only restriction would be that the user would never need to specify them but could just pass the Python built-ins we have so far. |
What I write below assumes that we want a very general solution (similar to pytrees in jax and estimagic) with arbitrary levels of nesting and almost arbitrary container types, including lists, dicts and mixes thereof. I am not sure this is what you need and want for gettsim. Designing something at this generality in a way that every edge case makes sense is very hard and took us months in estimagic despite the fact that we just needed minor extensions to what JAX does. I don't think we need @hmgaudecker In the test file you wanted me to look at I found just one commented out mention of
Enforce signature should not be a problem, defining what aggregator should do in the very general case is a hard design problem.
This is also a hard design problem as there seems to be some overlap, but the overlap is incomplete. E.g. there could be multiple flat return types (tuple, list, ...). I use this level of control in multiple applications.
No. targets should mirror functions, see examples above.
Almost anything but that does not actually make it difficult because of how
Targets should mirror functions, see examples above. I think the coupling problem does not really exist as it was based on the assumption that we need a I am against a class for What makes them so amazing from a developer perspective is that the small number of operations defined on them (tree_flatten, tree_unflatten, tree_map, ...) let you go very quickly from something extremely flexible to something that is very easy to work with. The constructors you suggest ( Regarding the issue in typing: I think Jaxtyping has found a nice way of dealing with this generality. Above I mention many open design problems. Those would all become easier if we are not talking about general pytrees with arbitrary nesting for |
Apologies, I guess my thinking was to pass a flat dict and to use the nesting structure based on double underscores nevertheless. However, it makes perfect sense to forget about this, I don't think anybody ever wants to do this and if they want, they should build in a call of |
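For reference, the correspondence between a nested dict and a flat dict with double-underscore names can be sketched in a few lines. This is a plain-Python illustration restricted to dictionaries; the helper names `flatten` and `unflatten` are hypothetical and it does not use pybaum or JAX.

```python
def flatten(tree, sep="__", prefix=""):
    """Turn a nested dict into a flat dict with `sep`-joined keys."""
    flat = {}
    for key, value in tree.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


def unflatten(flat, sep="__"):
    """Rebuild the nested dict from `sep`-joined keys."""
    tree = {}
    for name, value in flat.items():
        *parents, leaf = name.split(sep)
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


nested = {"pensions": {"eligible": 1, "benefit": 2}, "tax": 3}
flat = flatten(nested)
roundtrip = unflatten(flat)
```

The round trip only works as long as no key itself contains the separator, which is one reason mixing both conventions in a single input is fragile.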
Brief extension: I think the only reasonable behavior of aggregator is to call it on flattened outputs. Would be backwards compatible. So the main design problem left is how the current |
Thanks for all your comments, @janosg! Yes we should stick as closely as possible to pytrees in Jax and we probably should support only I am not sure I understand why a two-level nested dict would be easier to handle than arbitrary levels? I would think that for dags in tree mode, we could restrict the registry to dictionaries and the leaves to functions?
I think the hardest step in the implementation will be to extend short function names to function names that are unique in the global namespaces. Everything else should be easy to handle with pybaum functionality (even though sometimes a bit of creativity might be required). This step would be much easier for a dict with fixed nesting depth. It probably also has an elegant recursive solution, but (at least for me) it's usually not trivial to find these. |
You mean cases like the following?
{
    "a": {"b": {"c": d}},
    "a__b": {"c": d},
}
But I am probably just missing something fundamental once more.
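The clash in the example above can be detected mechanically once every leaf is given its qualified name. The sketch below assumes double underscores as the separator; the helpers `qualified_names` and `find_clashes` are hypothetical.

```python
from collections import Counter


def qualified_names(tree, sep="__", prefix=()):
    """Yield the `sep`-joined path of every leaf in a nested dict."""
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from qualified_names(value, sep, path)
        else:
            yield sep.join(path)


def find_clashes(tree, sep="__"):
    """Return qualified names that are produced by more than one leaf."""
    counts = Counter(qualified_names(tree, sep))
    return sorted(name for name, n in counts.items() if n > 1)


def d():
    return 1


tree = {
    "a": {"b": {"c": d}},
    "a__b": {"c": d},
}
clashes = find_clashes(tree)
```

Both leaves end up as `a__b__c` in the global namespace, so a validation step of this kind would have to reject the tree or forbid the separator inside keys.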
Codecov Report
@@ Coverage Diff @@
## main #14 +/- ##
==========================================
- Coverage 96.46% 91.38% -5.08%
==========================================
Files 8 12 +4
Lines 453 511 +58
==========================================
+ Hits 437 467 +30
- Misses 16 44 +28
So, just spent some time on this, wrote a long post and lost it. Apologies. Bottom line is that I think I/we got carried away when working on this in January. We really should stick to the basics, a tree in dags can be a nested dictionary with leafs = functions. I also believe that we cannot determine a unique signature based on functions alone. The reason is that if we ask for a variable somewhere lower in the hierarchy than at the very top, we cannot know whether the user will pass it a variable at that level or somewhere further up. See test_trees.py for an example. Updating @lars-reimann's table above, I would think of the interface as follows:
Looking at it suggests to me that we'll want two functions as entry points to dags though:
Does that make sense? I'll also be available in the afternoon for chatting.
I'm all for that: It's easier to implement (less validation required) and use (less documentation to understand). |
No. Sorry, I should have said argument names instead of function names. From normal |
Okay, I think that is precisely the reason why I believe we'll need to include the (structure of) inputs in Minimal example:
def b(c):
    return c ** 2

{
    {"a": b},
}
Based on that and the envisaged structure, it is impossible to tell whether the inputs will be:
With deeper nesting, intermediate versions would of course also be possible. I'd think the best thing we could do is to provide a template generator with a keyword This would solve the uniqueness problem, too. In the global namespace, for the two cases above:
@janosg @lars-reimann : If you agree I'd continue a little with the DEP along the lines we discussed so that Lars can go ahead. Thanks! |
I thought this is what the input mode is for. If you have {"a__c": 0.5} I am not saying that this would necessarily be user friendly. Probably most would find it confusing. But it is probably the only natural way to define something that makes sense for working with arbitrarily nested structures. According to JAX philosophy, there would probably be no input mode and only 2 would work as it is more in line with pytree thinking.
My point is that in
def b(c):
    return c ** 2

def d(c):
    return c ** 0.5

{
    {"a": b},
    d,
}
So the two cases represent different behavior, not different ways to specify the inputs (we would have to specify Put differently, we cannot predict unique input names anymore in
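The difference between the two cases can be spelled out by hand. The sketch below assumes the functions are arranged as `{"a": {"b": b}, "d": d}`, which is a guess at the intended nesting, and the runner functions `run_shared` and `run_namespaced` are hypothetical stand-ins for what a concatenated function would do.

```python
def b(c):
    return c ** 2


def d(c):
    return c ** 0.5


def run_shared(inputs):
    """Case 1: a single top-level `c` feeds both functions."""
    return {"a": {"b": b(inputs["c"])}, "d": d(inputs["c"])}


def run_namespaced(inputs):
    """Case 2: `b` reads `c` from its own namespace `a`; `d` reads top-level `c`."""
    return {"a": {"b": b(inputs["a"]["c"])}, "d": d(inputs["c"])}


shared = run_shared({"c": 4.0})
namespaced = run_namespaced({"a": {"c": 3.0}, "c": 4.0})
```

The second case needs two distinct input values, so the choice changes what is computed and cannot be inferred from the functions alone.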
|
Ok, I see the point. It might be a good idea to start over and think about solutions to the problems in gettsim (mainly long function/variable names) that do not have the side effect that the concatenated taxes and transfers function is much harder to use. |
What do you mean by harder to use? The only thing that is different is that I'll need to know the inputs ex ante (which should always be the case, you know your data). Other than that, if thinking about flat input mode, all that happens is that your input names become longer and more comprehensible. I am all on that side of the trade-off. Am I missing anything? |
(The deeper issue in GETTSIM is to get namespaces in there, not long names per se, though related.)
It seems the problem is that we need to map a simple name to a qualified name. In your example we could map the simple name Option 1: Start at the level that the function is defined at and work your way up. Use the first matching function/input. Problems:
Option 2: Mimic Python's module system to some extent. This would mean that simple names can only directly be used to access functions/inputs at the same level as the function. For everything else we could require qualified names. We could also offer a way to import a function/input, i.e. make them available by a simple name, though that seems rarely worth it. Problem:
|
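The two lookup rules can be compared in a small sketch. Both resolver functions are hypothetical; names are qualified with double underscores, and treating any name that contains the separator as an absolute path under Option 2 is an assumption made here for illustration.

```python
def resolve_option_1(name, namespace, available, sep="__"):
    """Option 1: start at the function's level and walk up; first match wins."""
    path = list(namespace)
    while True:
        candidate = sep.join(path + [name])
        if candidate in available:
            return candidate
        if not path:
            raise KeyError(f"No match for {name!r} starting from {namespace!r}.")
        path.pop()


def resolve_option_2(name, namespace, available, sep="__"):
    """Option 2: simple names resolve only at the function's own level."""
    if sep in name:
        candidate = name  # assumed to be an absolute, qualified name
    else:
        candidate = sep.join(list(namespace) + [name])
    if candidate not in available:
        raise KeyError(f"{candidate!r} is not available.")
    return candidate


# `b` lives in namespace ("a",) and asks for the simple name "c".
only_top_level = {"c"}
both_levels = {"c", "a__c"}

opt1_top = resolve_option_1("c", ("a",), only_top_level)
opt1_both = resolve_option_1("c", ("a",), both_levels)
opt2_both = resolve_option_2("c", ("a",), both_levels)

try:
    resolve_option_2("c", ("a",), only_top_level)
    opt2_requires_qualified = False
except KeyError:
    opt2_requires_qualified = True
```

Under Option 1 the result depends on which inputs happen to exist, whereas Option 2 fails loudly unless the name is available at the function's own level or written out in full.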
I think getting namespaces into gettsim is not necessarily the same as getting namespaces into dags by allowing nested specifications of functions. This would possibly go more in the direction of a DSL then, but it could basically try to leverage how modules and namespaces are handled in Python instead of implementing a new way of handling this stuff via nested dictionaries. Any solution on the
But probably @lars-reimann is in a much better position to think about what such an alternative solution could look like.
@lars-reimann saw it just now: I like the idea to mimic Python's module system and I think it goes in the same direction as what I suggested above.
Great discussion!!! Option 2 also sounds good to me. Maybe at deeper levels of the hierarchy, we could require something like an absolute path? Say This said, I still do not see a large downside to requiring knowledge of the input structure upfront, not in applications of GETTSIM and not in anything else I can think of. Could you elaborate a little why you are worried about this? |
I agree that this is not really a problem. |
I just updated the example and DEP a little bit, largely within Option 1 still. If we found a good way to use absolute paths / imports (aka qualified names, I guess) I'd be fine with Option 2. I might be stuck in my own thinking too much. It would be great to see a prototype to play with as the next step, the limits of my imagination have been pretty much reached... |