Skip to content

[RFC] Provide an explicit Cache API #891

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

[RFC] Provide an explicit Cache API #891

wants to merge 6 commits into from

Conversation

fpagnoux
Copy link
Member

@fpagnoux fpagnoux commented Jun 21, 2019

Connected to #887

PR Content

This PR is WIP as it only defines some methods contracts rather than implementing them. Only one method is pseudo-implemented, to better explicit the separation of tasks.

Some unit tests are also added, to explicit the behavior of the introduces methods.

The intent is to reach a decision on the big picture before getting drowned into implementation details.

Approach

The suggestion here is to:

  • Dismantle and remove the Holder class
    - Caching calculated variables would be moved to a new class Cache
    - The rest of the logic would be moved to other existing classes
  • Make simulation.set_input the user-facing method to set and input for a variable at a given time

from openfisca_core.periods import Period


class Cache:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class interface looks very similar to the current InMemoryStorage. Maybe these classes could be merged ?

class Cache:


def get(self, variable: str, period: Period) -> Optional[Array]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially thought about names longer like get_cached_array or put_in_cache, but as this class is just doing cache, it seemed redondant.

if period is not None:
period = periods.period(period)

if variable.is_inactive(period):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replaces the check on variable.end

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid negation in method name and use is_active(period) instead?

return
self.get_holder(variable_name).set_input(period, value)
array = variable.cast_to_array(value)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replaces the _to_array method of Holder

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shorten the name to to_array() and add some documentation to the method as the user might wonder why an openfisca variable is involved?

self.get_holder(variable_name).set_input(period, value)
array = variable.cast_to_array(value)

if len(array) != population.count:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was tempted to abstract this check into a method somewhere else, but the error message is pretty specific to the set_input context, and needs access both the variable and population.

)

if variable.set_input:
return variable.set_input(self, variable, period, array)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The set_input hooks need access to both simulation and variable. It could for sure be done better, but I'd rather do the minimum adaptation and keep further improvement out of scope as there have been recent work on that by @Morendil.

if variable.set_input:
return variable.set_input(self, variable, period, array)

variable.check_input_period(period)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replaces the period checked that were spread on holder.set_input and holder._set

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As there is no mention of the input in this method, it seems more general. Rename to check_period/is_valid_period/check_period_matching... and adapt documentation+log message accordingly?



@fixture
def cache():
Copy link
Member Author

@fpagnoux fpagnoux Jun 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Methodology question for @Morendil :

Let's assume that Cache uses, in its implementation, a bunch of InMemoryStorage instances to store data. This is Cache internal business, and is not reflected in the public methods.

To write a proper unit test on Cache, do I need to stub/mock InMemoryStorage, or can I just write tests on the Cache methods, as below?

In other words, can a unit test indirectly run some code that is outside of the class under test, or does it make it an integration test ?

Thanks a lot in advance!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try not to get too hung up on unit/integration test differences, it's a tactical matter rather than one of principle. It's fine if the methods of the object under test rely on some other collaborators, as long as you're primarily testing that object's behaviour, rather than indirectly testing the collaborators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks a lot for the explanation 🙂

"""
pass

def get_known_periods(self, variable: str) -> List[Period]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced this method needs to be part of the public contract. Who needs it? Under what circumstances?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the exact circumstances, but it is definitely used in several places in IPP code, when they want to figure out all what's in the cache.

Copy link
Collaborator

@sandcha sandcha Jul 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is sometimes updated to init a calculation in a reform.
We might need a clear interface for reforms and, in the meantime, use get_known_periods, delete...

if period is not None:
period = periods.period(period)

if variable.is_inactive(period):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid negation in method name and use is_active(period) instead?

return
self.get_holder(variable_name).set_input(period, value)
array = variable.cast_to_array(value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shorten the name to to_array() and add some documentation to the method as the user might wonder why an openfisca variable is involved?

if variable.set_input:
return variable.set_input(self, variable, period, array)

variable.check_input_period(period)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As there is no mention of the input in this method, it seems more general. Rename to check_period/is_valid_period/check_period_matching... and adapt documentation+log message accordingly?



def test_does_not_retrieve(cache):
value = cache.get_cached_array('toto', period)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
value = cache.get_cached_array('toto', period)
value = cache.get('toto', period)

return variable.set_input(self, variable, period, array)

variable.check_input_period(period)
self.cache.put_in_cache(variable.name, period, array)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.cache.put_in_cache(variable.name, period, array)
self.cache.put(variable.name, period, array)

@bonjourmauko bonjourmauko added the kind:discovery Issue requires discovery: value, ux and tech label Jul 23, 2019
@bonjourmauko bonjourmauko changed the title [WIP] Provide an explicit Cache API Provide an explicit Cache API Dec 10, 2019
@bonjourmauko bonjourmauko self-assigned this Jan 27, 2020
@bonjourmauko bonjourmauko changed the title Provide an explicit Cache API [WIP] Provide an explicit Cache API Apr 1, 2021
@bonjourmauko bonjourmauko added the policy:rfc Request For Comments: chime in! label Apr 1, 2021
@bonjourmauko bonjourmauko changed the title [WIP] Provide an explicit Cache API [WIP/RFC] Provide an explicit Cache API Apr 1, 2021
@bonjourmauko
Copy link
Member

Thanks again @fpagnoux for this contribution, which inspired greatly a proposal of my own on #947. I'm closing as this proposal stales a bit, although the issue itself remains open as, at some point, I agree we need to get rid of holders 😃

@bonjourmauko bonjourmauko changed the title [WIP/RFC] Provide an explicit Cache API [RFC] Provide an explicit Cache API Apr 10, 2021
@bonjourmauko bonjourmauko deleted the new-api-cache branch December 6, 2022 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:discovery Issue requires discovery: value, ux and tech policy:rfc Request For Comments: chime in!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants