Skip to content

[RFC] Provide an explicit Cache API #891

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 44 additions & 0 deletions openfisca_core/cache.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# -*- coding: utf-8 -*-

from typing import Optional, List, Dict

from openfisca_core.types import Array
from openfisca_core.periods import Period


class Cache:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class interface looks very similar to the current InMemoryStorage. Maybe these classes could be merged ?



def get(self, variable: str, period: Period) -> Optional[Array]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially thought about names longer like get_cached_array or put_in_cache, but as this class is just doing cache, it seemed redondant.

"""
Get the cached value of ``variable`` for the given period.

If the value is not known, return ``None``.
"""
pass

def put(self, variable: str, period: Period, value: Array) -> None:
"""
Store ``value`` in cache for ``variable`` at period ``period``.
"""
pass

def delete(self, variable: str, period: Optional[Period] = None) -> None:
"""
If ``period`` is ``None``, remove all known values of the variable.

If ``period`` is not ``None``, only remove all values for any period included in period (e.g. if period is "2017", values for "2017-01", "2017-07", etc. would be removed)
"""
pass

def get_known_periods(self, variable: str) -> List[Period]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced this method needs to be part of the public contract. Who needs it? Under what circumstances?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the exact circumstances, but it is definitely used in several places in IPP code, when they want to figure out all what's in the cache.

Copy link
Collaborator

@sandcha sandcha Jul 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is sometimes updated to init a calculation in a reform.
We might need a clear interface for reforms and, in the meantime, use get_known_periods, delete...

"""
Get the list of periods the ``variable`` value is known for.
"""
pass

def get_memory_usage(self) -> Dict[str, Dict]:
"""
Get data about the virtual memory usage of the cache.
"""
pass
26 changes: 22 additions & 4 deletions openfisca_core/simulations.py
Original file line number Diff line number Diff line change
Expand Up @@ -426,11 +426,29 @@ def set_input(self, variable_name, period, value):

If a ``set_input`` property has been set for the variable, this method may accept inputs for periods not matching the ``definition_period`` of the variable. To read more about this, check the `documentation <https://openfisca.org/doc/coding-the-legislation/35_periods.html#automatically-process-variable-inputs-defined-for-periods-not-matching-the-definitionperiod>`_.
"""
variable = self.tax_benefit_system.get_variable(variable_name, check_existence = True)
period = periods.period(period)
if ((variable.end is not None) and (period.start.date > variable.end)):
variable = self.get_variable(variable_name)
population = self.get_variable_population(variable_name)

# If the variable is constant over time, we ignore the period parameter and set it for ETERNITY
if variable.definition_period == periods.ETERNITY:
period = periods.ETERNITY_PERIOD
if period is not None:
period = periods.period(period)

if variable.is_inactive(period):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replaces the check on variable.end

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid negation in method name and use is_active(period) instead?

return
self.get_holder(variable_name).set_input(period, value)
array = variable.cast_to_array(value)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replaces the _to_array method of Holder

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shorten the name to to_array() and add some documentation to the method as the user might wonder why an openfisca variable is involved?


if len(array) != population.count:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was tempted to abstract this check into a method somewhere else, but the error message is pretty specific to the set_input context, and needs access both the variable and population.

raise ValueError(
f'Unable to set value "{value}" for variable "{variable.name}", as its length is {len(array)} while there are {population.count} {population.entity.plural} in the simulation.'
)

if variable.set_input:
return variable.set_input(self, variable, period, array)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The set_input hooks need access to both simulation and variable. It could for sure be done better, but I'd rather do the minimum adaptation and keep further improvement out of scope as there have been recent work on that by @Morendil.


variable.check_input_period(period)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replaces the period checked that were spread on holder.set_input and holder._set

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As there is no mention of the input in this method, it seems more general. Rename to check_period/is_valid_period/check_period_matching... and adapt documentation+log message accordingly?

self.cache.put_in_cache(variable.name, period, array)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.cache.put_in_cache(variable.name, period, array)
self.cache.put(variable.name, period, array)


def get_variable_population(self, variable_name):
variable = self.tax_benefit_system.get_variable(variable_name, check_existence = True)
Expand Down
19 changes: 19 additions & 0 deletions openfisca_core/variables.py
Original file line number Diff line number Diff line change
Expand Up @@ -327,6 +327,25 @@ def is_input_variable(self):
"""
return len(self.formulas) == 0

def is_inactive(self, period: Period) -> bool:
"""
Return ``False`` is ``period`` is outside of the definition timeframe of the variable
"""
pass

def check_input_period(self, period: Period) -> None:
"""
Check that an input can be set for the variable at period ``period``.
Raise a ``PeriodMismatchError`` in case of period mismatch
"""
pass

def cast_to_array(self, value) -> Array:
"""
Try to convert ``value`` into a Numpy Array of the appropriate type for the variable
"""
pass

@classmethod
def get_introspection_data(cls, tax_benefit_system):
"""
Expand Down
40 changes: 40 additions & 0 deletions tests/core/test_u_cache.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
from pytest import fixture
from numpy import asarray as array
from numpy.testing import assert_equal

from openfisca_core.cache import Cache
from openfisca_core import periods

period = periods.period(2018)


@fixture
def cache():
Copy link
Member Author

@fpagnoux fpagnoux Jun 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Methodology question for @Morendil :

Let's assume that Cache uses, in its implementation, a bunch of InMemoryStorage instances to store data. This is Cache internal business, and is not reflected in the public methods.

To write a proper unit test on Cache, do I need to stub/mock InMemoryStorage, or can I just write tests on the Cache methods, as below?

In other words, can a unit test indirectly run some code that is outside of the class under test, or does it make it an integration test ?

Thanks a lot in advance!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try not to get too hung up on unit/integration test differences, it's a tactical matter rather than one of principle. It's fine if the methods of the object under test rely on some other collaborators, as long as you're primarily testing that object's behaviour, rather than indirectly testing the collaborators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks a lot for the explanation 🙂

return Cache()


def test_does_not_retrieve(cache):
value = cache.get_cached_array('toto', period)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
value = cache.get_cached_array('toto', period)
value = cache.get('toto', period)

assert value is None


def test_retrieve(cache):
value_to_cache = array([10, 20])
cache.put_in_cache('toto', period, value_to_cache)
value_from_cache = cache.get_cached_array('toto', period)
assert_equal(value_to_cache, value_from_cache)


def test_delete(cache):
value_to_cache = array([10, 20])
cache.put_in_cache('toto', period, value_to_cache)
cache.delete_arrays('toto', period)
value_from_cache = cache.get_cached_array('toto', period)
assert value_from_cache is None


def test_retrieve_eternity(cache):
value_to_cache = array([10, 20])
cache.put_in_cache('toto', periods.ETERNITY_PERIOD, value_to_cache)
value_from_cache = cache.get_cached_array('toto', period)
assert_equal(value_to_cache, value_from_cache)
72 changes: 72 additions & 0 deletions tests/core/test_u_variable.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
import datetime

import pytest
import numpy as np
from numpy import asarray as array
from numpy.testing import assert_equal

from openfisca_core.variables import Variable
from openfisca_core import periods
from openfisca_core.periods import period as make_period
from openfisca_core.errors import PeriodMismatchError


class VariableStub(Variable):

def __init__(self):
pass


def test_is_inactive():
variable = VariableStub()
variable.end = datetime.date(2018, 5, 31)

assert variable.is_inactive(make_period('2018-06'))
assert not variable.is_inactive(make_period('2018-05'))


@pytest.mark.parametrize("definition_period, period", [
(periods.MONTH, '2019-01'),
(periods.YEAR, '2019'),
(periods.ETERNITY, '2019-01'),
(periods.ETERNITY, '2019'),
(periods.ETERNITY, periods.ETERNITY),
])
def test_check_input_period_ok(definition_period, period):
variable = VariableStub()
variable.definition_period = definition_period
variable.check_input_period(make_period(period))


@pytest.mark.parametrize("definition_period, period", [
(periods.MONTH, periods.ETERNITY),
(periods.MONTH, 2018),
(periods.MONTH, "month:2018-01:3"),
(periods.YEAR, periods.ETERNITY),
(periods.YEAR, "2018-01"),
(periods.YEAR, "year:2018:2"),
])
def test_period_mismatch(definition_period, period):
variable = VariableStub()
variable.definition_period = definition_period
variable.name = 'salary'

with pytest.raises(PeriodMismatchError):
variable.check_input_period(make_period(period))


@pytest.mark.parametrize("input_value, expected_array", [
(array([5]), array([5])),
(array(5), array([5])),
([5], array([5])),
(5, array([5])),
('3 + 2', array([5])),
])
def test_cast_to_float_array(input_value, expected_array):
variable = VariableStub()
variable.value_type = float
variable.dtype = np.float32
value = variable.cast_to_array(input_value)
assert isinstance(value, np.ndarray)
assert value.ndim == 1
assert_equal(value, expected_array)