Group YAML tests in a single OpenFisca run to speed up testing time #616
Comments
For survey data we essentially use SurveyScenarios. We should be able to find a way of:
|
Hi @benjello, when you talk about |
Yep @maukoquiroga |
So, idea would be then to sort of parametrise or vectorise test execution? I know we can't do with An alternative option, if |
|
I don't think there is anything about numpy that prevents us from doing that. We just need to be able to make a single simulation from 10/100/all tests, run it, and compare the results with the expected outcomes.
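To make the idea concrete, here is a minimal sketch of what "a single simulation from several tests" could look like with plain numpy. It does not use the OpenFisca API: `formula_a`, the test dictionaries and `run_batched` are hypothetical names chosen for illustration, and the formula `A = 2 * B` is an assumption.

```python
import numpy as np


def formula_a(b):
    # Hypothetical stand-in for a variable formula (assumed: A = 2 * B).
    # It works on a whole vector at once, like a vectorised formula would.
    return 2 * b


# Three independent tests, each with one input value for B and one expected A.
# The last one is deliberately wrong, to show a failure being reported.
tests = [
    {"name": "test 1", "input_b": 10.0, "expected_a": 20.0},
    {"name": "test 2", "input_b": 15.0, "expected_a": 30.0},
    {"name": "test 3", "input_b": 20.0, "expected_a": 50.0},
]


def run_batched(tests, absolute_error_margin=1e-6):
    # Stack every test's input into one vector: one entry per test.
    inputs = np.array([test["input_b"] for test in tests])
    expected = np.array([test["expected_a"] for test in tests])
    # A single vectorised run computes the output for all tests at once.
    computed = formula_a(inputs)
    # Compare each entry with the expected outcome of the matching test.
    passed = np.abs(computed - expected) <= absolute_error_margin
    return {test["name"]: bool(ok) for test, ok in zip(tests, passed)}


results = run_batched(tests)
```

Each test becomes one entry of the input vector, so the cost of the run is paid once rather than once per test. This only holds when the tests do not interfere with each other.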
|
I created a branch, |
Hmm, thinking more about that, I think there is actually a hard limitation on running several tests in a single run.

Let's consider a tax and benefit system with 3 variables.

Let's consider the following tests:

- name: "I - Check A formula"
  input_variables:
    B:
      2018-07: 10
  output_variables:
    A:
      2018-07: 20
- name: "II - Check B formula"
  input_variables:
    C:
      2018-07: 10
  output_variables:
    B:
      2018-07: 20
- name: "III - End to end test"
  input_variables:
    C:
      2018-07: 10
  output_variables:
    A:
      2018-07: 40

First, let's try to merge tests

Conclusion: If a variable is both an input of a test and an output of another test, then these tests can't run at the same time.

More subtly, let's consider merging

Conclusion: If a variable

By "intermediate dependency" I mean a dependency that has a formula.

To be more precise, we need to take periods into account, and rephrase our conclusions:
I may be missing something, but I'd say that reciprocally, if we don't fall in these two categories, our tests can be safely merged. |
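As an illustration, the first conclusion (a variable that is both an input of one test and an output of another) can be checked directly on the test definitions. The sketch below is not part of OpenFisca: the helper names are hypothetical, and the tests are the three YAML tests above written as Python dictionaries. It compares (variable, period) pairs, and it does not cover the second conclusion, which would require knowing the dependencies between variables.

```python
from itertools import combinations

# The three tests from the comment above, as plain dictionaries.
tests = [
    {
        "name": "I - Check A formula",
        "input_variables": {"B": {"2018-07": 10}},
        "output_variables": {"A": {"2018-07": 20}},
    },
    {
        "name": "II - Check B formula",
        "input_variables": {"C": {"2018-07": 10}},
        "output_variables": {"B": {"2018-07": 20}},
    },
    {
        "name": "III - End to end test",
        "input_variables": {"C": {"2018-07": 10}},
        "output_variables": {"A": {"2018-07": 40}},
    },
]


def variable_periods(section):
    # Flatten {"B": {"2018-07": 10}} into {("B", "2018-07")}.
    return {
        (variable, period)
        for variable, by_period in section.items()
        for period in by_period
    }


def input_output_conflict(test_1, test_2):
    # True if some (variable, period) is an input of one test
    # and an output of the other.
    inputs_1 = variable_periods(test_1["input_variables"])
    outputs_1 = variable_periods(test_1["output_variables"])
    inputs_2 = variable_periods(test_2["input_variables"])
    outputs_2 = variable_periods(test_2["output_variables"])
    return bool(inputs_1 & outputs_2) or bool(inputs_2 & outputs_1)


conflicts = [
    (test_1["name"], test_2["name"])
    for test_1, test_2 in combinations(tests, 2)
    if input_output_conflict(test_1, test_2)
]
```

On these three tests, only the pair I and II is flagged, because B for 2018-07 is an input of I and an output of II. A pair that passes this check is not necessarily mergeable, since the dependency-based condition is not tested here.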
Now, having said that, is there still a chance to manage to run several tests at the same time? It seems hard. One way would be to have the test runner be very smart and only group tests that are "mergeable". But the condition is practically hard to check (we would need to parse the variable-period graph). Another would be to tweak the core of OpenFisca to overcome the issues stated above. The idea would be to enable "partially setting" an input variable. When requesting a partially set variable, the variable would be calculated, and the result merged with the input. That seems theoretically feasible, it would probably fix #564 as well, but that'd be complex and very experimental 🤓. |
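A rough sketch of the "partially set" idea, using plain numpy rather than OpenFisca's core: the input vector marks unset entries with NaN, the formula is computed for everyone, and the two are merged. The NaN convention, the function name `merge_partial_input`, and the formulas `B = 2 * C` and `A = 2 * B` are assumptions made for illustration (they are consistent with the values in the three tests above).

```python
import numpy as np


def merge_partial_input(partial_input, calculated):
    # Keep the input where it was set, fall back on the calculated value elsewhere.
    # Assumption: NaN means "not set" in the input vector.
    is_set = ~np.isnan(partial_input)
    return np.where(is_set, partial_input, calculated)


# One entity per test (I, II, III), all running in the same vector.
# B is set only for test I; C is set for tests II and III (0 is used as a default).
b_input = np.array([10.0, np.nan, np.nan])
c_input = np.array([0.0, 10.0, 10.0])

# Hypothetical formulas: B = 2 * C and A = 2 * B.
b_calculated = 2 * c_input
b = merge_partial_input(b_input, b_calculated)
a = 2 * b
```

With this merge, the first entity keeps its input B of 10 and gets A equal to 20, while the other two get B equal to 20 and A equal to 40, which matches the expected outputs of the three tests.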
If the initial need is to speed up testing, I'm pretty sure that focusing efforts on speeding up startup and computation times would yield good results, with more positive side effects, considering the complexity of the problems being discussed here :)
(In reply to Florian Pagnoux's comment of 18 July 2018, 06:41, above.)
|
I agree with the conclusion, but the potential performance gain of test vectorization (x5 speed at least) will be hard to get by just "speeding up startup and computation" :) |
Agree! (There's #535.)
Also agree, with a disclaimer: I've done some profiling and experimentation to find ways to optimise computation. For now I've concluded that:
In the lack of function pattern matching, the most promising refactoring effort in terms of performance lies IMHO in the proper vectorisation of core's code (get rid of every single if and for loop). The results of my performance forays can be found here openfisca/openfisca-france#1014 (comment) (maybe I'm missing something evident). |
Not sure I understand. @benjello has always made sure no non-vectorial code slipped through 😉. There is probably some at simulation initialization, but not for calculations.
Nice! I didn't see this one. Will try to have a look :).
Have we tried this? |
Yeah, I've tried turning:

def calc(self, base, factor = 1, round_base_decimals = None):
    base1 = np.tile(base, (len(self.thresholds), 1)).T
    if isinstance(factor, (float, int)):
        factor = np.ones(len(base)) * factor
    # np.finfo(float).eps is used to avoid np.nan = 0 * np.inf creation
    thresholds1 = np.outer(factor + np.finfo(float).eps, np.array(self.thresholds + [np.inf]))
    if round_base_decimals is not None:
        thresholds1 = np.round(thresholds1, round_base_decimals)
    a = max_(min_(base1, thresholds1[:, 1:]) - thresholds1[:, :-1], 0)
    if round_base_decimals is None:
        return np.dot(self.rates, a.T)
    else:
        r = np.tile(self.rates, (len(base), 1))
        b = np.round(a, round_base_decimals)
        return np.round(r * b, round_base_decimals).sum(axis = 1)

into:

@numba.jit(nopython=False)
def _tile(rates, shape):
    return np.tile(rates, shape)

@numba.jit(nopython=True)
def _T(tile):
    return tile.T

@numba.jit(nopython=False)
def _threshold1a(thresholds, inf=[np.inf]):
    return np.array(thresholds + inf)

@numba.jit(nopython=True)
def _threshold1b(factor, thresholds, eps=np.finfo(float).eps):
    return np.outer(factor + eps, thresholds)

@numba.jit(nopython=False)
def _round(thresholds, round_base_decimals):
    return np.round(thresholds, round_base_decimals)

@numba.jit(nopython=True)
def _max_min(base, thresholds):
    return np.maximum(np.minimum(base, thresholds[:, 1:]) - thresholds[:, :-1], 0)

@numba.jit(nopython=True)
def _shape(base):
    return (len(base), 1)

@numba.jit(nopython=True)
def _multiply(a, b):
    return a * b

@numba.jit(nopython=True)
def _sum(rounded):
    return rounded.sum(axis=1)

@profile
def calc(self, base, factor = 1, round_base_decimals = None):
    base1 = _T(_tile(base, _shape(self.thresholds)))
    if isinstance(factor, (float, int)):
        factor = np.ones(len(base)) * factor
    # np.finfo(float).eps is used to avoid np.nan = 0 * np.inf creation
    thresholds1 = _threshold1b(factor, _threshold1a(self.thresholds))
    if round_base_decimals is not None:
        thresholds1 = _round(thresholds1, round_base_decimals)
    a = _max_min(base1, thresholds1)
    if round_base_decimals is None:
        return np.dot(self.rates, a.T)
    else:
        r = _tile(self.rates, _shape(base))
        b = _round(a, round_base_decimals)
        return _sum(_round(_multiply(r, b), round_base_decimals))

We cannot apply JIT optimisation with

Note: Today, calc's signature is as follows:

def calc(self,
         base: ndarray,
         factor: Union[int, ndarray, float] = 1,
         round_base_decimals: Optional[int] = None) -> ndarray

To vectorise it, we first need to strongly type the function signature:

def calc(self,
         base: ndarray,
         factor: float = 1.0,
         round_base_decimals: int = 0) -> ndarray

Then, we can rewrite for example:

if isinstance(factor, (float, int)):
    factor = np.ones(len(base)) * factor

into:

factors = np.ones(base.shape[0]) * factor

Which is C-optimisable. And so on... |
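For reference, here is a minimal standalone sketch of the strongly typed variant described above, restricted to the branch without rounding. It is an illustration rather than the code in OpenFisca-Core: `calc_typed` is a hypothetical name, `thresholds` and `rates` are passed as explicit arguments instead of being read from `self`, and numba is left out so that the snippet only depends on numpy.

```python
import numpy as np


def calc_typed(base: np.ndarray, thresholds: list, rates: list,
               factor: float = 1.0) -> np.ndarray:
    # factor is always a scalar here, so no isinstance() branch is needed.
    factors = np.ones(base.shape[0]) * factor
    # eps avoids creating np.nan through 0 * np.inf when factor is 0.
    thresholds1 = np.outer(
        factors + np.finfo(float).eps,
        np.array(thresholds + [np.inf]),
    )
    # One row per entry of base, one column per bracket.
    base1 = np.tile(base, (len(thresholds), 1)).T
    # Amount of base falling in each bracket, floored at 0.
    a = np.maximum(
        np.minimum(base1, thresholds1[:, 1:]) - thresholds1[:, :-1], 0
    )
    # Apply each bracket's rate and sum over brackets.
    return np.dot(rates, a.T)


# Example: 10% up to 1000, 20% above.
amounts = calc_typed(np.array([500.0, 1500.0]), [0, 1000], [0.1, 0.2])
```

With these example values, the result is approximately 50 for a base of 500 and 200 for a base of 1500 (up to floating point noise introduced by the `eps` term).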
Closing this as an exploration of a particular solution (grouping several tests into a single vector computation) to an underlying problem (France tests are slow) that didn't pan out. The problem may still exist, but discussion on this particular issue is too anchored on the proposed solution. |
Relates to openfisca/openfisca-france#926
Hi,
Thanks a lot for that piece of software.
Currently, tests on large country packages are relatively slow.
I have a gut feeling that YAML tests could be merged into a single OpenFisca run.
I haven't created a proof of concept, but it would be interesting to give it a try.
In my experience, I rarely run tests on OpenFisca-France because they are too slow.
This issue is related to #535
I identify more as a:
Developer (I create tools that use the existing OpenFisca code).