Skip to content

fix: deepcopy variables in clone of tax_benefit_system #1319

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 7, 2025

Conversation

sylvainipp
Copy link
Contributor

@sylvainipp sylvainipp commented Feb 7, 2025

Depends on #1321

Technical changes

  • Change the copy of the variables in tax_benefit_system.clone() to a deepcopy

@sylvainipp
Copy link
Contributor Author

I think this is mainly used in survey-manager, @clallemand do you think it is appropriate ? I need it when creating a reform based on inflate_parameters (an openfisca-survey-manager function), otherwise I have a problem with variables losing their end_date.

@benjello
Copy link
Member

benjello commented Feb 7, 2025

A simple copy creates a new object but references the same memory addresses for nested objects. Changes to nested objects in the copy will affect the original.

Maybe you changed the variable on the original (or the copy) and got side effects.

But I agree, I cannot see situation where we want attributes changes transmitted between the copy and the original of a tbs.

@clallemand
Copy link
Contributor

I have no objection !

@sylvainipp
Copy link
Contributor Author

Thanks for your feedback ! The idea is that it is still possible to make the change in variables before cloning, and after it is possible to change only one of them. I don't think this transmission of changes is intentionally used by anyone, so I think it can be a minor version change even if it is technically breaking.

setup.py Outdated
@@ -70,7 +70,7 @@

setup(
name="OpenFisca-Core",
version="43.2.7",
version="43.2.8",
Copy link
Member

@benjello benjello Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a minor change not a bugfix (semver).

@sylvainipp
Copy link
Contributor Author

Hi @bonjourmauko, there are some errors in the tests due to default value of arrays, do you think it is linked to your recent changes ? I can't figure why it appears here and not in the previous PR (maybe new dependencies ?), but as you worked recently on it I wondered if you had an easy fix.

@bonjourmauko
Copy link
Member

Hi @sylvainipp, my first impression is that there is a casting problem only afecting conda. As testing in conda was recently introduced, it may well be the case that the error is pre-existing.

@@ -117,7 +117,7 @@ def _int_to_index(

"""
indices = enum_class.indices
values = numpy.array(value, copy=False)
values = numpy.asarray(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has no link with the original modification, it was to fix one of the errors in the conda test (here : https://github.com/openfisca/openfisca-core/actions/runs/13202963909/job/36859430121)

@bonjourmauko
Copy link
Member

bonjourmauko commented Feb 13, 2025

The changes look ok. The copy=false is to reuse the memoryview when possible, to save on ram. The best way to test if copying has a noticeable impact on performance, is to run a simulation with a huge dataset with both.

(Be aware that there is no right answer, and it depends a lot on the specs of the machine runing it.)

For the test, I can see that it is wrong, because we should not mix DTypes within an array. This part looks like a hack:

choicelist = [array('first_parent', dtype='<U12'), array('second_parent', dtype='<U13'), array('child', dtype='<U5'), 0]
condlist   = [array([ True, False]), array([False,  True]), array([False, False])]

Have you tried replacing 0 with an str default value? Or with an array or either intxx or str?

I bet there is a platform-level (low level) casting done differently within conda, or something of the sort.

Or! Another option that I can think of, is that deepcopy is bypassing de/indexation of EnumArray, and that for that reason you're ending up with a mixed DType array.

In either case, we should never have as a result a mixed DType array.

Let me know how it goes :)

@sylvainipp
Copy link
Contributor Author

Thank you for the feedback ! The code lines definitely look like the source of the error that concerns the absence of common dtype, my problem is that I can't find those lines, even with a search in vscode. Could you please tell me where it is written so I can try to fix it ?

@bonjourmauko
Copy link
Member

tests/core/parameters_fancy_indexing/test_fancy_indexing.py:177
tests/core/test_dump_restore.py:19

@bonjourmauko
Copy link
Member

Looking at the message given by NumPy, it looks like we're on the right track about the cause:

TypeError: Choicelist and default value do not have a common dtype: The DType <class 'numpy.dtypes._PyLongDType'> could not be promoted by <class 'numpy.dtypes.StrDType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is object. The full list of DTypes is: (<class 'numpy.dtypes.StrDType'>, <class 'numpy.dtypes.StrDType'>, <class 'numpy.dtypes.StrDType'>, <class 'numpy.dtypes._PyLongDType'>)

@sylvainipp
Copy link
Contributor Author

Thanks ! I don't understand how it is possible but I see other lines at those place :
tests/core/parameters_fancy_indexing/test_fancy_indexing.py:177 : assert_near(P.single.owner[zone], [100, 200, 200, 100])
tests/core/test_dump_restore.py:19 simulation_dumper.dump_simulation(simulation, directory)
I'm sorry if I miss something obvious but I search this 0 that is not of the good dtype and I don't manage to find it.

@bonjourmauko
Copy link
Member

Tu as le stack trace ici https://github.com/openfisca/openfisca-core/actions/runs/13264258879/job/37027895878?pr=1319

Ce 0 vient de NumPy. À vue rapide, il semblerait que le de/encodage est bypassé au moment du dump/restore.

@bonjourmauko bonjourmauko changed the base branch from master to fix/conda-build-bad-numpy March 6, 2025 18:14
Base automatically changed from fix/conda-build-bad-numpy to master March 7, 2025 09:40
@sylvainipp sylvainipp force-pushed the deepcopy_variables_in_clone branch from c24f33d to f7963dc Compare March 7, 2025 09:51
@bonjourmauko bonjourmauko changed the title Fix: deepcopy variables in clone of tax_benefit_system fix: deepcopy variables in clone of tax_benefit_system Mar 7, 2025
@bonjourmauko bonjourmauko force-pushed the deepcopy_variables_in_clone branch from f7963dc to e7ca4da Compare March 7, 2025 11:17
@bonjourmauko bonjourmauko force-pushed the deepcopy_variables_in_clone branch from e7ca4da to faf3f5b Compare March 7, 2025 11:18
@bonjourmauko bonjourmauko merged commit 653b031 into master Mar 7, 2025
29 checks passed
@bonjourmauko bonjourmauko deleted the deepcopy_variables_in_clone branch March 7, 2025 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants