Skip to content

Conversation

@bosonie
Copy link
Collaborator

@bosonie bosonie commented Oct 29, 2021

Fixes #241

@bosonie
Copy link
Collaborator Author

bosonie commented Oct 29, 2021

I believe that tests are not necessary since this feature is tested already when the various child classes of InputGenerator are tested. The relax workchain testes already inputs that are not nodes and unstored nodes. The bands workchain tests will test stored nodes.

Copy link
Collaborator

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we actually do need tests here. What you describe being tested is that things run, but not that nodes are not accidentally cloned, which is actually happening in this implementation. You are only checking the top level namespace for nodes, but they could be nested deeply within. Those will still get deepcopied. So to do this properly, you need to make it recursive. But you need to do it in a backtracking manner. You will need to go all the way down the namespaces to find where the nodes are and clone deepcopy the dictionaries by hand and the mutable values within it, except for the nodes which you keep.

@bosonie
Copy link
Collaborator Author

bosonie commented Oct 30, 2021

@sphuber I fixed the issue and added tests. The tests can be improved after we merge also the feature on bands. In that PR (#242) the InputGenerator is changed to not depend on ProtocolRegistry.
Ready for approval.

@bosonie bosonie requested a review from sphuber November 2, 2021 09:09
Copy link
Collaborator

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bosonie . Implementation looks good. Not sure if we actually need to distinguish between stored and unstored nodes, but can't hurt for now. Anyway, they would be stored before the get_builder is called when going through the compound workflows. Just have some changes in naming and general cleaning up, but then should be good to go.

Co-authored-by: Sebastiaan Huber <mail@sphuber.net>
@bosonie
Copy link
Collaborator Author

bosonie commented Nov 5, 2021

Not sure if we actually need to distinguish between stored and unstored nodes, but can't hurt for now. Anyway, they would be stored before the get_builder is called when going through the compound workflows.

This is a valid point. In other words, it does not make sense to distinguish stored and unstored nodes since it can always be that the node is stored before going through the get_builder (for instance if the get_builder call is inside a higher level workchain exposing the node as input). But this also means that we should remember all the developers that they should not modify ever a node inside the get_builder. It should be already clear but better remember that. However cloning or deepcopying internally is allowed even though the connection among the original node and the cloned one is lost since it is not in a calcfunction.

The problem of deepcopying the inputs of an input generator does not affect the Data
nodes only. Also other nodes like `ProcessNode` should not deepcopied
(actually in that case the code will crash trying to deepcopy).
So the deepcopy is prevented for all `Node`s.
Also eliminated the distintion betweeen stored and unstored nodes.
An implementation can not know in advance if the passed Data node is
stored or not. Therefore every implementation should not modify
nodes at all! No need to deepcopy nodes that are not stored!
@bosonie
Copy link
Collaborator Author

bosonie commented Nov 5, 2021

@sphuber I finished to implement your modifications. Please note that:

  1. I extended the imposition to NOT deepcopy to all Nodes, no metter they are stored or not.
  2. I modified the tests to test that a Node is not deepcopied and a dict is deepcopied. This is done adding that the dict is modified inside get_builder, but the modification has no effect in the passed kwarg. For the Node only the uuid is checked. Is this enough?

Copy link
Collaborator

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bosonie , just few remaining things

def test_get_builder_mutable_kwargs(generate_input_generator_cls):
"""
Test that calling ``get_builder`` does mutate the ``kwargs`` when they are not nodes.
In this case we want to deepcopy the ``kwargs`` before entering the ``get_builder``.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is "we" in this sentence? The person calling get_builder, or the get_builder implementation? I think you mean the latter, because it is get_builder who wants to perform a deepcopy, but than "before entering the get_builder" doesn't make sense.

Comment on lines +47 to +49
kwargs = {'mutable': {'test': 333}}
generator.get_builder(**kwargs)
assert kwargs['mutable'] == {'test': 333}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to understand why this test was actually testing something. It relies on the _construct_builder implementation of the class that is returned by the generate_input_generator_cls factory. This is not at all clear and the docstring also doesn't say this. For this purpose, I would really advice to have the implementation of _construct_builder locally (either in this test file or better even in the test itself). You don't have to copy the entire class and implement it again, you can just monkeypatch the relevant method. I would suggest doing the following:

def test_get_builder_mutable_kwargs(generate_input_generator_cls, monkeypatch):
    
    def construct_builder(self, **kwargs):
        kwargs['mutable']['test'] = 'whatever'
        return self.process_class.get_builder()

    cls = generate_input_generator_cls(inputs_dict={'mutable': dict})
    monkeypatch.setattr(cls, '_construct_builder, construct_builder)
    # and then do the rest of the test.

This is a lot clearer and also less susceptible to accidentally breaking the test if the fixture generate_input_generator_cls removes the logic that changes the arbitrary keys in the kwargs that is not clear at all why it is there.

Comment on lines 94 to 95
if 'mutable' in kwargs:
kwargs['mutable']['test'] = 'whatever'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would remove this. It is specific to one test but really difficult to figure out which one. Should be locally implemented. See other comment

bosonie and others added 2 commits November 9, 2021 08:39
Just format of a comment

Co-authored-by: Sebastiaan Huber <mail@sphuber.net>
@bosonie bosonie requested a review from sphuber November 9, 2021 08:15
@bosonie
Copy link
Collaborator Author

bosonie commented Nov 9, 2021

@sphuber Thanks for the comments. I hope I clarified all the points.

@sphuber sphuber merged commit facf481 into master Nov 9, 2021
@sphuber sphuber deleted the fix/prevent_deepcopy_stored_nodes branch November 9, 2021 08:48
@sphuber
Copy link
Collaborator

sphuber commented Nov 9, 2021

Thanks @bosonie

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Why copy.deepcopy the arguments of get_builder

3 participants