✨ NEW: Allow user to filter/skip certain submissions #4

@mbercx

Use cases

Currently, there is no way to indicate that you want to skip certain extras or filter nodes from the parent_group when submitting work chains. Here are two example use cases for this feature:

  • For the 3DCD runs, we typically only run structures up to a certain system size (i.e. number of sites in the unit cell).
  • Imagine that the work chains you want to submit depend on the outputs of a previous work chain. In this case you most likely only want to run work chains that have finished with exit status 0.

Possible approaches

Using skip_extras

Initially, the solution I had in mind was to add a skip_extras input argument: a function that takes the extras and returns True if that set of extras should be skipped, and False otherwise. This would first be added as an input argument to the .submit_new_batch() method, and then passed on to the get_all_extras_to_submit() method:

    def submit_new_batch(self, dry_run=False, sort=True, sleep=1, skip_extras=None):
        """Submit a new batch of calculations, ensuring less than self.max_concurrent active at the same time.
        
        :param dry_run: simply return the extras that would be submitted.
        :param sort: sort the work chains by the extras before submissions.
        :param skip_extras: function that returns True in case a set of extras should be skipped, False otherwise.
        """
        to_submit = []
        extras_to_run = set(self.get_all_extras_to_submit(skip_extras)).difference(self._check_submitted_extras())
[...]

In the FromGroupSubmissionController.get_all_extras_to_submit(), for example, the function would be used to filter out the extras that didn't pass the test:

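        # Keep only the extras for which skip_extras returns False (keep everything if no function is given).
        if skip_extras is not None:
            results = [tuple(extras) for extras in qbuild.all() if not skip_extras(extras)]
        else:
            results = [tuple(extras) for extras in qbuild.all()]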

This means the extras that are required for this filtering have to be added, of course, although typically you can use the ones that uniquely define the work chain. The above implementation is flawed in the sense that the skip_extras function has to rely on the index of the extra it is interested in, because each set of extras is passed as a positional sequence of values. But this can probably be fixed.
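One possible way around the index problem is sketched below: wrap a predicate that works on named extras so that it can still be called with the positional values. This is only an illustration, not an existing part of the package. The helper make_skip_extras and the extras names structure_id and number_of_sites are hypothetical, and the query results are replaced by a plain list so the snippet runs without AiiDA.

```python
def make_skip_extras(extras_keys, should_skip):
    """Adapt a predicate on named extras to positional extras values.

    :param extras_keys: names of the extras, in the order in which they are projected.
    :param should_skip: function that takes a dict of extras and returns True to skip it.
    """
    def skip_extras(extras_values):
        # Pair each value with its name so the predicate does not depend on indices.
        named_extras = dict(zip(extras_keys, extras_values))
        return should_skip(named_extras)

    return skip_extras


# Hypothetical extras: a structure identifier and its number of sites.
extras_keys = ('structure_id', 'number_of_sites')
skip_large = make_skip_extras(extras_keys, lambda extras: extras['number_of_sites'] > 10)

# Stand-in for the rows returned by the query (assumed to be lists of values).
query_results = [['struct-a', 4], ['struct-b', 32], ['struct-c', 10]]
results = [tuple(extras) for extras in query_results if not skip_large(extras)]
print(results)  # [('struct-a', 4), ('struct-c', 10)]
```

This covers the first use case (a maximum system size) as long as the number of sites is stored as an extra, which is an assumption here.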

Using filters

Another straightforward approach in the case of the FromGroupSubmissionController (where both use cases stem from) is to have a filters input that is applied to the query used to obtain the extras to submit:

    qbuild = orm.QueryBuilder()
    qbuild.append(orm.Group,
                  filters={'id': self.parent_group.pk},
                  tag='group')
    qbuild.append(orm.Node,
                  project=extras_projections,
                  tag='process',
                  with_group='group')
    results = qbuild.all()

This one doesn't require any specific extras to be present, and can deal with the second use case described above. It's a bit less general though, since these filters do not make sense for the BaseSubmissionController. Hence, adding filters as an input argument to the submit_new_batch() method is not preferable (unless we override this method in the FromGroupSubmissionController class, but that does introduce some code duplication). Perhaps it would be best to simply add these (optional) filters as an input argument to the constructor (e.g. parent_group_filters). We could even add a method to adjust these filters if needed, but typically a new submission controller is instantiated anyway.
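As a rough sketch of the constructor-based option, the optional filters could be stored on the controller and merged into the keyword arguments of the qbuild.append(orm.Node, ...) call. The helper node_append_kwargs below is hypothetical and does not import AiiDA, so it only shows how the arguments would be assembled. The filter key 'attributes.exit_status' is an assumption about where the exit status of a process node can be queried, and the projection 'extras.structure_id' is made up for illustration.

```python
def node_append_kwargs(extras_projections, parent_group_filters=None):
    """Build the keyword arguments for the ``qbuild.append(orm.Node, ...)`` call.

    :param extras_projections: list of projections, e.g. ``['extras.structure_id']``.
    :param parent_group_filters: optional filters to apply to the nodes in the parent group.
    """
    kwargs = {
        'project': list(extras_projections),
        'tag': 'process',
        'with_group': 'group',
    }
    # Only pass filters when they are given, so the default query stays unchanged.
    if parent_group_filters:
        kwargs['filters'] = dict(parent_group_filters)
    return kwargs


# Second use case: only consider parent work chains that finished with exit status 0.
# The 'attributes.exit_status' key is an assumption, see the text above.
kwargs = node_append_kwargs(
    ['extras.structure_id'],
    parent_group_filters={'attributes.exit_status': 0},
)
print(kwargs)

# Inside the controller this would be used roughly as:
#     qbuild.append(orm.Node, **kwargs)
```

Keeping the filters optional means an existing controller that does not pass parent_group_filters would build exactly the same query as before.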

Both

The two approaches have their use cases, so maybe we can just implement both of them?
