Description
Use cases
Currently, there is no way to skip certain extras or filter nodes from the parent_group when submitting work chains. Here are two use cases for this feature:
- For the 3DCD runs, we typically only run structures up to a certain system size (i.e. number of sites in the unit cell).
- Imagine that the work chains you want to submit depend on the outputs of a previous work chain. In this case you most likely only want to submit work chains for parent work chains that have finished with exit status 0.
Possible approaches
Using skip_extras
Initially, the solution I had in mind was to add a skip_extras input argument: a function that takes the extras and returns True if that set of extras should be skipped, and False otherwise. This would be added first as an input argument to the .submit_new_batch() method, and passed on to the get_all_extras_to_submit() method:

    def submit_new_batch(self, dry_run=False, sort=True, sleep=1, skip_extras=None):
        """Submit a new batch of calculations, ensuring less than self.max_concurrent active at the same time.

        :param dry_run: simply return the extras that would be submitted.
        :param sort: sort the work chains by the extras before submissions.
        :param skip_extras: function that returns True in case a set of extras should be skipped, False otherwise.
        """
        to_submit = []
        extras_to_run = set(self.get_all_extras_to_submit(skip_extras)).difference(self._check_submitted_extras())
        [...]

In the FromGroupSubmissionController.get_all_extras_to_submit(), for example, the function would be used to filter out the extras that didn't pass the test:

    if skip_extras is not None:
        results = [tuple(_) for _ in qbuild.all() if not skip_extras(_)]
    else:
        results = [tuple(_) for _ in qbuild.all()]

This means we have to add the extras that are required for this filtering, of course. Typically you can use the ones that uniquely define the work chain though. The above implementation is flawed in the sense that you have to rely on the index of the extra you are interested in when implementing the skip_extras method. But this can probably be fixed.
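One possible way around the index problem is to hand skip_extras a dictionary keyed by the extras names instead of a bare tuple. The following is only a minimal sketch in plain Python, not the controller's actual code: the helper filter_extras, the extras names ("formula", "number_of_sites") and the size limit of 64 are hypothetical, and rows stands in for the output of qbuild.all().

```python
def filter_extras(rows, extras_names, skip_extras=None):
    """Return the rows as tuples, dropping those for which skip_extras returns True.

    rows: iterable of lists, one value per projected extra (stand-in for qbuild.all()).
    extras_names: names of the projected extras, in projection order (hypothetical).
    skip_extras: optional function taking a dict {extra name: value}.
    """
    results = []
    for row in rows:
        # Pair each value with its name so the predicate does not depend on indices.
        named = dict(zip(extras_names, row))
        if skip_extras is not None and skip_extras(named):
            continue
        results.append(tuple(row))
    return results


def too_large(extras):
    """Hypothetical predicate for the first use case: skip large unit cells."""
    return extras["number_of_sites"] > 64


rows = [["Si", 2], ["SiO2", 72], ["GaAs", 8]]
extras_names = ("formula", "number_of_sites")
kept = filter_extras(rows, extras_names, skip_extras=too_large)
```

With this shape the predicate keeps working if the order of the projected extras changes, at the cost of building one small dictionary per row.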
Using filters
Another straightforward approach in the case of the FromGroupSubmissionController (where both use cases stem from) is to have a filters input that is applied to the query used to obtain the extras to submit:
aiida-submission-controller/aiida_submission_controller/from_group.py, lines 75 to 83 in 5a0adce:

    qbuild = orm.QueryBuilder()
    qbuild.append(orm.Group,
                  filters={'id': self.parent_group.pk},
                  tag='group')
    qbuild.append(orm.Node,
                  project=extras_projections,
                  tag='process',
                  with_group='group')
    results = qbuild.all()

This one doesn't require any specific extras to be present, and can deal with the second use case described above. It's a bit less general though, since these filters do not make sense for the BaseSubmissionController. Hence, adding filters as an input argument to the submit_new_batch() method is not preferable (unless we override this method in the FromGroupSubmissionController class, but that does introduce some code duplication). Perhaps it would be best to simply add these (optional) filters as an input argument to the constructor (e.g. parent_group_filters). We can even add a method to adjust these filters if needed, but typically a new submission controller is instantiated anyway.
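To make the constructor idea a bit more concrete, here is a rough sketch of how optional parent_group_filters could be threaded into the keyword arguments of the second qbuild.append() call. It is plain Python so it can be read without an AiiDA profile; the helper name node_append_kwargs is hypothetical, and the filter key 'attributes.exit_status' assumes that the exit status of a process node can be queried as an attribute.

```python
def node_append_kwargs(extras_projections, parent_group_filters=None):
    """Build the keyword arguments for qbuild.append(orm.Node, **kwargs).

    parent_group_filters: optional QueryBuilder-style filters dict that would be
    stored on the controller by its constructor (hypothetical input).
    """
    kwargs = {
        'project': extras_projections,
        'tag': 'process',
        'with_group': 'group',
    }
    # Only pass filters when the user provided some, so the default query is unchanged.
    if parent_group_filters:
        kwargs['filters'] = dict(parent_group_filters)
    return kwargs


# Second use case: only consider parent work chains that finished with exit status 0.
# The 'attributes.exit_status' key is an assumption, see the note above.
finished_ok = {'attributes.exit_status': 0}
kwargs = node_append_kwargs(['extras.formula'], parent_group_filters=finished_ok)
default_kwargs = node_append_kwargs(['extras.formula'])
```

Inside get_all_extras_to_submit() the second append would then read qbuild.append(orm.Node, **kwargs), leaving the rest of the query as it is today.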
Both
The two approaches have their use cases, so maybe we can just implement both of them?