Refactoring template driver to dynamically create RAVEN workflows #391

Open
wants to merge 54 commits into
base: devel

Collaborator

@j-bryan j-bryan commented Nov 18, 2024


Pull Request Description

What issue does this change request address?

#390

What are the significant changes in functionality due to this change

This pull request restructures the templating system used to generate RAVEN workflows.

  • The RAVEN template and the template driver are separated into different classes.
  • A FeatureDriver class is introduced to define changes to a RAVEN template XML to add a single feature. For example, adding a Grid sampler to the RAVEN XML is done with one FeatureDriver instance, while adding the model used by the workflow is done with a different FeatureDriver.
  • FeatureDriver instances can be composed and grouped in FeatureCollection objects.
  • The template driver uses feature drivers to edit a template object.
  • Edits to the template XML are intended to be strictly additive. In the previous TemplateDriver implementation, I often found it unclear which nodes were present in the template XML at a given point in the code, because additions, deletions, and edits were made throughout the TemplateDriver. Starting from a much smaller template XML and allowing only additions reframes the template driver's job as adding the desired features to the XML, rather than deleting unneeded entities from it, as was common practice.
  • Addition of a "flat" RAVEN workflow template for specific cases which can be run as flat workflows.
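
The additive, composable scheme described in the list above can be illustrated with a minimal sketch. The class names FeatureDriver and FeatureCollection come from this PR, but the method name `edit_template`, the constructor arguments, and the two example features are assumptions made for illustration only and are not the PR's actual API.

```python
import xml.etree.ElementTree as ET


class FeatureDriver:
    """Adds one feature to a template; it only appends nodes, never removes them."""

    def edit_template(self, template: ET.Element) -> None:  # hypothetical method name
        raise NotImplementedError


class AddGridSampler(FeatureDriver):
    """Hypothetical feature: append a Grid sampler under <Samplers>."""

    def __init__(self, name: str):
        self.name = name

    def edit_template(self, template: ET.Element) -> None:
        samplers = template.find("Samplers")
        if samplers is None:
            samplers = ET.SubElement(template, "Samplers")
        ET.SubElement(samplers, "Grid", {"name": self.name})


class AddModel(FeatureDriver):
    """Hypothetical feature: append a model node under <Models>."""

    def __init__(self, name: str):
        self.name = name

    def edit_template(self, template: ET.Element) -> None:
        models = template.find("Models")
        if models is None:
            models = ET.SubElement(template, "Models")
        ET.SubElement(models, "ExternalModel", {"name": self.name})


class FeatureCollection(FeatureDriver):
    """Groups several feature drivers and applies them in order."""

    def __init__(self, features):
        self.features = list(features)

    def edit_template(self, template: ET.Element) -> None:
        for feature in self.features:
            feature.edit_template(template)


# Start from a nearly empty template and add only what the workflow needs.
template = ET.Element("Simulation")
workflow = FeatureCollection([AddGridSampler("grid"), AddModel("dispatch")])
workflow.edit_template(template)
```

Because a collection is itself a feature driver in this sketch, collections can be nested, and the state of the XML at any point is simply the sum of the features applied so far.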

For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

  • 1. Review all computer code.
  • 2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
  • 3. Make sure the Python code and commenting standards are respected (camelBack, etc.); see the wiki for details.
  • 4. Automated Tests should pass.
  • 5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test.
  • 6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
  • 7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
  • 8. If an analytic test is changed/added, the analytic documentation must be updated/added.
  • 9. If any test used as a basis for documentation examples has been changed, the associated documentation must be reviewed to ensure the text matches the example.

@PaulTalbot-INL
Collaborator

Wow, this is quite an effort! Do you have any diagramming or writeup for the logic flow? I think it's starting to make sense glancing over it, but some developer documentation would help get us up to speed on this new structure. Thanks for your work on this!

@j-bryan j-bryan marked this pull request as draft December 9, 2024 16:04
@j-bryan j-bryan marked this pull request as ready for review January 3, 2025 23:27
Collaborator

@dylanjm dylanjm left a comment

Some minor changes and discussion points. I read my assigned sections closely and then did a quick look over the rest of the files. I will review once more when all changes are submitted.

I was thinking, it might be nice to add some linting checks to our automated testing. Nothing that stops the tests from running, but just checking to make sure we are consistent. Almost like a coverage summary but for linting. @joshua-cogliati-inl @caleb-sitton-inl what are your thoughts on this?

doc/developers/templates.md (outdated, resolved)
src/Cases.py (outdated, resolved)
src/DispatchManager.py (outdated, resolved)
src/Economics.py (resolved)
src/Placeholders.py (resolved)
src/Placeholders.py (resolved)
templates/debug_template.py (outdated, resolved)
templates/debug_template.py (outdated, resolved)
templates/debug_template.py (resolved)
templates/debug_template.py (outdated, resolved)
Collaborator

@joshua-cogliati-inl joshua-cogliati-inl left a comment

Some comments on templates/snippets

@classmethod
def from_xml(cls, node: ET.Element, **kwargs) -> "RavenSnippet":
  """
    Alternate constructor which instantiates a new RavenSnippet objectfrom .n existing XML node
Collaborator

Should "objectfrom .n existing" be "object from an existing"?


class RavenSnippet(ET.Element):
  """
    RavenSnippet class objects describe one contiguous snippet of RAVEN XML, inheritingfrom .he xml.etree.ElementTree.Element
Collaborator

Should "inheritingfrom .he" be "inheriting from the"?


def distribution_class_from_spec(spec) -> type[Distribution]:
  """
    Make a new distribution classfrom .he RAVEN input spec for that class
Collaborator

change "classfrom .he" to "class from the"?

    @ In, value, str, the type value to set
    @ Out, None
  """
  self.set("type", str(value))
Collaborator

If the type of value is str, why is str(value) needed? (Ditto for "path")

Collaborator Author

It's unnecessary as long as the input type is respected. I'll remove in the interest of cleaner code.

"""
varname = name if not suffix else f"{name}_{suffix}"
if not loc:
  loc = "Samplers|MonteCarlo@name:mc_arma_dispatch" # where this is pointing 9/10 times
Collaborator

Should the doc string mention that this is the default for loc?

Collaborator Author

Probably a good idea. I'll add that.
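
One way the default could be documented, following the `@ In` / `@ Out` docstring convention that appears elsewhere in this PR. This is only a sketch: the function name `resolve_loc` and the constant `DEFAULT_LOC` are hypothetical, and only the default string itself is taken from the snippet under review.

```python
from typing import Optional

# Default location copied from the reviewed snippet.
DEFAULT_LOC = "Samplers|MonteCarlo@name:mc_arma_dispatch"


def resolve_loc(loc: Optional[str] = None) -> str:
    """
      Returns the XML location to edit, falling back to the default.
      @ In, loc, str, optional, location in the XML; if not provided, defaults to
        "Samplers|MonteCarlo@name:mc_arma_dispatch"
      @ Out, loc, str, the resolved location
    """
    if not loc:
        loc = DEFAULT_LOC
    return loc
```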

if tag not in self._allowed_subs:
  raise ValueError(f"Step type {self.tag} does not accept subelements with tag {tag}. Allowed: {self._allowed_subs}.")

# Create an Assembler nodefrom .he entity
Collaborator

Change "nodefrom .he" to "node from the"?

@GabrielSoto-INL GabrielSoto-INL self-requested a review January 23, 2025 15:02
@joshua-cogliati-inl
Collaborator

> I was thinking, it might be nice to add some linting checks to our automated testing. Nothing that stops the tests from running, but just checking to make sure we are consistent. Almost like a coverage summary but for linting. @joshua-cogliati-inl @caleb-sitton-inl what are your thoughts on this?

We use pylint on RAVEN, so this seems reasonable. (pylint is available in conda and pypi)

@j-bryan
Collaborator Author

j-bryan commented Jan 23, 2025

I was aware of pylint but hadn't made much use of it before. I looked through the issues it flagged in my code for this PR and fixed some of them. I agree it would be good to add it in a way similar to how it's used by RAVEN.

Collaborator

@GabrielSoto-INL GabrielSoto-INL left a comment

Couple of linting errors, some comments and questions. Fantastic work so far, big fan of the changes

has_static_history = any(s.is_type("CSV") for s in sources)
has_synthetic_history = any(s.is_type("ARMA") for s in sources)
if has_static_history and has_synthetic_history:
  raise self.raiseAnError(ValueError, "Mixing ARMA and CSV sources is not yet supported! "
Collaborator

the raise here should be removed, the line should just be self.raiseAnError( ...
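
A small sketch of why the leading `raise` is redundant, assuming that `raiseAnError` raises the requested exception itself rather than returning one. The `MockMessageUser` class below is a stand-in written for this example; it is not RAVEN's implementation.

```python
class MockMessageUser:
    """Stand-in for a class providing raiseAnError (assumed to raise internally)."""

    def raiseAnError(self, etype, *args):
        # Assumption: the real method builds a message and raises etype.
        raise etype(" ".join(str(a) for a in args))

    def check_sources(self, has_static_history, has_synthetic_history):
        if has_static_history and has_synthetic_history:
            # No leading `raise`: the call below never returns, so an outer
            # `raise` would have nothing to act on.
            self.raiseAnError(ValueError, "Mixing ARMA and CSV sources is not yet supported!")
        return True


try:
    MockMessageUser().check_sources(True, True)
    outcome = "no error"
except ValueError as err:
    outcome = str(err)
```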

assert isinstance(dep, list)
new.append(xmlUtils.newNode('Index', attrib={'var':index}, text=', '.join(dep)))
dataobjects.append(new)
return not any([comp.get_capacity(None, raw=True).is_parametric() for comp in components])
Collaborator

from my Pylinter (https://pylint.readthedocs.io/en/latest/user_guide/messages/refactor/use-a-generator.html), you can remove the square brackets here and just have any(comp.get_capacity(...

Collaborator Author

Oh yeah, using the generator expression directly instead of collecting that generator in a list first would work the same and is a little cleaner. I'll change that.
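
The difference can be seen in a short, self-contained sketch. The `is_parametric` function and the sample values here are hypothetical stand-ins for the capacity check in the reviewed line. Both forms give the same answer, but the generator form lets `any` stop at the first truthy item instead of evaluating every element up front.

```python
calls = []


def is_parametric(value):
    """Hypothetical predicate that records each value it is asked about."""
    calls.append(value)
    return value < 0


capacities = [-1, 2, 3]

# List comprehension: every element is evaluated before any() runs.
calls.clear()
with_list = any([is_parametric(c) for c in capacities])
list_calls = len(calls)

# Generator expression: any() stops as soon as one item is truthy.
calls.clear()
with_generator = any(is_parametric(c) for c in capacities)
generator_calls = len(calls)
```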

from .snippets.samplers import Sampler, SampledVariable, Grid, Stratified, CustomSampler, EnsembleForward
from .snippets.optimizers import BayesianOptimizer, GradientDescent
from .snippets.models import GaussianProcessRegressor, PickledROM, EnsembleModel
from .snippets.distributions import Distribution, Uniform
Collaborator

should Uniform here be NewDistribution? I don't see a Uniform class in the snippets

Collaborator

I see Uniform is used in _create_new_sampled_capacity below, but it does exist in snippets.distributions

Collaborator Author

This is because I'm doing something sneaky to create the distribution classes. Since there are so many that RAVEN can accept, I'm actually parsing the RAVEN input specs for the distribution classes and dynamically creating RavenSnippet classes for them. You can look at templates/snippets/distributions.py to see how I'm doing that. The downside is I don't think any linter will be able to find those classes I create.
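
A minimal sketch of the general technique, using Python's built-in `type()` to build classes at runtime. The `SPECS` dictionary and the simplified `Distribution` base class are assumptions made for this example; the PR parses RAVEN's real input specs in templates/snippets/distributions.py, which this sketch does not attempt to reproduce.

```python
import xml.etree.ElementTree as ET


class Distribution(ET.Element):
    """Simplified stand-in for a distribution snippet base class."""

    allowed_params: tuple = ()

    def __init__(self, name: str):
        # The XML tag is taken from the (possibly generated) class name.
        super().__init__(type(self).__name__, {"name": name})


# Hypothetical, hand-written spec data: class name -> allowed parameter tags.
SPECS = {
    "Uniform": ("lowerBound", "upperBound"),
    "Normal": ("mean", "sigma"),
}


def distribution_class_from_spec(name: str, params: tuple) -> type:
    """Build a new Distribution subclass at runtime with the built-in type()."""
    return type(name, (Distribution,), {"allowed_params": tuple(params)})


generated = {
    name: distribution_class_from_spec(name, params)
    for name, params in SPECS.items()
}

# Exposing the classes as module-level names makes them importable at runtime,
# but static tools never see a `class Uniform` statement in the source.
globals().update(generated)

uniform = generated["Uniform"]("capacity_dist")
```

Since no `class Uniform` statement exists in the source text, a linter that only reads the file has nothing to resolve the import against, which matches the downside noted in the comment.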

# PUBLIC API FUNCTIONS #
########################

def loadTemplate(self) -> None:
Collaborator

for these three "public API functions" the input parameter list is different from the parent Template class that RavenTemplate inherits, should look at that one and consolidate/update

Collaborator Author

I can do that. I'm not a huge fan of the RAVEN class's function signatures for these methods but it is what it is, I guess

    @ Out, None
  """
  this_file_dir = Path(__file__).parent
  template_path = this_file_dir / self.template_path
Collaborator

where is self.template_path being set? should it be in the init?

Collaborator Author

self.template_path is set by the subclasses, but I agree that this could've been done better. I'll see if I can revise that while changing these public API methods to better match the RAVEN super class definitions.
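
One possible arrangement, sketched under assumptions: the base class declares `template_path` as a class attribute and fails early if a subclass forgets to set it. The class names, the `resolved_path` helper, and the file name below are hypothetical, and `__init_subclass__` is only one of several ways to enforce this.

```python
from pathlib import Path


class BaseTemplate:
    """Hypothetical base class that requires subclasses to name their template file."""

    template_path: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if not cls.template_path:
            raise TypeError(f"{cls.__name__} must define template_path")

    def resolved_path(self, base_dir: Path) -> Path:
        # base_dir plays the role of Path(__file__).parent in the reviewed code.
        return base_dir / self.template_path


class FlatTemplate(BaseTemplate):
    template_path = "xml/flat.xml"  # hypothetical file name


try:
    class BrokenTemplate(BaseTemplate):
        pass
    missing_path_error = None
except TypeError as err:
    missing_path_error = str(err)
```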

class MockListPropertyUser:
  """ A minimal class implementing an attribute as a listproperty """
  def __init__(self):
    self._list = []
Collaborator

docstrings for the methods

Collaborator

Great addition!

Collaborator

columns ordered differently on purpose?

Collaborator Author

The column reordering isn't intentional; it's just how the refactored code happens to order things. The diff GitHub calculates here is a little misleading, though. The real difference is in the way the prefix column is expressed. Since there is no inner/outer workflow for this case (where prefix values are expressed as {i_outer}_{i_inner} or something like that), the prefix values are just ints for the flat workflow.

Collaborator

I see, it's a static history so no need for statistics on the results.

Collaborator

any idea why these files have changed?

Collaborator Author

There was a typo in template_driver.py which failed to add some values to the dispatch.nc database. The line in question is here:

db.append(xmlUtils.newNode('variables', text='GRO_outer_debug_dispath,GRO_outer_debug_synthetics'))

The variable group GRO_outer_debug_dispath should have been GRO_outer_debug_dispatch. The dispatch.nc file here has changed to reflect the addition of these values.

Collaborator

@joshua-cogliati-inl joshua-cogliati-inl left a comment

I have reviewed HERON/tests/unit_tests/snippets and HERON/templates/xml.

import sys
import os
import unittest

Collaborator

@joshua-cogliati-inl joshua-cogliati-inl Jan 23, 2025

Do we have a policy (or precedent) that allows skipping doc strings in tests? Otherwise, they probably need to be added.

@j-bryan j-bryan changed the title from "[WIP] Refactoring template driver to dynamically create RAVEN workflows" to "Refactoring template driver to dynamically create RAVEN workflows" Jan 28, 2025
5 participants