Skip to content

Conversation

@W0lfShAd0w
Copy link
Collaborator


Pull Request Description

What issue does this change request address? (Use "#" before the issue to link it, i.e., #42.)

This fixes the data desyncing issues that were observed in single- and multi-objective GA optimization. In addition, this corrects issues in the NSGA-ii survivor selection that were causing results to be incorrectly overwritten, leading to erroneous data being reported. This PR has some overlap/may not be entirely separable from PR #2532.

What are the significant changes in functionality due to this change request?

For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

  • 1. Review all computer code.
  • 2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
  • 3. Make sure the Python code and commenting standards are respected (camelBack, etc.) - See on the wiki for details.
  • 4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass.
  • 5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added setting, in XML block, the node <internalParallel> to True.
  • 6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
  • 7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
  • 8. If an analytic test is changed/added is the the analytic documentation updated/added?
  • 9. If any test used as a basis for documentation examples (currently found in raven/tests/framework/user_guide and raven/docs/workshop) have been changed, the associated documentation must be reviewed and assured the text matches the example.

khnguy22 and others added 29 commits June 11, 2025 20:35
…ange in GA with new mutation and crossover type
… identified and fixed by Khang where non-objective optimization values were not being returned by the GA.
…s removes the arbitrary restriction requiring DataObjects to include both 'Inputs' and 'Outputs' nodes, despite the situations where that doesn't makes sense.
…e in the _SolutionExport of NSGA-ii that resulted in model inputs and outputs being coupledi incorrectly. This hotfix does NOT correct the same issue with RAVEN's estimation for the 'final' best values.
…ing of optimizer results and the crowding distance calculation for NSGA-ii. Tested with the GA test suite and stress tested with the ZDT1 test in particular.
… hardcoded seed value (e.g. 5489) has been replaced with the default seed value of 'None', which prompts numpy.random to take a high-entropy seed value from the OS (e.g. the system clock). A new subnode was added to <RunInfo> to allow for a globalSeed to be set in RAVEN prior to any code execution, which ensures a user-supplied RNG seed is applied before any RNG calls are made, if desired. Setting this globalSeed value to 5489 was necessary in all test files to ensure backwards compatibility with old gold results.
…o be calculated incorrectly. (#2)

Co-authored-by: Rollins <[email protected]>
…on on the output values prior to calculating the fitness. This had to be implemented separately from the standard RAVEN noramlizeData methodology, as we didn't want to normalize the inputs or return the output values to the user in a normalized format; the normalized values are ONLY needed to estimate the fitness when requested.
…en the solution inputs, constraints, and objectives. The culprit was a dict.update() line that was overwriting the correct values with desynced values. This line was necessary because the self._solutionExport() is not being defined correctly. This will be fixed in a subsequent commit.
… the solutions. The desyncing occurs when the 'populationFitness' local variable (used by single-objective optimization) is stored as the 'self.fitness' attribute of the Algorithm by the survivor selection method. NSGA-II can use the populationFitness local variable just fine, so the 'self.fitness' attribute is superfluous anyway.
… support a reduced input format. Penalty scaling factors are now interpreted as a 2d-array of shape (len(objVar),constraintNum). Function docstrings have been updated accordingly.
…e the way the kwargs dict was being provided to the function.
…solutions to make sure each part of the reproduction process was using the correct values and data for grandparents, parents, and children and that these data were being stored appropriately without overwriting. This in turn fixed the fitness value desyncing issue in NSGA-II as well.
…ividuals from GA are correctly added to and printed with the list of final solutions in the _solutionExport.
…ce redundancy and prevent data from being deleted unnecessarily.
… identified and fixed by Khang where non-objective optimization values were not being returned by the GA.
…e in the _SolutionExport of NSGA-ii that resulted in model inputs and outputs being coupledi incorrectly. This hotfix does NOT correct the same issue with RAVEN's estimation for the 'final' best values.
…ing of optimizer results and the crowding distance calculation for NSGA-ii. Tested with the GA test suite and stress tested with the ZDT1 test in particular.
…en the solution inputs, constraints, and objectives. The culprit was a dict.update() line that was overwriting the correct values with desynced values. This line was necessary because the self._solutionExport() is not being defined correctly. This will be fixed in a subsequent commit.
… the solutions. The desyncing occurs when the 'populationFitness' local variable (used by single-objective optimization) is stored as the 'self.fitness' attribute of the Algorithm by the survivor selection method. NSGA-II can use the populationFitness local variable just fine, so the 'self.fitness' attribute is superfluous anyway.
…solutions to make sure each part of the reproduction process was using the correct values and data for grandparents, parents, and children and that these data were being stored appropriately without overwriting. This in turn fixed the fitness value desyncing issue in NSGA-II as well.
…ifferent branch. This needs to be re-added in a future merge.
@Jimmy-INL Jimmy-INL self-requested a review October 15, 2025 16:31
@Jimmy-INL
Copy link
Collaborator

Jimmy-INL commented Oct 15, 2025

@W0lfShAd0w
Almost all the GA tests failed.
Screenshot 2025-10-15 at 10 37 09 AM

@Jimmy-INL
Copy link
Collaborator

@W0lfShAd0w, I have found the issue, but I want you to find it too. This will help you navigate our regression testing system.


objectiveVal = []
currentPop_objvals = []
for i in range(len(self._objectiveVar)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are mixing Camel Case with with Snake Case (currentPopInputs vs current_pop_inputs) In raven we do adopt Camel case we never use '_'. Please modify all variable names to match this notion.

currentPop_fitsbysoln = datasetToDataArray(currentPop_fitness, self._objectiveVar).data.tolist()
## 5. Compute the rank of current population
currentPop_ranks = frontUtils.rankNonDominatedFrontiers(np.array(currentPop_fitsbysoln), isFitness=True)
currentPop_ranks = xr.DataArray(currentPop_ranks,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only defined if self._isMultiObjective is true. But it is used even in multiobjective and hence it will error out because it is used before assigned to a value. How did you test this? it will never run.

rollnk and others added 3 commits October 22, 2025 11:55
…EN. Also, fixed a bug where unneeded multiobjective variables were expected but not initialized in single objective GA.
* The outdated behavior of having randomUtils initialize the RNG with a hardcoded seed value (e.g. 5489) has been replaced with the default seed value of 'None', which prompts numpy.random to take a high-entropy seed value from the OS (e.g. the system clock). A new subnode was added to <RunInfo> to allow for a globalSeed to be set in RAVEN prior to any code execution, which ensures a user-supplied RNG seed is applied before any RNG calls are made, if desired. Setting this globalSeed value to 5489 was necessary in all test files to ensure backwards compatibility with old gold results.

* globalSeed parameter of RunInfo now supports 'None' as a valid input.

* Added a print statement for when no GlobalSeed is provided.

* Minor changes made to address comments in PR idaholab#2534. Several tests were updated to have the proper expliciting seeding of the RNG. The unseeded test in testRandomUtils.py was modified to check 5 random floats for any repeats, which could indicate the RNG is failing.

* global seed added to more tests to ensure consistency with golds.

* Minor change to clarify output messages from globalSeed check.

* Deprecate Rattlesnake and Mammoth (idaholab#2519)

* remove Rattlesnake, Mammoth, and Instant tests

* remove Rattlesnake and Mammoth codeinterfaces

* removing from CodeInterface factory

* removing Mammoth and Rattlesnake references in the docs

* Added globalSeed parameter to test input file for backwards compatability.

* Modified the tolerance on the multiYearDWT test to account for uncertainties in the fitted coefficients due to the small amount of training data.

* Increased tolerances further for multiYearDWT test to get around fitting inconsistencies on the Linux OS test machines. This will be raised and corrected in an issue.

* Attempt at addressing the possible intermittent test error on Fedora machine.

---------

Co-authored-by: Rollins <[email protected]>
Co-authored-by: Rollins <[email protected]>
Co-authored-by: Gabriel J. Soto Gonzalez <[email protected]>
Co-authored-by: Rollins <[email protected]>
Co-authored-by: rollnk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants