Skip to content

Conversation

@mrhardman
Copy link
Collaborator

@mrhardman mrhardman commented Apr 16, 2025

The changes included here aim to significantly expand the tests available in tests/mms. The following features are added:

  • symbolic operators in the tests/mms/perpendicular_laplacian.py module covering all of the differential operators that are present in neutral_mixed.
  • extension of nonorthogonal_test.py and orthogonal_test.py to test the newly added differential operators. This requires some changes to main.cxx.
  • an automatic test of the right hand side of the equation solved in neutral_mixed. We check that the definition of the right hand side matches the expected convergence order. Note that the expected convergence order is second order (2) when evolve_momentum = false, but the observed convergence order is first order or worse when evolve_momentum = true.
  • ability to add an external source of momentum to neutrals in neutral_mixed.
  • ability to read normalised sources in neutral_mixed.

The major addition is the test script neutral_mixed_ddt_test.py, which runs the following four tests with prescribed functions Nd, Pd, NVd that are constructed to be functions of (x,y) and satisfy the boundary conditions imposed in neutral_mixed.

  • neutral_mixed/evolve_momentum_False_conservation_test_False. Check the definition of the RHS when evolve_momentum = false, by calculating symbolically ddt(Nd) and ddt(Pd), and using these symbolic expressions as sources to cancel the numerical ddt(Nd) and ddt(Pd), such that the output ddt(Nd) and ddt(Pd) are of order the discretisation error. The spatial resolution is scanned to check convergence, with the following results.
    fig_0
    fig_1

  • neutral_mixed/evolve_momentum_False_conservation_test_True. Check that the RHS when evolve_momentum = false conserves Nd and Pd. The spatial resolution is scanned to check convergence, with the following results. Note that density is exactly conserved but pressure is only conserved within numerical error. A volume integration is done over ddt(Nd) and ddt(Pd) (without a symbolic source) to obtain these errors.
    fig_0
    fig_1

  • neutral_mixed/evolve_momentum_True_conservation_test_False. Check the definition of the RHS when evolve_momentum =true, by calculating symbolically ddt(Nd), ddt(NVd) and ddt(Pd), and using these symbolic expressions as sources to cancel the numerical ddt(Nd), ddt(NVd)and ddt(Pd), such that the output ddt(Nd), ddt(NVd)and ddt(Pd) are of order the discretisation error. The spatial resolution is scanned to check convergence, with the following results.
    fig_1
    fig_2
    fig_0

  • neutral_mixed/evolve_momentum_True_conservation_test_True. Check that the RHS when evolve_momentum = true conserves Nd and energy. The spatial resolution is scanned to check convergence, with the following results. Note that conservation is first order accurate for density and less than first order for energy.
    fig_0
    fig_1

These tests can be readily extended to test the ddt() of other physics modules, or to 3D (x,y,z), allowing for rapid testing of the definition of the equations solved.

Potential issues requiring further work:

  • Flag documentation issues, with respect to dimensionally incorrect documentation for momentum equation for neutrals, factors of mass in viscous coefficients. (See neutral_mixed docs and spurious (?) mass factor #328 for notes).
  • Determine why energy conservation was not observed.
  • Extend nonorthogonal operators to charged species.
  • Demonstrate neutral_mixed working smoothly with nonorthogonal mesh in realistic physics case.
  • Improve readme and documentation for tests -- reviewers please make suggestions.
  • Anything else raised by reviewers.

…or encountered: Field3D: Operation on empty data.
…e accumulation at isolated points on core boundary persists.
…. Strange accumulation at isolated points on core boundary persists."

This reverts commit d7de1d3.
…r -- Error encountered: Field3D: Operation on empty data."

This reverts commit 07a7b12.
… expected convergence order, permitting first-order accurate operators.
…t current test structure seems to require a different list and function call block in main.cxx due to differential operator functions have unique inputs, with a mix of const and non-const Field3D variables.
…g ddt for the model. This commit provides an initial job script and templates for the runs.
…llision frequency is probably not correctly captured.
…g easier. Put sensible default options into the test script.
… is small, increase length of timestep to give chance for larger error to develop.
…t, make separate test directories for each test.
…tral_mixed. We test that the `ddt(Nd)`, `ddt(NVd)` and `ddt(Pd)` match the expected definitions from documentation, and we test that the RHS conserves `Nd` and `Pd` (or energy if `NVd` is evolved) by integrating `ddt(*)` over the volume. This test can be used to help give confidence when refactoring `neutral_mixed.cxx`. Note that the convergence order appears to be first-order accurate when `NVd` is evolved (or worse for energy conservation). This may be explained by the use of a slope limiter for terms involving `NVd`. Tests have been set to pass in the current state, but further investigations into the best numerical methods for preserving energy conservation should be carried out.
…at a momentum source can be specified in SI units for `neutral_mixed`.
@cmacmackin
Copy link
Collaborator

The reason the CIs keep failing on this is that you're using the CVODE solver but SUNDIALS isn't available in your CI image. I've switched it over to PVODE, which seems to work. I've also updated the MMS testing script to print the output when there is an error, which should make such issues easier to diagnose in future.

@cmacmackin
Copy link
Collaborator

I'm struggling to work out why the CI is continuing to fail for neutral_mixed_ddt_test.py. It always seems to occur in the post-processing stages, but not consistently in the same place. When error messages are produced they indicate that it is related to the xarray objects reading netCDF data. I can not reproduce any of these locally (even when running with act).

Run 1947

Error seems to occur after reading in and printing the second BOUT++ dataset for the final test case. Presumably it actually happens while reading the third (and final) dataset. Reported as a segfault. No backtrace is provided.

...

[Pd]
function = (0.2*x^3 - 0.3*x^2 + 0.1)*sin(2*y) + 1.0
bndry_core = neumann
bndry_all = neumann

[NVd]
function = (x^4 - 2*x^3 + x^2)*sin(y)
bndry_core = neumann
bndry_all = neumann



93% tests passed, 1 tests failed out of 14

Total Test time (real) = 220.70 sec

The following tests FAILED:
	 14 - rhs_mms_neutral_mixed (SEGFAULT)

Run 1948

Error seemed to occurred when converting l2norm to a NumPy array for the final test case. This is after all of the datasets have been read in.

Traceback (most recent call last):
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/common.py", line 158, in __float__
    return float(self.values)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/dataarray.py", line 797, in values
    return self.variable.values
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/variable.py", line 530, in values
    return _as_array_or_item(self._data)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/variable.py", line 315, in _as_array_or_item
    data = np.asarray(data)
  File "/home/runner/.local/lib/python3.10/site-packages/dask/array/core.py", line 1729, in __array__
    x = self.compute()
  File "/home/runner/.local/lib/python3.10/site-packages/dask/base.py", line 373, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/runner/.local/lib/python3.10/site-packages/dask/base.py", line 681, in compute
    results = schedule(expr, keys, **kwargs)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/indexing.py", line 577, in __array__
    return np.asarray(self.get_duck_array(), dtype=dtype)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/indexing.py", line 580, in get_duck_array
    return self.array.get_duck_array()
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/indexing.py", line 799, in get_duck_array
    return self.array.get_duck_array()
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/indexing.py", line 654, in get_duck_array
    array = self.array[self.key]
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 103, in __getitem__
    return indexing.explicit_indexing_adapter(
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/core/indexing.py", line 1023, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 116, in _getitem
    array = getitem(original_array, key)
  File "src/netCDF4/_netCDF4.pyx", line 5055, in netCDF4._netCDF4.Variable.__getitem__
  File "src/netCDF4/_netCDF4.pyx", line 4631, in netCDF4._netCDF4.Variable.shape.__get__
  File "src/netCDF4/_netCDF4.pyx", line 3786, in netCDF4._netCDF4.Dimension.__len__
  File "src/netCDF4/_netCDF4.pyx", line 2164, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/hermes-3/hermes-3/build/tests/mms/neutral_mixed_ddt_test.py", line 176, in <module>
    success, output_message_3 = run_neutral_mixed_manufactured_solutions_test(test_input)
  File "/home/runner/work/hermes-3/hermes-3/build/tests/mms/job_functions.py", line 442, in run_neutral_mixed_manufactured_solutions_test
    l2norm = np.array(l2norm)
ValueError: setting an array element with a sequence.


93% tests passed, 1 tests failed out of 14

Total Test time (real) = 220.37 sec

The following tests FAILED:
	 14 - rhs_mms_neutral_mixed (Failed)

Run 1949

Same as Run 1947.

Run 1950

Error in final test case. I think this happens when turning the l2error into a NumPy array, again.

Exception ignored in: <function CachingFileManager.__del__ at 0x7fcd20379ab0>
Traceback (most recent call last):
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 250, in __del__
    self.close(needs_lock=False)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 234, in close
    file.close()
  File "src/netCDF4/_netCDF4.pyx", line 2669, in netCDF4._netCDF4.Dataset.close
  File "src/netCDF4/_netCDF4.pyx", line 2636, in netCDF4._netCDF4.Dataset._close
  File "src/netCDF4/_netCDF4.pyx", line 2164, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error


93% tests passed, 1 tests failed out of 14

Total Test time (real) = 220.34 sec

The following tests FAILED:
	 14 - rhs_mms_neutral_mixed (SEGFAULT)

Run 1955

Same as 1947 and 1950, except this time there is a backtrace:

Exception ignored in: <function CachingFileManager.__del__ at 0x7facddacdab0>
Traceback (most recent call last):
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 250, in __del__
    self.close(needs_lock=False)
  File "/home/runner/.local/lib/python3.10/site-packages/xarray/backends/file_manager.py", line 234, in close
    file.close()
  File "src/netCDF4/_netCDF4.pyx", line 2669, in netCDF4._netCDF4.Dataset.close
  File "src/netCDF4/_netCDF4.pyx", line 2636, in netCDF4._netCDF4.Dataset._close
  File "src/netCDF4/_netCDF4.pyx", line 2164, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error


93% tests passed, 1 tests failed out of 14

Total Test time (real) = 222.76 sec

The following tests FAILED:
	 14 - rhs_mms_neutral_mixed (SEGFAULT)

Run 1956

Again, seems to happen when converting the l2norm to a NumPy array for the final test case. This time there is no backtrace, just a low-level memory error.

malloc_consolidate(): unaligned fastbin chunk detected


93% tests passed, 1 tests failed out of 14

Total Test time (real) = 219.69 sec

The following tests FAILED:
	 14 - rhs_mms_neutral_mixed (Subprocess aborted)
Error: Process completed with exit code 8.

Some reflections

The evenly numbered runs are those labelled as being for the "pull request", while the odd number of runs are labelled as for the "push". In reality, both run exactly the same workflow. However, the "pull request" jobs fail when converting the l2norm to a NumPy array while the "push" jobs fail slightly earlier, while reading in the third dataset. The error messages that get printed vary from run to run, but not are particularly helpful in figuring out what is going wrong.

@mikekryjak
Copy link
Collaborator

Thanks @cmacmackin for your investigation so far. Maybe @ZedThree could chime in once he's back from leave - this seems particularly subtle..

Dnn = (Tn / AA) / Rnn;
if (localstate.isSet("collision_frequency")) {
// Dnn = Vth^2 / sigma
Dnn = (Tn / AA) / (get<Field3D>(localstate["collision_frequency"]) + Rnn);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is partly a question about the previous implementation, but when collision_frequency is set, we have collision_frequency + Rnn on the denominator: a) is that correct? b) should that be reflected in these changes?

We could do something like:

  const Field3D Rnn = (rnn_override > 0.0) ? rnn_override : sqrt(floor(Tn, 1e-5) / AA) / neutral_lmax;

  if (localstate.isSet("collision_frequency")) {
    // Dnn = Vth^2 / sigma
    Dnn = (Tn / AA) / (get<Field3D>(localstate["collision_frequency"]) + Rnn);
  } else {
    Dnn = (Tn / AA) / Rnn;
  }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is correct. Rnn is a pseudo-collisionality which goes into the neutral diffusion model to limit its mean free path to a user set length, e.g. the machine size. It's a concept specific to this component and shouldn't live elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the new user-override also add on to collision_frequency?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user input is actually neutral_lmax, the max mean free path length scale. Rnn does not need to be a user input.
Both variables are poorly named, by the way. I have reworked this on my fork which improves the flux limitation for the neutral model which is due testing and merging once I get the time.

Comment on lines 98 to +103
print("../.././hermes_mms_tests -d "+workdir+" > "+workdir+"/output.txt")
os.system("../.././hermes_mms_tests -d "+workdir+" > "+workdir+"/output.txt")
s = os.system("../.././hermes_mms_tests -d "+workdir+" > "+workdir+"/output.txt")
if s != 0:
print(f"Command exited with status {s}. STDOUT printed below:")
with open(workdir + "/output.txt") as f:
print(f.read())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a slightly neater pattern:

Suggested change
print("../.././hermes_mms_tests -d "+workdir+" > "+workdir+"/output.txt")
os.system("../.././hermes_mms_tests -d "+workdir+" > "+workdir+"/output.txt")
s = os.system("../.././hermes_mms_tests -d "+workdir+" > "+workdir+"/output.txt")
if s != 0:
print(f"Command exited with status {s}. STDOUT printed below:")
with open(workdir + "/output.txt") as f:
print(f.read())
workdir = pathlib.Path(workdir)
command = f"../.././hermes_mms_tests -d {workdir}"
print(command)
s, out = launch(command, pipe=True)
(workdir / "output.txt").write_text(out)
if s != 0:
print(f"Command exited with status {s}. STDOUT printed below:\n{out}")

You can also use launch_safe to raise an exception if the command fails

(Also note to self: maybe launch/launch_safe should be able to write to a log file automatically?)

Copy link
Collaborator Author

@mrhardman mrhardman Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggestion created error messages when I tried it locally. If you want to avoid using the os module, could you please make a PR with the appropriate changes?

P.S. errors were not due to failing to include pathlib, which I could manage, but due to something more mysterious

terminate called after throwing an instance of 'BoutExceptionterminate called after throwing an instance of 'BoutException'
  what():       ERROR: Cannot split 60 Y points equally between 32 processors

Comment on lines +266 to +272
if not os.path.isdir(base_test_dir):
os.system("mkdir "+base_test_dir)
# create sub-directory, if required
if not sub_test_dir is None:
base_test_dir = base_test_dir + "/" + sub_test_dir
if not os.path.isdir(base_test_dir):
os.system("mkdir "+base_test_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is based on the other implementation, but I very much recommend using pathlib.Path for filesystem stuff in Python. This can a couple of lines:

Suggested change
if not os.path.isdir(base_test_dir):
os.system("mkdir "+base_test_dir)
# create sub-directory, if required
if not sub_test_dir is None:
base_test_dir = base_test_dir + "/" + sub_test_dir
if not os.path.isdir(base_test_dir):
os.system("mkdir "+base_test_dir)
base_test_dir = pathlib.Path(base_test_dir)
base_test_dir.mkdir(parents=True, exist_ok=True)
if sub_test_dir is not None:
(base_test_dir / sub_test_dir).mkdir(parents=True, exist_ok=True)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, this suggestion created error messages when I tried it locally. If you want to avoid using the os module, could you please make a PR with the appropriate changes?


return success, output_message

def run_neutral_mixed_manufactured_solutions_test(test_input):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's actually different between this and run_manufactured_solutions_test? Is it just the base template input file? Maybe we could squash them together somehow?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The run_manufactured_solutions_test() test calls a standalone executable to test the differential operators, run_neutral_mixed_manufactured_solutions_test() calls hermes-3 to test the neutral_mixed equations. I would not squash these functions, unless you can see a nice way to do it.

Comment on lines +56 to +65
const auto differential_operators_1_arg = {
nameandfunction1{"Div_par(f)",
[](const Field3D &f) {
return Div_par(f);
}},
nameandfunction1{"Grad_par(f)",
[](const Field3D &f) {
return Grad_par(f);
}},
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be able to do:

Suggested change
const auto differential_operators_1_arg = {
nameandfunction1{"Div_par(f)",
[](const Field3D &f) {
return Div_par(f);
}},
nameandfunction1{"Grad_par(f)",
[](const Field3D &f) {
return Grad_par(f);
}},
};
const auto differential_operators_1_arg = {
nameandfunction1{"Div_par(f)", Div_par},
nameandfunction1{"Grad_par(f)", Grad_par},
};

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attempting to do this leads to error messages like

error: cannot resolve overloaded function 'Div_par' based on conversion to type 'std::function<Field3D(const Field3D&)>'
   57 |     nameandfunction1{"Div_par(f)", Div_par},

In principle, I happy for this code to be simplified if possible, if you wanted to make a PR with working changes.

Comment on lines +161 to +162
for (const auto& difop: differential_operators_1_arg) {
if (difop.name.compare(differential_operator_name) == 0){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure what's going on here. I think the intention is that we have a whole bunch of operators we want to iterate over but they take different arguments, which means they can't be stored in the same collection. So what we end up doing is having a bunch of different collections, and a loop for each one to both look for and call the particular operator.

If that's the case, then we can probably do something like use a base class with the common information, then we can have a single collection that we iterate over. We could also move the duplicated dump bits into a helper function. Something like:

class NamedFunctionBase {
  std::string name;
  std::string outname;
  std::string expected_name;

public:
  NamedFunctionBase(std::string name_, int i) : 
    name(std::move(name_)),
    outname(fmt::format("result_{}", i),
    expected_name(fmt::format("expected_result_{}", i)
    {}

  template<class... Args>
  void operator()(Options& dump, Mesh& mesh, Args... args) {
    dump[outname] = func(args...);
    dump[outname].setAttributes({{"operator", name}});
    Field3D expected_result{&mesh};
    mesh.get(expected_result, expected_name, 0.0, false);
    dump[expected_name] = expected_result;
  }
};

I think this is probably a little bit tricky, so I'm happy to have a chat about this, or have a stab at it myself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have understood correctly -- initially there was a single collection of operators, but then different operator types were added, and this was my working solution. Happy for this to be simplified if it can be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants