Conversation

@clessig (Collaborator) commented Jan 28, 2026

Description

Re-enabling robust integration test

Issue Number

Closes #1712

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and listed the run_id(s) in a comment: launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@github-actions github-actions bot added the infra Issues related to infrastructure label Jan 28, 2026
@clessig (Collaborator, Author) commented Jan 28, 2026

Evaluation currently breaks with (CC @SavvasMel @iluise):

Opening zipstore, read-only: True
FAILEDend fixture


====================================================== FAILURES =======================================================
__________________________________ test_train_multi_stream[test_multi_stream_77fb7] ___________________________________

setup = None, test_run_id = 'test_multi_stream_77fb7'

    @pytest.mark.parametrize("test_run_id", ["test_multi_stream_" + commit_hash])
    def test_train_multi_stream(setup, test_run_id):
        """Test training with multiple streams including gridded and observation data."""
        logger.info(f"test_train_multi_stream with run_id {test_run_id} {WEATHERGEN_HOME}")
    
        train_with_args(
            f"--base-config={WEATHERGEN_HOME}/integration_tests/small_multi_stream.yaml".split()
            + [
                "--run-id",
                test_run_id,
            ],
            f"{WEATHERGEN_HOME}/integration_tests/streams_multi/",
        )
    
        infer_multi_stream(test_run_id)
>       evaluate_multi_stream_results(test_run_id)

integration_tests/small_multi_stream_test.py:71: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
integration_tests/small_multi_stream_test.py:159: in evaluate_multi_stream_results
    evaluate_from_config(cfg, None, None)
packages/evaluate/src/weathergen/evaluate/run_evaluation.py:334: in evaluate_from_config
    results = [_process_stream(**task) for task in tasks]
packages/evaluate/src/weathergen/evaluate/run_evaluation.py:230: in _process_stream
    plot_data(reader, stream, global_plotting_opts)
packages/evaluate/src/weathergen/evaluate/utils/utils.py:399: in plot_data
    maps_config = common_ranges(
packages/evaluate/src/weathergen/evaluate/utils/utils.py:592: in common_ranges
    list_max = calc_bounds(data_tars, data_preds, var, "max")
packages/evaluate/src/weathergen/evaluate/utils/utils.py:648: in calc_bounds
    calc_val(da_tars.where(da_tars.channel == var, drop=True), bound),
packages/evaluate/src/weathergen/evaluate/utils/utils.py:616: in calc_val
    return x.max(dim=("ipoint")).values
.venv/lib/python3.12/site-packages/xarray/core/_aggregations.py:2820: in max
    return self.reduce(
.venv/lib/python3.12/site-packages/xarray/core/dataarray.py:3857: in reduce
    var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs)
.venv/lib/python3.12/site-packages/xarray/core/variable.py:1681: in reduce
    result = super().reduce(
.venv/lib/python3.12/site-packages/xarray/namedarray/core.py:920: in reduce
    data = func(self.data, axis=axis, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

values = dask.array<where, shape=(2, 40320, 1), dtype=float32, chunksize=(1, 672, 1), chunktype=numpy.ndarray>
axis = 1, skipna = None, kwargs = {}
xp = <module 'numpy' from '/users/lessig/santis/WeatherGenerator/.venv/lib/python3.12/site-packages/numpy/__init__.py'>
func = None

    def f(values, axis=None, skipna=None, **kwargs):
        if kwargs.pop("out", None) is not None:
            raise TypeError(f"`out` is not valid for {name}")
    
        # The data is invariant in the case of 0d data, so do not
        # change the data (and dtype)
        # See https://github.com/pydata/xarray/issues/4885
        if invariant_0d and axis == ():
            return values
    
        xp = get_array_namespace(values)
        values = asarray(values, xp=xp)
    
        if coerce_strings and dtypes.is_string(values.dtype):
            values = astype(values, object)
    
        func = None
        if skipna or (
            skipna is None
            and (
                dtypes.isdtype(
                    values.dtype, ("complex floating", "real floating"), xp=xp
                )
                or dtypes.is_object(values.dtype)
            )
        ):
>           from xarray.computation import nanops
E           ImportError: cannot import name 'nanops' from 'xarray.computation' (/users/lessig/santis/WeatherGenerator/.venv/lib/python3.12/site-packages/xarray/computation/__init__.py)

.venv/lib/python3.12/site-packages/xarray/core/duck_array_ops.py:519: ImportError
------------------------------------------------- Captured log setup --------------------------------------------------
INFO     small_multi_stream_test:small_multi_stream_test.py:49 setup fixture with test_multi_stream_77fb7
-------------------------------------------------- Captured log call --------------------------------------------------
INFO     small_multi_stream_test:small_multi_stream_test.py:59 test_train_multi_stream with run_id test_multi_stream_77fb7 /users/lessig/santis/WeatherGenerator
INFO     weathergen.common.config:config.py:505 Loading private config from platform-env.py: /users/lessig/santis/WeatherGenerator-private/hpc/platform-env.py.
INFO     weathergen.common.config:config.py:524 Detected HPC: santis.
INFO     weathergen.common.config:config.py:530 Loading private config from platform-env.py output: /users/lessig/santis/WeatherGenerator-private/hpc/santis/config/paths.yml.
INFO     weathergen.common.config:config.py:481 Using existing config as overwrite: {}.
INFO     weathergen.common.config:config.py:550 Loading specified base config from file: /users/lessig/santis/WeatherGenerator/integration_tests/small_multi_stream.yaml.
INFO     weathergen.common.config:config.py:446 Using assigned run_id: test_multi_stream_77fb7. If you manually selected this run_id, this is an error.
INFO     weathergen.common.config:config.py:505 Loading private config from platform-env.py: /users/lessig/santis/WeatherGenerator-private/hpc/platform-env.py.
INFO     weathergen.common.config:config.py:524 Detected HPC: santis.
INFO     weathergen.common.config:config.py:530 Loading private config from platform-env.py output: /users/lessig/santis/WeatherGenerator-private/hpc/santis/config/paths.yml.
INFO     weathergen.common.config:config.py:505 Loading private config from platform-env.py: /users/lessig/santis/WeatherGenerator-private/hpc/platform-env.py.
INFO     weathergen.common.config:config.py:524 Detected HPC: santis.
INFO     weathergen.common.config:config.py:530 Loading private config from platform-env.py output: /users/lessig/santis/WeatherGenerator-private/hpc/santis/config/paths.yml.
------------------------------------------------ Captured log teardown ------------------------------------------------
INFO     small_multi_stream_test:small_multi_stream_test.py:53 end fixture
================================================== warnings summary ===================================================
integration_tests/small_multi_stream_test.py: 34 warnings
  /users/lessig/.local/share/uv/python/cpython-3.12.11-linux-aarch64-gnu/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=249480) is multi-threaded, use of fork() may lead to deadlocks in the child.
    self.pid = os.fork()

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================== short test summary info ===============================================
FAILED integration_tests/small_multi_stream_test.py::test_train_multi_stream[test_multi_stream_77fb7] - ImportError: cannot import name 'nanops' from 'xarray.computation' (/users/lessig/santis/WeatherGenerator/.venv/li...
===================================== 1 failed, 34 warnings in 582.15s (0:09:42) ======================================
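The ImportError is raised inside xarray itself: duck_array_ops.py attempts "from xarray.computation import nanops" and fails, which suggests a problem with the xarray installation in .venv (e.g. a stale or partially upgraded package) rather than with the evaluation code; that diagnosis is an assumption, not verified. Below is a minimal reproduction sketch of the same call path, independent of the test suite; the array shape and dimension names are taken from the traceback, everything else is illustrative.

# Hypothetical reproduction sketch, not part of the test suite.
import dask.array as da
import numpy as np
import xarray as xr

values = xr.DataArray(
    da.from_array(np.zeros((2, 40320, 1), dtype="float32"), chunks=(1, 672, 1)),
    dims=("sample", "ipoint", "channel"),
)
# For floating-point data, skipna defaults to True, so the reduction is dispatched
# through xarray.computation.nanops; with a broken xarray install this import fails
# with the error shown in the traceback above.
print(values.max(dim="ipoint").values)

If the cause is indeed the xarray install, recreating the virtual environment would be the first thing to try.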

@clessig (Collaborator, Author) commented Jan 28, 2026

The parameters also need to be re-tuned. ERA5 converged too slowly, but restricting to a very limited number of channels, as before, also led to unreliable convergence for NPP-ATMS.

@grassesi (Contributor) commented:

> The parameters also need to be re-tuned. ERA5 converged too slowly, but restricting to a very limited number of channels, as before, also led to unreliable convergence for NPP-ATMS.

I want to have a more general discussion about the goals of our testing, including Tim as well if possible. I think we should separate testing the control and data flow from testing for convergence:

  • If I am doing development, I want a quick and convenient way to check that nothing breaks (control/data-flow wise). For this it is useful to have a very broad/big model configuration (to cover more code) that runs for at most 5 minutes on all platforms.
  • Such a broad model configuration then requires longer training if we also test for convergence, which defeats the purpose of having something that can be run often, quickly and conveniently. (A sketch of how the two kinds of tests could be separated follows below.)
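One way such a split could look, sketched with pytest markers; the marker names, test names, and the run_short_training helper are hypothetical and not taken from the repository.

# Hypothetical sketch: splitting fast smoke tests from convergence tests with pytest
# markers ("smoke"/"convergence" are made-up names; they would be registered in the
# pytest configuration, e.g. under [tool.pytest.ini_options] markers).
import pytest


def run_short_training(epochs: int) -> list[float]:
    # Placeholder for the existing training entry point (e.g. train_with_args);
    # in this sketch it just returns per-epoch losses.
    return [1.0 / (epoch + 1) for epoch in range(epochs)]


@pytest.mark.smoke
def test_multi_stream_smoke():
    # Fast control/data-flow check: broad config, one epoch, no convergence assertion.
    run_short_training(epochs=1)


@pytest.mark.convergence
def test_multi_stream_convergence():
    # Slower check that the loss actually starts to decrease.
    losses = run_short_training(epochs=5)
    assert losses[-1] < losses[0]

Day-to-day development could then run only pytest -m smoke, leaving -m convergence to scheduled CI runs.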

@clessig (Collaborator, Author) commented Jan 28, 2026

> > The parameters also need to be re-tuned. ERA5 converged too slowly, but restricting to a very limited number of channels, as before, also led to unreliable convergence for NPP-ATMS.
>
> I want to have a more general discussion about the goals of our testing, including Tim as well if possible. I think we should separate testing the control and data flow from testing for convergence:
>
>   • If I am doing development, I want a quick and convenient way to check that nothing breaks (control/data-flow wise). For this it is useful to have a very broad/big model configuration (to cover more code) that runs for at most 5 minutes on all platforms.
>   • Such a broad model configuration then requires longer training if we also test for convergence, which defeats the purpose of having something that can be run often, quickly and conveniently.

For ERA5 we had a config that tested convergence (i.e. that training starts to converge, which is enough) in less than 5 minutes on all platforms. So I still don't fully see why we should have multiple tests.
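For reference, "starts to converge" can be checked cheaply; below is a hypothetical sketch of such an assertion (the helper name, threshold, and example losses are assumptions, not from the repository).

def assert_starts_to_converge(losses: list[float], factor: float = 0.8) -> None:
    """Check that the training loss has started to decrease during a short run."""
    # We only require a clear downward trend, not a fully converged model,
    # so the test stays well within a 5-minute budget.
    assert losses[-1] < factor * losses[0], (
        f"loss did not start to decrease: {losses[0]:.3f} -> {losses[-1]:.3f}"
    )

# Example usage with made-up per-epoch losses:
assert_starts_to_converge([2.3, 1.9, 1.7, 1.6])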
