Conversation

@gabrieloks
Contributor

Description

Working fix for loss filtering with target variables

What problem does this change solve?

What issue or task does this change relate to?

Additional notes

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines on these, please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.

@HCookie changed the title from "[FIX] fix: filteringlosswrapper" to "fix: filteringlosswrapper with target variables" on Jan 28, 2026
@HCookie moved this from To be triaged to Now In Progress in Anemoi-dev on Jan 28, 2026
Comment on lines +870 to +872
if y_postprocessed.shape[3] != y_pred_postprocessed.shape[3]:  # maybe 3 shouldn't be hardcoded
    y_postprocessed = y_postprocessed[..., self.mask_target]
Member

Definitely not. Is it to index the variable dimension? If so, we have an enum for this.

Contributor

The var dim could also just be -1; that one rarely changes.
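For illustration only, here is a minimal sketch of the two options discussed above: a named dimension constant instead of the hardcoded 3, versus plain -1 indexing. The OutputDim enum and the standalone tensors are hypothetical, not the actual Anemoi enum or code:

```python
from enum import IntEnum

import torch


class OutputDim(IntEnum):
    """Hypothetical names for the dimensions of a (batch, time, grid, variable) output tensor."""
    BATCH = 0
    TIME = 1
    GRID = 2
    VARIABLE = 3  # last dimension; equivalently -1


y_postprocessed = torch.randn(2, 1, 10, 7)       # target with 7 variables
y_pred_postprocessed = torch.randn(2, 1, 10, 5)  # prediction with 5 variables
mask_target = torch.tensor([0, 1, 2, 3, 4])      # predicted variables' positions in the target

# Option 1: a named dimension instead of a hardcoded 3
if y_postprocessed.shape[OutputDim.VARIABLE] != y_pred_postprocessed.shape[OutputDim.VARIABLE]:
    y_postprocessed = y_postprocessed[..., mask_target]

# Option 2: plain -1, since the variable dimension is last and rarely moves
if y_postprocessed.shape[-1] != y_pred_postprocessed.shape[-1]:
    y_postprocessed = y_postprocessed[..., mask_target]
```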

@OpheliaMiralles
Contributor

I am not sure I'm a fan of inserting an additional indexing layer when all losses can, by default, be wrapped with the indices predicted by the model. If this is extra flexibility to compare predicted variables against any variable in the input data, I think it should be embedded in the way we compute the loss rather than added as a quick fix at the task level. Are you sure there is no way to make it work closer to the original PR?

@gabrieloks
Contributor Author


Thanks for the feedback. I understand the concern about quick fixes at the task level, but I think having a working solution with good test coverage is a reasonable first step. It makes any deeper refactoring, if needed, safer to attempt in the future.

The issue isn't really limited to the losses; it's more about whether the target gets reduced from the start or not. The architecture seems to have been designed with early reduction in mind, and several things break without it. I already had to fix the scalers and normalizers to handle the potential shape discrepancy between targets and predictions when target-only variables are used, so I preferred to stay on the safe side rather than change the core behaviour. For now I'd rather not change how the targets are reduced, but if we can show that the scalers, normalizers and validation work without big changes even when the targets are not reduced, I'm happy to go that way. I'd prefer a more surgical approach here: the idea was to fix the filtering while minimising changes everywhere else.

The reindexing isn't really an additional layer; it just accounts for how the target variables get reduced via data_indices.data.output.full. Original indices like [0, 1, 2, 3, 8, ...] need to be remapped after reduction to sequential positions [0, 1, 2, 3, 4, ...] so they match the actual tensor shape. That's all it does.
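A tiny sketch of the remapping described above; the names full_output_indices and remap_to_reduced are illustrative, not the actual helpers in the codebase:

```python
# After reduction via data_indices.data.output.full, the kept variables sit at
# sequential positions in the tensor, so original indices must be remapped.
full_output_indices = [0, 1, 2, 3, 8]  # e.g. the variables kept in the output
position_of = {orig: pos for pos, orig in enumerate(full_output_indices)}


def remap_to_reduced(original_indices):
    """Map original variable indices to their positions in the reduced tensor."""
    return [position_of[i] for i in original_indices]


# Variables the loss filter refers to by original index ...
print(remap_to_reduced([0, 3, 8]))  # ... become positions [0, 3, 4] in the reduced tensor
```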

The tests I added document the expected behaviour, but I'm open to discussion. And if you think there's an easier way of doing this that I might have missed, feel free to go with it.

@OpheliaMiralles
Contributor


Of course I didn't mean to be dismissive of your changes, I'm actually glad someone cares about this :). I'd just be grateful if you could send me the exact config you're using, in a flattened format, by DM on Slack, and I'll try to reproduce the issue and see why wrapping at the loss level is trickier than it looks (I can use different data, no problem). If your changes really are more minimal and they pass, we'll go your way. Is that OK for you?

@gabrieloks
Contributor Author

gabrieloks commented Jan 29, 2026


Yeah, sure :) I committed the batch of tests I have working on this second branch to the original fix/filtering... branch. You could argue that most of the tests are secondary, except test_forward_multiple_vars_and_target_only, which is really similar to what I'm trying to do with era5+imerg. In that test most of the asserts assume that the target is reduced, but those can be commented out. The most important part is getting the final assert on the value of the loss right.
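For readers without the branch at hand, a rough, self-contained sketch of the kind of equivalence that final assert is checking; names, shapes and the MSE loss are invented for illustration and are not the real Anemoi test:

```python
import torch


def test_loss_with_target_only_variables():
    """Illustrative only: when the raw target carries extra target-only variables,
    filtering inside the loss should give the same value as reducing the target up front."""
    torch.manual_seed(0)
    y_pred = torch.randn(2, 1, 10, 5)            # model predicts 5 variables
    y_full = torch.randn(2, 1, 10, 7)            # raw target has 2 extra target-only variables
    mask_target = torch.tensor([0, 1, 2, 3, 4])  # predicted variables' positions in the target

    # Path 1: target reduced before it reaches the loss (what most existing asserts assume)
    loss_reduced = torch.nn.functional.mse_loss(y_pred, y_full[..., mask_target])

    # Path 2: full target handed to the loss, filtering applied inside it
    y_masked = y_full[..., mask_target] if y_full.shape[-1] != y_pred.shape[-1] else y_full
    loss_filtered = torch.nn.functional.mse_loss(y_pred, y_masked)

    assert torch.allclose(loss_filtered, loss_reduced)
```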
