Skip to content

Conversation

dulte
Copy link
Collaborator

@dulte dulte commented Sep 30, 2024

Change Summary

Adds new way of defining how the climatology is defined. Instead of just a bool to turn climatology off and on, a climatology config can now be passed into the colocation setup.

This is a reimplementation of @Ovewh 's PR #1135. As pyaercom has changed as much as it has, it was easier to do everything from scratch.

Related issue number

Aims to fix #1125

Checklist

  • Start with a draft-PR
  • The PR title is a good summary of the changes
  • PR is set to AeroTools and a tentative milestone
  • Documentation reflects the changes where applicable
  • Tests for the changes exist where applicable
  • Tests pass locally
  • Tests pass on CI
  • At least 1 reviewer is selected
  • Make PR ready to review

@dulte dulte self-assigned this Sep 30, 2024
@dulte
Copy link
Collaborator Author

dulte commented Sep 30, 2024

Two tests fail due to different mean obs than expected with new climatology.

@dulte dulte added bug 🐛 Something isn't working enhancement ✨ New feature or request labels Sep 30, 2024
@dulte dulte added this to the m2024-11 milestone Sep 30, 2024
@Ovewh
Copy link
Collaborator

Ovewh commented Sep 30, 2024

Two tests fail due to different mean obs than expected with new climatology.

I remember climatology previously was hard coded, in the pyaerocom const to be defined as the mean between 2005- 2015, so if you use different period it might cause the test failing.

The calc_climatology function in helpers.py will set the year of the climatology unless defined (the set_year keyword), to be the middle of the the period which means that unless the year of the model data you want to compare is also that year the collocation will fail. This also causes issues if you like extend the period for which the climatology is defined i.e. from default of 2005-2015 to being 2005 - 2020. Then the time dimension of the gridded data object also would have to update if the collocation is to succeed.


resample_how: str = const.CLIM_RESAMPLE_HOW
freq: str = const.CLIM_FREQ
mincount: dict = const.CLIM_MIN_COUNT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think set_year should also be an attribute here, or if you have a better solution?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done

stop=use_climatology_ref.stop,
clim_mincount=use_climatology_ref.mincount,
resample_how=use_climatology_ref.resample_how,
clim_freq=use_climatology_ref.freq,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here set_year should be set

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 771 to 774
if isinstance(use_climatology_ref, ClimatologyConfig): # pragma: no cover
col_freq = "monthly" # use_climatology_ref.freq
obs_start = use_climatology_ref.start
obs_stop = use_climatology_ref.stop
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an old hack or something(?). Default const.CLIM_FREQ is daily, and is thus used everywhere climatology is calculated. But the col freq is set as monthly if climatology is used. Why?

And changing col_freq in the colocate_gridded_ungridded, is that smart? Then the value from colocation config is no longer immutable(?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like we talked about, we should validate this asap, in the colocation_setup, so that no data is read before this change.

Copy link

codecov bot commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 86.66667% with 6 lines in your changes missing coverage. Please review.

Project coverage is 77.96%. Comparing base (c52cd91) to head (901272c).
Report is 762 commits behind head on main-dev.

Files with missing lines Patch % Lines
pyaerocom/stationdata.py 50.00% 3 Missing ⚠️
pyaerocom/aeroval/_processing_base.py 50.00% 1 Missing ⚠️
pyaerocom/climatology_config.py 94.11% 1 Missing ⚠️
pyaerocom/colocation/colocation_setup.py 91.66% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           main-dev    #1356      +/-   ##
============================================
+ Coverage     77.95%   77.96%   +0.01%     
============================================
  Files           144      145       +1     
  Lines         21563    21602      +39     
============================================
+ Hits          16810    16843      +33     
- Misses         4753     4759       +6     
Flag Coverage Δ
unittests 77.96% <86.66%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

How to resample the climatology. Must be mean or median.
freq : str, optional
Which frequency the climatology should have
mincount : dict, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest min_count. Also maybe worth building a field_validator for this? Or possibly a child class of TypedDict

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this module live in pyaerocom/aeroval? I don't really see this being used in the core API

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now, it is passed though the colocator, meaning that it is part of the core. If this is moved from the core to aeroval, then some more work is needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case then, no, it should stay here in the core. We don't want to have pyaerocom depend on pyaeroval after all.

Comment on lines 771 to 774
if isinstance(use_climatology_ref, ClimatologyConfig): # pragma: no cover
col_freq = "monthly" # use_climatology_ref.freq
obs_start = use_climatology_ref.start
obs_stop = use_climatology_ref.stop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like we talked about, we should validate this asap, in the colocation_setup, so that no data is read before this change.

@lewisblake lewisblake modified the milestones: m2024-11, m2024-12 Oct 11, 2024
@lewisblake lewisblake modified the milestones: m2024-12, m2025-01 Dec 2, 2024
@lewisblake lewisblake modified the milestones: m2025-01, m2025-02 Jan 6, 2025
@jgriesfeller
Copy link
Member

The hours for this effort can be put on Arbeitsordre 285056000 (NFR- INES2 project) according to @MichaelSchulzMETNO

@heikoklein heikoklein modified the milestones: m2025-02, m2025-03 Feb 7, 2025
@dulte dulte requested a review from jgriesfeller February 14, 2025 12:51
@dulte dulte marked this pull request as ready for review February 14, 2025 12:51
Copy link
Member

@jgriesfeller jgriesfeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@dulte dulte merged commit dd36d59 into main-dev Feb 25, 2025
8 checks passed
@dulte dulte deleted the fix-1125 branch February 25, 2025 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug 🐛 Something isn't working enhancement ✨ New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

use_obs_clim is broken and has always been: Turns all obsdata into nan

5 participants