First draft of new climatology config #1356
base: main-dev
Changes from all commits: 44f0c88, f903e94, ef5cc9f, 2b5c800, 7cd2dc5, d35a7d0, bbc313d
@@ -0,0 +1,46 @@ (new file)

```python
from pydantic import BaseModel, ValidationError, field_validator

from typing import Literal

from pyaerocom import const


class ClimatologyConfig(BaseModel):
    """
    Holds the configuration for the climatology

    Attributes
    -------------
    start : int, optional
        Start year of the climatology
    stop : int, optional
        Stop year of the climatology
    resample_how : str, optional
        How to resample the climatology. Must be mean or median.
    freq : str, optional
        Which frequency the climatology should have
    mincount : dict, optional
        Number of values should be present for the data to be used in the climatology.
        Dict where freqs are the keys and the count is the values

    """

    start: int = const.CLIM_START
    stop: int = const.CLIM_STOP

    set_year: int | None = None

    @field_validator("set_year")
    @classmethod
    def validate_set_year(cls, v):
        if v is None:
            return int((cls.stop - cls.start) // 2 + cls.start) + 1

        if v > cls.stop or v < cls.start:
            raise ValidationError

        return v

    resample_how: Literal["mean", "median"] = const.CLIM_RESAMPLE_HOW
    freq: str = const.CLIM_FREQ
    mincount: dict = const.CLIM_MIN_COUNT
```

Review comment on the `mincount : dict, optional` docstring line: I'd suggest
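As drafted, `validate_set_year` probably does not behave as intended under pydantic v2: a field validator is not run on a default value unless `validate_default=True` is set, so `set_year=None` is never replaced; fields that have defaults are removed from the class namespace, so `cls.start` and `cls.stop` raise `AttributeError`; and pydantic expects validators to raise `ValueError` (which it wraps into a `ValidationError`) rather than constructing `ValidationError` directly. A minimal sketch of one alternative, assuming pydantic v2, uses an after-mode `model_validator`, which runs on every instantiation and can read the other fields. `ClimatologyConfigSketch` and the years 2005/2015 are hypothetical placeholders for the class and the `const.CLIM_*` defaults.

```python
from typing import Optional

from pydantic import BaseModel, model_validator

# Hypothetical placeholder defaults; the PR takes these from pyaerocom's const module.
CLIM_START = 2005
CLIM_STOP = 2015


class ClimatologyConfigSketch(BaseModel):
    """Hypothetical stand-in for ClimatologyConfig (pydantic v2 assumed)."""

    start: int = CLIM_START
    stop: int = CLIM_STOP
    set_year: Optional[int] = None

    @model_validator(mode="after")
    def _fill_and_check_set_year(self):
        # An "after" model validator runs on every instantiation, including
        # when set_year keeps its default, and sees the validated start/stop.
        if self.set_year is None:
            # Same midpoint rule as the draft: (stop - start) // 2 + start + 1
            self.set_year = (self.stop - self.start) // 2 + self.start + 1
        elif not self.start <= self.set_year <= self.stop:
            # Raise ValueError; pydantic turns it into a ValidationError.
            raise ValueError(
                f"set_year={self.set_year} must lie within [{self.start}, {self.stop}]"
            )
        return self
```

With the placeholder years, `ClimatologyConfigSketch().set_year` evaluates to `(2015 - 2005) // 2 + 2005 + 1 = 2011`, while `ClimatologyConfigSketch(set_year=1990)` fails with a `ValidationError`.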
```diff
@@ -12,6 +12,7 @@
 
 from pyaerocom import __version__ as pya_ver
 from pyaerocom import const
+from pyaerocom.climatology_config import ClimatologyConfig
 from pyaerocom._lowlevel_helpers import RegridResDeg
 from pyaerocom.exceptions import (
     DataUnitError,
```
```diff
@@ -427,8 +428,9 @@ def _colocate_site_data_helper(
         to aggregate from hourly to daily, rather than the mean.
     min_num_obs : int or dict, optional
         minimum number of observations for resampling of time
-    use_climatology_ref : bool
-        if True, climatological timeseries are used from observations
+    use_climatology_ref : ClimateConfig | bool, optional
+        If provided, the climatology will be calculated from the config
+
 
     Raises
     ------
```
```diff
@@ -448,8 +450,17 @@ def _colocate_site_data_helper(
         var, ts_type=ts_type, how=resample_how, min_num_obs=min_num_obs, inplace=True
     )[var]
 
-    if use_climatology_ref:
-        obs_ts = stat_data_ref.calc_climatology(var_ref, min_num_obs=min_num_obs)[var_ref]
+    if isinstance(use_climatology_ref, ClimatologyConfig):
+        obs_ts = stat_data_ref.calc_climatology(
+            var_ref,
+            start=use_climatology_ref.start,
+            stop=use_climatology_ref.stop,
+            min_num_obs=min_num_obs,
+            clim_mincount=use_climatology_ref.mincount,
+            resample_how=use_climatology_ref.resample_how,
+            clim_freq=use_climatology_ref.freq,
+            set_year=use_climatology_ref.set_year,
+        )[var_ref]
     else:
         obs_ts = stat_data_ref.resample_time(
             var_ref,
```

Review comment on the `clim_freq=use_climatology_ref.freq,` line: Same here

Reply: Done
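The hunk above changes the dispatch from a truthiness test to an `isinstance` test and forwards each config field to `calc_climatology`. The sketch below mimics only that branching and keyword mapping, without calling pyaerocom; `ClimConfigStandIn`, `climatology_kwargs` and `obs_branch` are hypothetical names, and every default value is a placeholder (only the `"daily"` frequency echoes the `const.CLIM_FREQ` default mentioned in review).

```python
from dataclasses import dataclass, field


@dataclass
class ClimConfigStandIn:
    """Hypothetical stand-in for ClimatologyConfig; all values are placeholders."""

    start: int = 2005
    stop: int = 2015
    set_year: int = 2011
    resample_how: str = "mean"
    freq: str = "daily"
    mincount: dict = field(default_factory=dict)


def climatology_kwargs(cfg, min_num_obs=None):
    """Map config attributes onto the calc_climatology keywords used in the diff."""
    return {
        "start": cfg.start,
        "stop": cfg.stop,
        "min_num_obs": min_num_obs,
        "clim_mincount": cfg.mincount,
        "resample_how": cfg.resample_how,
        "clim_freq": cfg.freq,
        "set_year": cfg.set_year,
    }


def obs_branch(use_climatology_ref):
    """Return which branch the new isinstance check would take."""
    if isinstance(use_climatology_ref, ClimConfigStandIn):
        return "calc_climatology"
    # True, False and None all fall through to plain resampling.
    return "resample_time"
```

One consequence of the draft as written: a caller that still passes `use_climatology_ref=True` now takes the `resample_time` branch, so the observations are resampled rather than replaced by a climatology.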
```diff
@@ -501,26 +512,26 @@ def _colocate_site_data_helper_timecol(
         to aggregate from hourly to daily, rather than the mean.
     min_num_obs : int or dict, optional
         minimum number of observations for resampling of time
-    use_climatology_ref : bool
-        if True, NotImplementedError is raised
+    use_climatology_ref: ClimateConfig | bool
+        if provided, NotImplementedError is raised
 
     Raises
     ------
     TemporalResolutionError
         if model or obs sampling frequency is lower than desired output frequency
     NotImplementedError
-        if input arg `use_climatology_ref` is True.
+        if input arg `use_climatology_ref` is provided.
 
     Returns
     -------
     pandas.DataFrame
         dataframe containing the colocated input data (column names are
         data and ref)
     """
-    if use_climatology_ref:
+    if isinstance(use_climatology_ref, ClimatologyConfig):
         raise NotImplementedError(
             "Using observation climatology in colocation with option "
-            "colocate_time=True is not available yet ..."
+            "use_climatology_ref is not available yet ..."
         )
 
     grid_tst = stat_data.get_var_ts_type(var)
```
```diff
@@ -672,8 +683,8 @@ def colocate_gridded_ungridded(
         if True and if original time resolution of data is higher than desired
         time resolution (`ts_type`), then both datasets are colocated in time
         *before* resampling to lower resolution.
-    use_climatology_ref : bool
-        if True, climatological timeseries are used from observations
+    use_climatology_ref : ClimateConfig | bool, optional.
+        Configuration for calculating the climatology. If set to a bool, this will not be done
     resample_how : str or dict
         string specifying how data should be aggregated when resampling in time.
         Default is "mean". Can also be a nested dictionary, e.g.
```
```diff
@@ -757,10 +768,10 @@ def colocate_gridded_ungridded(
         data = data.resample_time(str(ts_type), min_num_obs=min_num_obs, how=resample_how)
         ts_type_data = ts_type
 
-    if use_climatology_ref:
-        col_freq = "monthly"
-        obs_start = const.CLIM_START
-        obs_stop = const.CLIM_STOP
+    if isinstance(use_climatology_ref, ClimatologyConfig):  # pragma: no cover
+        col_freq = "monthly"  # use_climatology_ref.freq
+        obs_start = use_climatology_ref.start
+        obs_stop = use_climatology_ref.stop
     else:
         col_freq = str(ts_type)
         obs_start = start
```

Comment on lines +771 to +774

Review comment: This seems like an old hack or something(?). Default const.CLIM_FREQ is daily, and is thus used everywhere climatology is calculated. But the col freq is set as monthly if climatology is used. Why? And changing col_freq in the colocate_gridded_ungridded, is that smart? Then the value from colocation config is no longer immutable(?)

Review comment: Like we talked about, we should validate this asap, in the colocation_setup, so that no data is read before this change.
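The reviewers' suggestion of settling the climatology settings early, before any data is read, could look roughly like the following sketch: the column frequency and observation window are bundled in a frozen object, and switching to climatology returns a new instance instead of reassigning `col_freq` inside `colocate_gridded_ungridded`. `ColocationWindowSketch` and `with_climatology` are hypothetical names, not pyaerocom API, and the forced `"monthly"` frequency only mirrors the current draft.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ColocationWindowSketch:
    """Hypothetical immutable bundle of the values the diff computes late."""

    col_freq: str
    obs_start: Optional[int]
    obs_stop: Optional[int]


def with_climatology(window, clim_start, clim_stop):
    """Return the window to use when a climatology config is given.

    dataclasses.replace builds a new frozen instance, so the caller's
    `window` (e.g. the value derived from the colocation setup) is unchanged.
    """
    # "monthly" mirrors the hard-coded value in the draft, not a recommendation.
    return replace(window, col_freq="monthly", obs_start=clim_start, obs_stop=clim_stop)
```

Resolving this once during setup would let the rest of the colocation code read a single, already-validated window instead of recomputing `col_freq`, `obs_start` and `obs_stop` after data has been loaded.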
Review comment: Should this module live in pyaerocom/aeroval? I don't really see this being used in the core API.

Reply: As of now, it is passed through the colocator, meaning that it is part of the core. If this is moved from the core to aeroval, then some more work is needed.

Reply: If that's the case then, no, it should stay here in the core. We don't want to have pyaerocom depend on pyaeroval after all.