-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calibration superfluous edges #2033
base: master
Are you sure you want to change the base?
Conversation
forecast = get_cube_from_directory(forecast_directory, cycle_point=cycle_point, max_days_offset=max_days_offset) | ||
truth = get_cube_from_directory(truth_directory, cycle_point=cycle_point, max_days_offset=max_days_offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate this is still in draft so likely you are doing a minimal change for testing. However, if/when you come closer to pulling this out of draft, we would want to ensure feature parity between the CLI and the plugin, whetehr that means changing EstimateCoefficientsForEnsembleCalibration
UI and behaviour to call get_cube_from_directory
, or otherwise defining a metaplugin to do achieve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought about this issue, but hadn't considered a metaplugin, which I think might be the way to go. Thanks :)
ping @gavinevans - you might be interested in this. |
Thanks @SamGriffithsMO, this looks very promising 👍 |
Still wip
|
100% agree in using metadata rather information encoded in the filepath if possible. |
0bd96af6cb5c6caa045e397589dd0ce3b498af837d989fe73326f5e9459c6054 ./construct-reliability-tables/basic/forecast/forecast_0.nc | ||
fbc14286b4ce41e2e60df0870ae4911c1b00a38ec96912f43c6187fcaf7d02f6 ./construct-reliability-tables/basic/forecast/forecast_1.nc | ||
0d0edf9751a2019db952907700b02499ec9f1c360db4591a8012ca247a841c73 ./construct-reliability-tables/basic/kgo_aggregated.nc | ||
902e5cb9d3dc5d2b78bb99aff8370f9815adf5064b2caeb7abed73a56a897a43 ./construct-reliability-tables/basic/kgo_single_value_bins.nc | ||
72d4fd0655d1b7a2bc11d85741ec944f195c59813ae629e6858116c4e09eccb0 ./construct-reliability-tables/basic/kgo_without_single_value_bins.nc | ||
8ed50464c34b8673d98d1256d1c11b9eeea911dc79f7f75d425a590bf8697301 ./construct-reliability-tables/basic/truth_0.nc | ||
3999adb3749052d9efdfab863427a20a1fabbca06ff430c6c9cf5f89d1ea4d60 ./construct-reliability-tables/basic/truth_1.nc | ||
8ed50464c34b8673d98d1256d1c11b9eeea911dc79f7f75d425a590bf8697301 ./construct-reliability-tables/basic/truth/truth_0.nc | ||
3999adb3749052d9efdfab863427a20a1fabbca06ff430c6c9cf5f89d1ea4d60 ./construct-reliability-tables/basic/truth/truth_1.nc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've found best practices around the acceptances tests to be fairly obscure. This change is required to match the updated expected inputs for the cli, but it is unclear to me how we would implement this change for all users? And how much of a pain it would be?
If this is too much of a pain, I could make some changes to get_cube_from_directory
|
||
if len(cubes) < 2: | ||
if verbose: | ||
print(f"Not enough files found in {directory}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it would be good to include the number of cubes that was found in this message (i.e. 1 or 0)
I don't recall improver dev. guidelines around use of warnings etc. (you know?)
@@ -1366,6 +1367,61 @@ def process( | |||
return coefficients_cubelist | |||
|
|||
|
|||
class MetaEstimateCoefficientsForEnsembleCalibration(BasePlugin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a beast of a name 😋
How about this:
class MetaEstimateCoefficientsForEnsembleCalibration(BasePlugin): | |
class MetaEnsembleCoeffEstimator(BasePlugin): |
@SamGriffithsMO, have you given thought as to how to incorporate this change? (implications for BoM, USAF etc.) |
self.distribution = distribution | ||
self.truth_attribute = truth_attribute | ||
self.cycle_point = cycle_point | ||
self.max_days_offset = max_days_offset | ||
self.point_by_point = point_by_point | ||
self.use_default_initial_guess = use_default_initial_guess | ||
self.units = units | ||
self.predictor = predictor | ||
self.tolerance = tolerance | ||
self.max_iterations = max_iterations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's good form to suffix these with an underscore to indicate 'private' (considered private by convention, not in the same sense of private in c/c++ of course).
Basically, all data should be hidden within the class. If we really want to expose this data to users, then they should be exposed via properties (setters & getters). This stops users from easily tweaking values they aren't intended to be tweaking.
self.truth_attribute = truth_attribute | ||
self.n_probability_bins = n_probability_bins | ||
self.single_value_lower_limit = single_value_lower_limit | ||
self.single_value_upper_limit = single_value_upper_limit | ||
self.cycle_point = cycle_point | ||
self.max_days_offset = max_days_offset | ||
self.aggregate_coordinates = aggregate_coordinates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as per above
|
||
if forecast and truth: | ||
plugin = ConstructReliabilityCalibrationTables( | ||
# truth_attribute=self.truth_attribute, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this commended arg. here for?
return MetaConstructReliabilityCalibrationTables( | ||
truth_attribute=truth_attribute, | ||
n_probability_bins=n_probability_bins, | ||
single_value_lower_limit=single_value_lower_limit, | ||
single_value_upper_limit=single_value_upper_limit, | ||
)(forecast, truth, aggregate_coordinates) | ||
aggregate_coordinates=aggregate_coordinates, | ||
cycle_point=cycle_point, | ||
max_days_offset=max_days_offset, | ||
)(forecast_directory, truth_directory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, there is a lot to unpack here in terms of the implications going forwards.
Our future vision actually separates loading, polling, file identity from the IMPROVER plugins.
In this vision, I the existing IMPROVER plugins wouldn't be modified to identify the relevant files. Instead a plugin would specifically handle this and feed into the existing IMPROVER plugins.
We should chat offline.
forecast_directory: cli.inputpath, | ||
truth_directory: cli.inputpath, | ||
land_sea_mask: cli.inputcube = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about:
forecast_directory: cli.inputpath, | |
truth_directory: cli.inputpath, | |
land_sea_mask: cli.inputcube = None, | |
forecast_cubes: cli.calib_input_dir, | |
truth_cubes: cli.calib_input_dir, | |
land_sea_mask: cli.inputcube = None, |
predictor=predictor, | ||
tolerance=tolerance, | ||
max_iterations=max_iterations, | ||
) | ||
return plugin(forecast, truth, landsea_mask=land_sea_mask) | ||
return plugin(forecast_directory, truth_directory, land_sea_mask=land_sea_mask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about:
return plugin(forecast_directory, truth_directory, land_sea_mask=land_sea_mask) | |
return plugin(**cubes) |
forecast_directory: cli.inputpath, | ||
truth_directory: cli.inputpath, | ||
land_sea_mask: cli.inputcube = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or actually (enabling continued CLI behaviour)
forecast_directory: cli.inputpath, | |
truth_directory: cli.inputpath, | |
land_sea_mask: cli.inputcube = None, | |
*cubes: Union[cli.calib_input_dir, cli.inputcube], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caveat to this. I'm not a fan of the use of the clize typing mechanism within IMPROVER. However, this might facilitate maintaining existing CLI functionality and behaviour while also facilitating separating 'load' from the plugin itself (useful on our journey towards our long-term goals).
Putting in to draft to address what Carwyn and I discussed offline. Move the loading to clize (more inline with other plugins), and therefore allow the plugin to run whether given specific cubes or the directories as planed here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @SamGriffithsMO 👍
I'll written some comments below. I haven't totally finished but I'll post this now, as this PR has been pulled back into draft.
def get_cube_from_directory( | ||
directory, | ||
cycle_point=None, | ||
max_days_offset=None, | ||
date_format="%Y%m%dT%H%MZ", | ||
verbose=False, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please could you add type hints for this function?
def get_cube_from_directory(
directory: pathlib.Path,
cycle_point: Optional[str],
max_days_offset: Optional[int],
date_format: str = "%Y%m%dT%H%MZ",
verbose: bool = False,
) -> Cube:
loads and merges all netCDF files in a directory | ||
|
||
To switch on the max offset filter, both cycle_point and max_days_offset | ||
need to be provided | ||
|
||
Args: | ||
directory (pathlib.Path): | ||
The path to the directory. | ||
cycle_point (str): | ||
The cycle point of the forecast, used to filter files | ||
max_days_offset (int): | ||
Maximum number of days before cycle_point to consider files, | ||
Defined as a postive int that is subtracted from the cycle_point | ||
date_format (str): | ||
format of the cyclepoint and datetime in the filename, used by | ||
datetime.strptime | ||
verbose (bool): | ||
switch on verbose output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you remove the types from this docstring e.g. pathlib.Path as that should be added to the type hint, rather than here? Could you also capitalise the start of sentences and add full stops for readability?
Maximum number of days before cycle_point to consider files, | ||
Defined as a postive int that is subtracted from the cycle_point |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maximum number of days before cycle_point to consider files, | |
Defined as a postive int that is subtracted from the cycle_point | |
Maximum number of days before cycle_point to consider files. | |
Defined as a positive int that is subtracted from the cycle_point |
switch on verbose output | ||
|
||
Returns: | ||
Cube |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add something more descriptive here? E.g. Cube created by merging all netCDF files within the directory
.
def __init__( | ||
self, | ||
distribution, | ||
truth_attribute, | ||
cycle_point=None, | ||
max_days_offset=None, | ||
point_by_point=False, | ||
use_default_initial_guess=False, | ||
units=None, | ||
predictor="mean", | ||
tolerance: float = 0.02, | ||
max_iterations: int = 1000, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please could you add type hints here? These would be similar as already exist for the EstimateCoefficientsForEnsembleCalibration
plugin.
distribution, | ||
truth_attribute, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This truth_attribute
variable is actually no longer required in this CLI, as the forecasts and truths are being provided by separated directories, rather than this repository needing to separate the forecasts and truths, so I think the usage of truth_attribute
in this CLI can be removed.
def __init__( | ||
self, | ||
distribution, | ||
truth_attribute, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
truth_attribute
isn't used, so this can be removed.
aggregate_coordinates: list = None, | ||
cycle_point: Optional[str] = None, | ||
max_days_offset: Optional[int] = None, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add docstrings for the __init__
and process
methods?
|
||
if forecast and truth: | ||
plugin = ConstructReliabilityCalibrationTables( | ||
# truth_attribute=self.truth_attribute, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the truth_attribute
should be removed, as it's no longer used.
@@ -11,12 +11,16 @@ | |||
@cli.clizefy | |||
@cli.with_output | |||
def process( | |||
*cubes: cli.inputcube, | |||
truth_attribute, | |||
forecast_directory: cli.inputpath, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted by @cpelley, these changes would notably impact other users that use this CLI currently. We would need to communicate this.
Addresses https://github.com/MetOffice/improver_suite/issues/1750
Description
Updates gridded calibration CLI to align itself with spot calibration, e.g. The CLI now takes a directory and it is responsible for identifying which files are required.
This is additionally driven by a desire to remove >day long offsets in the suite configurations, as these require handling before producing a Cylc graph
Related
Testing: