Releases: nci/scores
2.3.0
Release Notes (What's New)
Version 2.3.0 (October 14, 2025)
For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.
Features
- Added a new metric:
  - Percent within X: scores.continuous.percent_within_x. See PR #865.
- Added one new metric and two supporting functions. Following the publication of Taggart & Wilke (2025), these have been moved from scores.emerging to scores.categorical:
  - Risk matrix score: scores.categorical.risk_matrix_score.
  - Risk matrix score - matrix weights to array: scores.categorical.matrix_weights_to_array.
  - Risk matrix score - warning scaling to weight array: scores.categorical.weights_from_warning_scaling.
  Note: while removing the functions from scores.emerging is technically a breaking change, breaking changes that only impact the "emerging" section of the API do not trigger major releases. This is because the "emerging" section of the API is designed to hold metrics while they are undergoing peer review, and they are expected to be moved out of "emerging" once peer review has concluded.
  See PR #904.
- Updated the weighting method used by all scores functions that allow the user to supply weights. The updated weighting method normalises the user-supplied weights rather than applying them directly. While both approaches can be valid, the revised approach is more in keeping with general expectations and is consistent with the default approach taken by other libraries. As part of this change, users can no longer supply weights that contain NaNs (zeroes may be used instead where appropriate). The "Introduction to weighting and masking" tutorial has been updated and substantially expanded to explain what the weighting does mathematically. See PR #899.
- Added optional automatic generation of thresholds for the receiver (relative) operating characteristic (ROC) curve (scores.probability.roc_curve_data). See PR #882.
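To illustrate the change to weighting, the following minimal NumPy sketch contrasts the two conventions for a weighted mean absolute error over a single dimension. The helper names are hypothetical, and the "direct" function is only an assumed stand-in for the previous behaviour; the scores functions themselves operate on xarray objects, handle dimension reduction and may differ in detail.

```python
import numpy as np


def weighted_mae_direct(fcst, obs, weights):
    """Hypothetical 'direct' weighting: multiply errors by weights, then take a plain mean."""
    errors = np.abs(np.asarray(fcst, dtype=float) - np.asarray(obs, dtype=float))
    return float(np.mean(errors * np.asarray(weights, dtype=float)))


def weighted_mae_normalised(fcst, obs, weights):
    """Hypothetical 'normalised' weighting: divide the weighted sum by the sum of the weights."""
    weights = np.asarray(weights, dtype=float)
    if np.isnan(weights).any():
        # Mirrors the rule that weights may not contain NaNs (use 0 to exclude a point).
        raise ValueError("weights must not contain NaN; use 0 to exclude a point")
    errors = np.abs(np.asarray(fcst, dtype=float) - np.asarray(obs, dtype=float))
    return float(np.sum(errors * weights) / np.sum(weights))


fcst = [1.0, 2.0, 3.0, 4.0]
obs = [1.0, 1.0, 1.0, 1.0]       # absolute errors: 0, 1, 2, 3
weights = [2.0, 2.0, 2.0, 2.0]   # uniform weights

print(weighted_mae_direct(fcst, obs, weights))      # 3.0 (uniform weights inflate the score)
print(weighted_mae_normalised(fcst, obs, weights))  # 1.5 (same as the unweighted MAE)
```

With normalised weights, multiplying every weight by the same constant leaves the score unchanged, and uniform weights reproduce the unweighted score.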
Bug Fixes
- Updated scores.continuous.quantile_interval_score so that it now recognises preserve_dims='all'. Previously, it did not recognise the special case of preserve_dims='all' and raised an error unless a list of dimensions was supplied. (Note: the score calculations were not incorrect; preserve_dims='all' was simply not recognised.) See PR #893.
Documentation
- Added "Percent Within X" tutorial. See PR #865.
- Substantially updated and expanded the "Introduction to weighting and masking" tutorial, following changes to the weighting method used by all scores functions that allow the user to supply weights. The updated and expanded tutorial explains what the weighting does mathematically. See PR #899.
- Updated the "Quantile-Quantile (Q-Q) Plots for Comparing Forecasts and Observations" tutorial so that the plots render in Read the Docs. See PR #883.
- Updated the description of the second figure in the "Threshold Weighted Continuous Ranked Probability Score (twCRPS) for ensembles" tutorial. See PR #897.
- Updated multiple sections of the documentation following the risk matrix score moving from scores.emerging to scores.categorical, including:
  - updating docstrings and docs/included.md,
  - updating the tutorial with the new categorical methods, and
  - updating references in several sections of the documentation, following the publication of Taggart & Wilke (2025).
  See PR #904.
- Updated several tutorials to subtract the LEAD_TIME Timedelta from the base times in the forecast data so that the forecast and observation data line up correctly. See PR #920.
- In the README, "Detailed Installation Guide" and "Contributing Guide", updated pip install commands to use quotation marks where square brackets are used to specify optional dependencies. This ensures compatibility with zsh (the default on macOS) while still working as expected on bash. See PR #917.
- Added thumbnail images to multiple entries in the tutorial gallery. See PR #874, PR #875, PR #877, PR #879, PR #880, PR #881 and PR #884.
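The "Percent Within X" tutorial (PR #865) covers scores.continuous.percent_within_x in detail. As a rough sketch of the underlying idea, the hypothetical helper below returns the percentage of points whose absolute error is no greater than a tolerance. It assumes an inclusive boundary and ignores NaN handling and dimension reduction, so check the scores API documentation for the exact definition and options.

```python
import numpy as np


def percent_within_tolerance(fcst, obs, tolerance):
    """Hypothetical sketch: percentage of points with |fcst - obs| <= tolerance (inclusive)."""
    abs_error = np.abs(np.asarray(fcst, dtype=float) - np.asarray(obs, dtype=float))
    return 100.0 * float(np.mean(abs_error <= tolerance))


fcst = [10.0, 12.0, 15.0, 20.0]
obs = [11.0, 12.5, 18.0, 20.0]   # absolute errors: 1.0, 0.5, 3.0, 0.0
print(percent_within_tolerance(fcst, obs, tolerance=1.0))  # 75.0
```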
Internal Changes
- In multiple tutorials, added the keyword argument decode_timedelta=True to xarray.open_dataset for the downloaded files forecast_grid.nc and analysis_grid.nc. See PR #894.
- Performed input checking earlier in various function calls to improve efficiency, so that error messages can be raised before incurring computational expense. See PR #905.
Contributors to this Release
Thomas C. Pagano* (@thomaspagano), Paul R. Smith* (@prs247au), J. Smallwood* (@jdgsmallwood), Tennessee Leeuwenburg (@tennlee), Nicholas Loveday (@nicholasloveday), Nikeeth Ramanathan (@nikeethr), Stephanie Chong (@Steph-Chong), Robert J. Taggart (@rob-taggart) and Mohammadreza Khanarmuei (@reza-armuei).
* indicates that this release contains their first contribution to scores.
2.2.0
Release Notes (What's New)
Version 2.2.0 (July 26, 2025)
For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.
Features
- Added a new metric:
  - Spearman’s correlation coefficient: scores.continuous.correlation.spearmanr. See PR #773.
- Added a new function for generating data for diagrams:
  - Quantile-Quantile (QQ) plots: scores.plotdata.qq. See PR #852.
- Added new features to the FIxed Risk Multicategorical (FIRM) score (scores.categorical.firm):
  - Added support for xr.Datasets in addition to the existing support for xr.DataArrays. See PR #853.
  - Added the optional argument include_components. If include_components is set to True, the function will return the overforecast and underforecast penalties along with the FIRM score. See PR #853 and PR #864.
- Added a new scores.plotdata section to the API for functions that generate data for verification plots. See PR #852.
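Spearman's correlation coefficient is the Pearson correlation of the ranks of the data. The sketch below, assuming one-dimensional inputs without NaNs and giving tied values their average rank, illustrates that definition; the helper names are hypothetical and this is not the code behind scores.continuous.correlation.spearmanr.

```python
import numpy as np


def average_ranks(values):
    """Hypothetical helper: rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace the ranks within each group of tied values by their average.
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(fcst, obs):
    """Hypothetical helper: Spearman's rho as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(fcst), average_ranks(obs))[0, 1])


# A perfectly monotonic (but non-linear) relationship gives rho close to 1.0.
print(spearman_rho([1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 16.0]))
```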
Bug Fixes
- Fixed an issue where scores.plotdata.roc didn't add the point (0, 0) in some instances. See PR #863.
- Fixed an issue in scores.continuous.quantile_interval_score where broadcasting wasn't being done correctly in some cases. See PR #867.
Documentation
- Added two new tutorials:
- Substantially updated "The FIxed Risk Multicategorical (FIRM) Score" tutorial. See PR #853.
- Fixed an error in the formula in the docstring for the quantile interval score (scores.continuous.quantile_interval_score). (Note: this error was only present in the docstring; the code implementation of the function was correct and the tutorial listed the correct formula.) See PR #851.
- Updated several "full changelog" URLs in the release notes. See PR #859.
Internal Changes
- Improved the efficiency of the FIxed Risk Multicategorical (FIRM) score (scores.categorical.firm) by moving the call to gather dimensions to earlier within the method. See PR #853.
- Added a new scores.plotdata section to the API for functions that generate data for verification plots. See PR #852. The following internal changes were made:
  - Receiver (Relative) Operating Characteristic (ROC): scores.probability.roc_curve_data was moved to scores.plotdata.roc, but can still be imported as scores.probability.roc_curve_data.
  - Murphy Score: scores.continuous.murphy_score was moved to scores.plotdata.murphy_score, but can still be imported as scores.continuous.murphy_score and scores.probability.murphy_score. scores.continuous.murphy_thetas was moved to scores.plotdata.murphy_thetas, but can still be imported as scores.continuous.murphy_thetas and scores.probability.murphy_thetas.
- Added an additional CI/CD pipeline for testing without Dask. See PR #856.
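A Q-Q plot, such as those produced from scores.plotdata.qq, compares empirical quantiles of the forecasts and the observations at matching probability levels. The sketch below generates that kind of data with numpy.quantile (using NumPy's default linear interpolation); the helper name and the synthetic data are hypothetical, and scores.plotdata.qq may use different quantile methods and return its results in a different structure.

```python
import numpy as np


def qq_points(fcst, obs, probabilities):
    """Hypothetical helper: empirical quantiles of forecasts and observations at the same levels."""
    fcst_q = np.quantile(np.asarray(fcst, dtype=float), probabilities)
    obs_q = np.quantile(np.asarray(obs, dtype=float), probabilities)
    return fcst_q, obs_q


rng = np.random.default_rng(seed=0)
obs = rng.normal(loc=0.0, scale=1.0, size=1000)
fcst = obs * 0.5                      # a synthetic, under-dispersed forecast
levels = np.linspace(0.05, 0.95, 19)  # probability levels 0.05, 0.10, ..., 0.95
fcst_q, obs_q = qq_points(fcst, obs, levels)
# Plotting fcst_q against obs_q gives a Q-Q plot; points on the 1:1 line indicate
# matching distributions, while a slope below 1 here reveals the under-dispersion.
```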
Contributors to this Release
Liam Bluett (@lbluett), Nicholas Loveday (@nicholasloveday), Nikeeth Ramanathan (@nikeethr), Tennessee Leeuwenburg (@tennlee), Robert J. Taggart (@rob-taggart), Stephanie Chong (@Steph-Chong) and Mohammadreza Khanarmuei (@reza-armuei).
2.1.0
Release Notes (What's New)
Version 2.1.0 (April 30, 2025)
For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.
Features
- Added a new function:
  - Block bootstrap: scores.processing.block_bootstrap. See PR #418.
- Added two new metrics:
Documentation
- Added "Block Bootstrapping" tutorial. See PR #418.
- Added "Stable Equitable Error in Probability Space (SEEPS)" tutorial. See PR #809.
- Added "Nash-Sutcliffe Efficiency (NSE)" tutorial. See PR #815.
- Updated the "Continuous Ranked Probability Score (CRPS) for Ensembles" tutorial:
  - Labelled dimensions in fcst/obs data.
  - Updated description of the plot to say the area squared corresponds to the CRPS.
  - Added an example with multiple coordinates along a dimension.
  See PR #805.
- Updated "Data Sources":
- Updated references in several sections of the documentation, following the publication of a preprint for the risk matrix score. See PR #827.
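The "Nash-Sutcliffe Efficiency (NSE)" tutorial (PR #815) discusses the metric in depth. For reference, the usual definition is one minus the ratio of the sum of squared errors to the sum of squared deviations of the observations from their mean. The sketch below implements that definition for one-dimensional inputs; the helper name is hypothetical, and it ignores weights, NaNs and the zero-variance case that a full implementation must handle.

```python
import numpy as np


def nash_sutcliffe_efficiency(fcst, obs):
    """Hypothetical helper: NSE = 1 - sum((fcst - obs)^2) / sum((obs - mean(obs))^2)."""
    fcst = np.asarray(fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    sse = np.sum((fcst - obs) ** 2)                 # sum of squared errors
    variability = np.sum((obs - obs.mean()) ** 2)   # spread of observations about their mean
    return float(1.0 - sse / variability)


obs = [1.0, 2.0, 3.0, 4.0]
print(nash_sutcliffe_efficiency(obs, obs))                   # 1.0 (perfect forecast)
print(nash_sutcliffe_efficiency([2.5, 2.5, 2.5, 2.5], obs))  # 0.0 (no better than the observed mean)
```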
Internal Changes
- Tested and added compatibility for recent Xarray versions (2025 and onwards) and adjusted dependency specification so new year "major version" rollovers will be permitted by default in future. See commit #f109f2f and commit #8428d64.
- In scores.emerging.weights_from_warning_scaling, changed the name of the argument assessment_weights to evaluation_weights. See PR #806.
  Note: This is technically a breaking change, but does not trigger a major release as it is contained within the "emerging" section of the API. This area of the API is designated for metrics which are still undergoing peer review and as such are expected to undergo change. Once peer review is concluded, the implementation will be finalised and moved.
- Added support for developers of scores who choose to use the pixi tool for environment management. See PR #835, PR #839 and PR #840.
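Block bootstrapping (scores.processing.block_bootstrap, PR #418, with a dedicated tutorial) resamples contiguous blocks rather than individual values, so that autocorrelation within each block is preserved. The sketch below shows a simple moving block bootstrap for a one-dimensional series; the helper name is hypothetical, and the scores function works on multidimensional xarray data and may choose block positions differently.

```python
import numpy as np


def moving_block_bootstrap(series, block_length, rng):
    """Hypothetical helper: build one resample of a 1-D series by concatenating
    randomly chosen contiguous blocks, preserving short-range autocorrelation."""
    series = np.asarray(series)
    n = len(series)
    n_blocks = int(np.ceil(n / block_length))
    # Random start index for each block; blocks never run past the end of the series.
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    resample = np.concatenate([series[s:s + block_length] for s in starts])
    return resample[:n]  # trim to the original length


rng = np.random.default_rng(seed=42)
series = np.arange(10)
sample = moving_block_bootstrap(series, block_length=3, rng=rng)
```

Repeating the resampling many times and computing a statistic on each resample gives an empirical distribution from which confidence intervals can be estimated.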
Contributors to this Release
Dougal T. Squire* (@dougiesquire), Mohammad Mahadi Hasan* (@engrmahadi), Mohammadreza Khanarmuei (@reza-armuei), Nikeeth Ramanathan (@nikeethr), Tennessee Leeuwenburg (@tennlee), Nicholas Loveday (@nicholasloveday), Robert J. Taggart (@rob-taggart), Durga Shrestha (@durgals) and Stephanie Chong (@Steph-Chong).
* indicates that this release contains their first contribution to scores.
2.0.0
Release Notes (What's New)
Version 2.0.0 (December 7, 2024)
For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.
Breaking Changes
- The function scores.probability.tw_crps_for_ensemble previously took an optional (mis-spelled) argument chainging_func_kwargs. The spelling has been corrected and the argument is now chaining_func_kwargs. See PR #780 and PR #772.
- Developers of scores will need to update their installation of the scores package with pip install -e .[all] to get updated versions of black, pylint and mypy. See PR #768, PR #769 and PR #771.
Features
- Added three new metrics:
  - Brier score for ensembles: scores.probability.brier_score_for_ensemble. See PR #735.
  - Negative predictive value: scores.categorical.BasicContingencyManager.negative_predictive_value. See PR #759.
  - Positive predictive value: scores.categorical.BasicContingencyManager.positive_predictive_value. See PR #761 and PR #756.
- Also added one new emerging metric and two supporting functions:
- A new method called format_table was added to the class BasicContingencyManager to improve visualisation of 2x2 contingency tables. The tutorial Binary_Contingency_Scores was updated to demonstrate the use of this function. See PR #775.
- The functions scores.processing.comparative_discretise, scores.processing.binary_discretise and scores.processing.binary_discretise_proportion now accept either a string indicating the choice of operator to be used, or an operator from the operator module of the Python standard library. Using one of the operators from the standard library module is recommended, as doing so is more reliable for a variety of reasons. Support for the use of a string may be removed in future. See PR #740 and PR #758.
Documentation
- Added "The Risk Matrix Score" tutorial. See PR #724 and PR #794.
- Updated the "Brier Score" tutorial to include a new section about the Brier score for ensembles. See PR #735.
- Updated the "Binary Categorical Scores and Binary Contingency Tables (Confusion Matrices)" tutorial:
- Updated the “Contributing Guide”:
  - Added a new section: "Creating Your Own Fork of scores for the First Time".
  - Updated the section: "Workflow for Submitting Pull Requests".
  - Added a new section: "Pull Request Etiquette".
  See PR #787.
- Updated the README:
  - Added Scoringrules to "Related Works". See PR #746, PR #766 and PR #789.
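The updated "Brier Score" tutorial (PR #735) covers scores.probability.brier_score_for_ensemble. A simple way to obtain a Brier score from an ensemble is to convert it to an event probability, namely the fraction of members at or above a threshold, and then score that probability against the binary outcome. The sketch below does exactly that; the helper name and the use of >= to define the event are assumptions, and the scores function may differ in details such as the threshold comparison and any adjustment for ensemble size, so consult its docstring.

```python
import numpy as np


def ensemble_brier_score(ensemble, obs, threshold):
    """Hypothetical helper: Brier score using the fraction of members >= threshold
    as the forecast probability.

    ensemble: shape (n_cases, n_members); obs: shape (n_cases,).
    """
    ensemble = np.asarray(ensemble, dtype=float)
    obs = np.asarray(obs, dtype=float)
    prob = np.mean(ensemble >= threshold, axis=1)  # forecast probability of the event
    occurred = (obs >= threshold).astype(float)    # 1.0 if the event happened, else 0.0
    return float(np.mean((prob - occurred) ** 2))


ensemble = [[0.0, 2.0, 5.0, 6.0],   # 2 of 4 members >= 4 -> p = 0.5
            [1.0, 1.0, 2.0, 3.0]]   # 0 of 4 members >= 4 -> p = 0.0
obs = [5.0, 0.0]                    # the event occurred in the first case only
print(ensemble_brier_score(ensemble, obs, threshold=4.0))  # 0.125
```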
Internal Changes
- Removed scikit-learn as a dependency. scores has replaced the use of scikit-learn with a similar function from SciPy (which was an existing scores dependency). This change was manually tested and found to be faster. See PR #774.
- Version pinning of dependencies in release files (the wheel and sdist files used by PyPI and conda-forge) is now managed and set by the hatch_build script. This allows development versions to be free-floating, while being more specific about dependencies in releases. The previous process also aimed to do this, but was error-prone. A new entry called pinned_dependencies was added to pyproject.toml to specify the release dependencies. See PR #760.
Contributors to this Release
Arshia Sharma* (@arshiaar), A.J. Fisher* (@AJTheDataGuy), Liam Bluett* (@lbluett), Jinghan Fu* (@JinghanFu), Sam Bishop* (@techdragon), Robert J. Taggart (@rob-taggart), Tennessee Leeuwenburg (@tennlee), Stephanie Chong (@Steph-Chong) and Nicholas Loveday (@nicholasloveday).
* indicates that this release contains their first contribution to scores.
1.3.0
Release Notes (What's New)
Version 1.3.0 (November 15, 2024)
For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.
Introduced Support for Python 3.13 and Dropped Support for Python 3.9
- In line with other scientific Python packages, scores has dropped support for Python 3.9 in this release.
- scores has added support for Python 3.13. See PR #710.
Features
- Added four new metrics:
  - Quantile Interval Score: scores.continuous.quantile_interval_score. See PR #704, PR #733 and PR #738.
  - Interval Score: scores.continuous.interval_score. See PR #704, PR #733 and PR #738.
  - Kling-Gupta Efficiency (KGE): scores.continuous.kge. See PR #679, PR #700 and PR #734.
  - Interval threshold weighted continuous ranked probability score (twCRPS) for ensembles: scores.probability.interval_tw_crps_for_ensemble. See PR #682 and PR #734.
- Added an optional include_components argument to several continuous ranked probability score (CRPS) functions for ensembles. If include_components is set to True, the function returns the underforecast penalty, the overforecast penalty and the forecast spread term, in addition to the overall CRPS value. This applies to the following CRPS functions:
  - continuous ranked probability score (CRPS) for ensembles: scores.probability.crps_for_ensemble
  - threshold weighted continuous ranked probability score (twCRPS) for ensembles: scores.probability.tw_crps_for_ensemble
  - tail threshold weighted continuous ranked probability score (twCRPS) for ensembles: scores.probability.tail_tw_crps_for_ensemble
  - interval threshold weighted continuous ranked probability score (twCRPS) for ensembles: scores.probability.interval_tw_crps_for_ensemble
  See PR #708 and PR #734.
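The Kling-Gupta Efficiency is commonly defined as KGE = 1 - sqrt((r - 1)^2 + (α - 1)^2 + (β - 1)^2), where r is the linear correlation between forecasts and observations, α is the ratio of their standard deviations and β is the ratio of their means. The sketch below implements that unscaled form for one-dimensional inputs; the helper name is hypothetical, and scores.continuous.kge also handles dimension reduction and may offer further options.

```python
import numpy as np


def kling_gupta_efficiency(fcst, obs):
    """Hypothetical helper: unscaled KGE for 1-D forecasts and observations."""
    fcst = np.asarray(fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    rho = np.corrcoef(fcst, obs)[0, 1]   # linear correlation r
    alpha = np.std(fcst) / np.std(obs)   # variability ratio
    beta = np.mean(fcst) / np.mean(obs)  # bias ratio
    return float(1.0 - np.sqrt((rho - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


obs = [1.0, 2.0, 3.0, 4.0]
# Doubling the observations keeps r = 1 but gives alpha = beta = 2,
# so the result is approximately 1 - sqrt(2), i.e. about -0.414.
print(kling_gupta_efficiency([2.0, 4.0, 6.0, 8.0], obs))
```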
Documentation
- Added "Kling–Gupta Efficiency (KGE)" tutorial. See PR #679, PR #700 and PR #734.
- Added "Quantile Interval Score and Interval Score" tutorial. See PR #704, PR #736 and PR #738.
- Added "Threshold Weighted Continuous Ranked Probability Score (twCRPS) for ensembles" tutorial. See PR #706 and PR #722.
- Updated the title in the "Binary Categorical Scores and Binary Contingency Tables (Confusion Matrices)" tutorial and the description for the corresponding thumbnail in the tutorial gallery. See PR #741 and PR #743.
- Updated the pull request template. See PR #719.
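For a forecast interval [l, u] whose endpoints are forecasts of the α_l and α_u quantiles, the quantile interval score is commonly written as u - l when the observation y lies inside the interval, (u - l) + (l - y)/α_l when y falls below l, and (u - l) + (y - u)/(1 - α_u) when y falls above u. The scalar sketch below implements that formula; the helper name is hypothetical and it is not a substitute for scores.continuous.quantile_interval_score, which works on xarray data.

```python
def quantile_interval_score(lower, upper, obs, lower_level, upper_level):
    """Hypothetical scalar sketch of the quantile interval score.

    lower and upper forecast the lower_level and upper_level quantiles respectively.
    """
    width = upper - lower
    # Penalty when the observation falls below the interval, scaled by 1 / lower_level.
    below_penalty = (lower - obs) / lower_level if obs < lower else 0.0
    # Penalty when the observation falls above the interval, scaled by 1 / (1 - upper_level).
    above_penalty = (obs - upper) / (1.0 - upper_level) if obs > upper else 0.0
    return width + below_penalty + above_penalty


print(quantile_interval_score(2.0, 8.0, 5.0, 0.25, 0.75))   # 6.0 (inside: width only)
print(quantile_interval_score(2.0, 8.0, 1.0, 0.25, 0.75))   # 10.0 (6 + 1 / 0.25)
print(quantile_interval_score(2.0, 8.0, 10.0, 0.25, 0.75))  # 14.0 (6 + 2 / 0.25)
```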
Internal Changes
- Sped up (improved the computational efficiency of) the continuous ranked probability score (CRPS) for ensembles. This also addresses memory issues when a large number of ensemble members are present. See PR #694.
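The include_components option listed above reflects a standard decomposition of the ECDF-based ensemble CRPS: the mean absolute difference between the members and the observation splits into an underforecast part (members below the observation) and an overforecast part (members above it), from which a spread term, half the mean absolute difference between pairs of members, is subtracted. The sketch below computes these pieces for a single forecast. It evaluates the spread term from sorted members, one common way to avoid building an M-by-M matrix of pairwise differences for large ensembles; this is an illustration only, not necessarily the approach taken in PR #694, and the helper name is hypothetical.

```python
import numpy as np


def crps_ensemble_components(members, obs):
    """Hypothetical helper: ECDF-based CRPS of one ensemble forecast and its components.

    Returns (crps, underforecast_penalty, overforecast_penalty, spread).
    """
    x = np.sort(np.asarray(members, dtype=float))
    m = len(x)
    under = np.mean(np.maximum(obs - x, 0.0))  # contribution of members below the observation
    over = np.mean(np.maximum(x - obs, 0.0))   # contribution of members above the observation
    # Sorted-member identity: sum_i sum_j |x_i - x_j| = 2 * sum_k (2k - m - 1) * x_(k),
    # so the spread term (that double sum divided by 2 m^2) needs only O(m log m) work.
    k = np.arange(1, m + 1)
    spread = np.sum((2 * k - m - 1) * x) / m**2
    return float(under + over - spread), float(under), float(over), float(spread)


print(crps_ensemble_components([0.0, 2.0], 1.0))  # (0.5, 0.5, 0.5, 0.5)
```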
Contributors to this Release
Mohammadreza Khanarmuei (@reza-armuei), Nicholas Loveday (@nicholasloveday), Durga Shrestha (@durgals), Tennessee Leeuwenburg (@tennlee), Stephanie Chong (@Steph-Chong) and Robert J. Taggart (@rob-taggart).
1.2.0
Release Notes (What's New)
Version 1.2.0 (September 13, 2024)
For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.
Features
- Added three new metrics:
  - Percent bias (PBIAS): scores.continuous.pbias. See PR #639 and PR #655.
  - Threshold weighted continuous ranked probability score (twCRPS) for ensembles: scores.probability.tw_crps_for_ensemble. See PR #644.
  - Tail threshold weighted continuous ranked probability score (twCRPS) for ensembles: scores.probability.tail_tw_crps_for_ensemble. See PR #644.
- The FIxed Risk Multicategorical (FIRM) score (scores.categorical.firm) can now take a sequence of multidimensional arrays (xr.DataArray) of thresholds. This allows the FIRM score to be used with categorical thresholds that vary across the domain. See PR #661.
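A threshold weighted CRPS for ensembles can be computed by passing both the ensemble members and the observation through a chaining function and then computing the ordinary CRPS. For an upper tail, with weight 1 above a threshold t and 0 below it, a suitable chaining function is v(z) = max(z, t). The sketch below assumes that upper-tail case and the ECDF form of the CRPS; the helper names are hypothetical, and the scores functions support other configurations.

```python
import numpy as np


def crps_ensemble(members, obs):
    """Hypothetical helper: ECDF-based CRPS for a single ensemble forecast."""
    x = np.asarray(members, dtype=float)
    mean_abs_error = np.mean(np.abs(x - obs))
    mean_pair_diff = np.mean(np.abs(x[:, None] - x[None, :]))
    return float(mean_abs_error - 0.5 * mean_pair_diff)


def upper_tail_tw_crps(members, obs, threshold):
    """Hypothetical helper: twCRPS emphasising values above threshold,
    using the chaining function v(z) = max(z, threshold)."""
    chained_members = np.maximum(np.asarray(members, dtype=float), threshold)
    chained_obs = max(float(obs), threshold)
    return crps_ensemble(chained_members, chained_obs)


# Everything below the threshold is mapped to the threshold, so it contributes nothing.
print(upper_tail_tw_crps([0.0, 1.0, 2.0], 1.5, threshold=5.0))   # 0.0
# A threshold below all values leaves the data unchanged, recovering the ordinary CRPS.
print(upper_tail_tw_crps([0.0, 2.0], 1.0, threshold=-100.0))     # 0.5
```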
Documentation
- Added information about percent bias to the "Additive Bias and Multiplicative Bias" tutorial. See PR #639 and PR #656.
- Updated documentation to say there are now over 60 metrics, statistical techniques and data processing tools contained in scores. See PR #659.
- In the "Contributing Guide", updated instructions for installing a conda-based virtual environment. See PR #654.
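Percent bias is commonly defined as 100 times the sum of (forecast - observation) divided by the sum of the observations, so that a positive value indicates overforecasting overall. The sketch below uses that convention; the helper name is hypothetical, and the "Additive Bias and Multiplicative Bias" tutorial documents the exact convention used by scores.continuous.pbias.

```python
import numpy as np


def percent_bias(fcst, obs):
    """Hypothetical helper: PBIAS = 100 * sum(fcst - obs) / sum(obs)."""
    fcst = np.asarray(fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(100.0 * np.sum(fcst - obs) / np.sum(obs))


print(percent_bias([11.0, 22.0, 33.0], [10.0, 20.0, 30.0]))  # 10.0 (forecasts 10% too high overall)
```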
Internal Changes
- Modified automated tests to work with NumPy 2.1. Incorporated a union type of array and generic in assert statements for Dask operations. See PR #643.
Contributors to this Release
Durga Shrestha* (@durgals), Maree Carroll (@mareecarroll), Nicholas Loveday (@nicholasloveday), Tennessee Leeuwenburg (@tennlee), Stephanie Chong (@Steph-Chong) and Robert J. Taggart (@rob-taggart).
* indicates that this release contains their first contribution to scores.
1.1.0
Release Notes (What's New)
Version 1.1.0 (August 9, 2024)
For a list of all changes in this release, see the full changelog. Below are the changes we think users may wish to be aware of.
Features
- scores is now available on conda-forge.
- Added five new metrics:
  - threshold weighted squared error: scores.continuous.tw_squared_error
  - threshold weighted absolute error: scores.continuous.tw_absolute_error
  - threshold weighted quantile score: scores.continuous.tw_quantile_score
  - threshold weighted expectile score: scores.continuous.tw_expectile_score
  - threshold weighted Huber loss: scores.continuous.tw_huber_loss
  See PR #609.
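For the absolute error, threshold weighting with a weight of 1 on an interval [a, b] (and 0 elsewhere) amounts to computing the ordinary absolute error after clipping both the forecast and the observation to [a, b], assuming the weighted score is scaled consistently with the unweighted absolute error. The sketch below illustrates this; the helper name is hypothetical, scores.continuous.tw_absolute_error may support other weight shapes, and the "Threshold Weighted Scores" tutorial remains the authoritative explanation.

```python
import numpy as np


def tw_absolute_error_interval(fcst, obs, lower, upper):
    """Hypothetical helper: absolute error with weight 1 on [lower, upper] and 0 elsewhere,
    computed by clipping forecast and observation to the interval (the chaining function)."""
    clipped_fcst = np.clip(np.asarray(fcst, dtype=float), lower, upper)
    clipped_obs = np.clip(np.asarray(obs, dtype=float), lower, upper)
    return np.abs(clipped_fcst - clipped_obs)


# Errors lying entirely below the interval of interest are ignored;
# errors straddling its upper edge are only counted up to that edge.
print(tw_absolute_error_interval([5.0, 15.0, 12.0], [8.0, 25.0, 14.0], lower=10.0, upper=20.0))  # [0. 5. 2.]
```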
Documentation
- Added "Threshold Weighted Scores" tutorial. See PR #609.
- Removed nbviewer link from documentation. See PR #615.
Internal Changes
- Modified numpy.trapezoid call to work with either NumPy 1 or 2. See PR #610.
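NumPy 2.0 introduced numpy.trapezoid and deprecated numpy.trapz, while NumPy 1.x only provides numpy.trapz. One common compatibility pattern is sketched below; it is an illustration, not necessarily the code used in PR #610.

```python
import numpy as np

# Prefer numpy.trapezoid (NumPy 2.x); fall back to numpy.trapz on NumPy 1.x.
# The `or` short-circuits, so the deprecated trapz is only accessed when needed.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

# Area under y = x on [0, 2] using the trapezoidal rule.
area = trapezoid([0.0, 1.0, 2.0], x=[0.0, 1.0, 2.0])
print(area)  # 2.0
```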
Contributors to this Release
Nicholas Loveday (@nicholasloveday), Tennessee Leeuwenburg (@tennlee), Stephanie Chong (@Steph-Chong) and Robert J. Taggart (@rob-taggart).
1.0.0
Release Notes (What's New)
Version 1.0.0 (July 10, 2024)
We are happy to have reached the point of releasing “Version 1.0.0” of scores. While we look forward to many version increments to come, version 1.0.0 represents a milestone. It signifies a stabilisation of the API, and marks a turning point from the initial construction period. We have also published a paper in the Journal of Open Source Software (see citation further below).
From this point forward, scores will be following the Semantic Versioning Specification (SemVer) in its release management.
This is a good moment to acknowledge and thank the contributors that helped us reach this point. They are: Tennessee Leeuwenburg, Nicholas Loveday, Elizabeth E. Ebert, Harrison Cook, Mohammadreza Khanarmuei, Robert J. Taggart, Nikeeth Ramanathan, Maree Carroll, Stephanie Chong, Aidan Griffiths and John Sharples.
Please consider a citation of our paper if you use our code. The citation is:
Leeuwenburg, T., Loveday, N., Ebert, E. E., Cook, H., Khanarmuei, M., Taggart, R. J., Ramanathan, N., Carroll, M., Chong, S., Griffiths, A., & Sharples, J. (2024). scores: A Python package for verifying and evaluating models and predictions with xarray. Journal of Open Source Software, 9(99), 6889. https://doi.org/10.21105/joss.06889
BibTeX:
@article{Leeuwenburg_scores_A_Python_2024,
author = {Leeuwenburg, Tennessee and Loveday, Nicholas and Ebert, Elizabeth E. and Cook, Harrison and Khanarmuei, Mohammadreza and Taggart, Robert J. and Ramanathan, Nikeeth and Carroll, Maree and Chong, Stephanie and Griffiths, Aidan and Sharples, John},
doi = {10.21105/joss.06889},
journal = {Journal of Open Source Software},
month = jul,
number = {99},
pages = {6889},
title = {{scores: A Python package for verifying and evaluating models and predictions with xarray}},
url = {https://joss.theoj.org/papers/10.21105/joss.06889},
volume = {9},
year = {2024}
}
For the full details of all changes in this release, see the GitHub commit history.
0.9.3
Release Notes (What's New)
Version 0.9.3 (July 9, 2024)
For the full details of all changes in this release, see the GitHub commit history. Below are the changes we think users may wish to be aware of.
Breaking Changes
- Renamed and relocated function scores.continuous.correlation to scores.continuous.correlation.pearsonr. See PR #583 by @nicholasloveday.
Documentation
- Added "Dimension Handling" tutorial, which describes reducing and preserving dimensions. See PR #589 by @nicholasloveday.
- Updated "Detailed Installation Guide" with information on installing kernels in a Jupyter environment. See PR #586 by @tennlee and PR #587 by @Steph-Chong.
Internal Changes
0.9.2
What's Changed
- Add Badges to the README for CodeQL, code coverage, and binder link by @tennlee in #555
- Substantially update "Data Sources" page in documentation by @Steph-Chong in #544
- Add a Key Features page to docs by @Steph-Chong in #567
- Addition of consistent scoring rules by @nicholasloveday in #540
- Release 0.9.2 by @tennlee in #570
Full Changelog: 0.9.1...0.9.2