Scorecard configuration #553

Open
wants to merge 15 commits into
base: main

carlhiggs
Collaborator

@carlhiggs carlhiggs commented Jul 8, 2025

This pull request addresses #544

It introduces a new r.get_scorecard_statistics() function, which a user who has completed the policy and spatial indicator reviews can invoke to get a summary of indicators and, optionally, export it to a human-readable formatted text file (scorecard_statistics.yml) in the study region output folder.

Usage is demonstrated as follows for a study region that has had spatial analysis successfully completed and policy review configured (but not all optional policy statistics configured):

# python
Python 3.12.11 | packaged by conda-forge | (main, Jun  4 2025, 14:45:31) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ghsci
>>> r = ghsci.example()

Example study region loaded.  Loading the configured example region as a variable 'r' by running 'r = ghsci.example()' is equivalent to running 'r = ghsci.Region('example_ES_Las_Palmas_2023')' in the Python console.  To proceed with analysis using the 'r' region variable, one can enter 'r.analysis()'.  Once analysis has completed, once can then enter 'r.generate()' to generate resources.  For more information, run 'ghsci.help()'.


>>> r.get_scorecard_statistics()
{'City': 'Las Palmas de Gran Canaria', 'Country': 'Spain', 'Global region': 'Europe', 'Gini Index': 'Not configured', 'Gini source': 'Not configured', 'HDI Index': 'Not configured', 'HDI source': 'Not configured', 'Total urban area (km²)': 62.98789962182564, 'Total population': 333051, 'Total population source': 'Schiavina, Marcello; Freire, Sergio; MacManus, Kytt (2022): GHS-POP R2022A - GHS population grid multitemporal (1975-2030). European Commission, Joint Research Centre (JRC) [Dataset] doi: 10.2905/D6D86A90-4351-4508-99C1-CB074B022C4A', 'City-wide density (pop/km²)': 5287.539384542298, 'GDP per capita (INT $)': 'Not configured', 'Population with access to fresh food market or supermarket': 53.83959229813532, 'Population with access to regularly running formal public transport (<20 mins)': 74.4946008942626, 'Population with access to any public open space': 75.92734627136147, 'Population living in neighbourhoods above minimum density threshold for WHO physical activity target': 85.1, 'Population living in neighbourhoods above minimum connectivity threshold for WHO physical activity target': 96.9, 'Population living in neighbourhoods above the median walkability across the 25 cities*': 95.2, 'Metropolitan transport policy with health-focused actions\xa0 (Transport policy with health-focused actions p.6)': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Air pollution policies for transport AND land-use': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Requirements for public transport access to employment and services': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Employment distribution requirements': {'identified': '✘', 'aligns': '-', 'measurable': '-'}, 'Parking restrictions to discourage car use': {'identified': '✔', 'aligns': '✔', 'measurable': '✘'}, 'Minimum requirements for public open space access': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Street connectivity requirements': {'identified': '✘', 'aligns': '-', 
'measurable': '-'}, 'Provision of pedestrian infrastructure AND targets for walking participation': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Provision of cycling infrastructure AND targets for cycling participation': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Housing density requirements': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Minimum requirements for public transport access AND targets for public transport use': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Information on government expenditure for different transport modes is available to the public.': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}}
>>> r.get_scorecard_statistics(export=True)
'/home/ghsci/process/data/_study_region_outputs/example_ES_Las_Palmas_2023/scorecard_statistics.yml'

So, the first call simply returns the statistics as a Python dictionary with unrounded values, which the user can review on screen or save for use in their own analyses.
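
Because the call without export returns a plain dictionary, it can be post-processed directly. A minimal sketch, using a hand-copied subset of the output above rather than a live ghsci session; the helper name `split_configured` is hypothetical and not part of ghsci:

```python
# Subset of the dictionary shown above, copied by hand for illustration.
stats = {
    'City': 'Las Palmas de Gran Canaria',
    'Gini Index': 'Not configured',
    'HDI Index': 'Not configured',
    'Total population': 333051,
    'GDP per capita (INT $)': 'Not configured',
    'Population with access to any public open space': 75.92734627136147,
}


def split_configured(statistics):
    """Hypothetical helper: separate populated fields from placeholders."""
    configured = {
        key: value
        for key, value in statistics.items()
        if value != 'Not configured'
    }
    missing = [
        key for key, value in statistics.items() if value == 'Not configured'
    ]
    return configured, missing


configured, missing = split_configured(stats)
print('Still to configure:', missing)
```

Something like this could help a user spot which optional context statistics remain to be filled in before exporting.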

The second time the function is run, the following text is exported to scorecard_statistics.yml in the study region folder, with this file path echoed back to the user:

City: Las Palmas de Gran Canaria
Country: Spain
Global region: Europe
Gini Index: Not configured
Gini source: Not configured
HDI Index: Not configured
HDI source: Not configured
Total urban area (km²): 62.99
Total population: 333,051
Total population source: Schiavina, Marcello; Freire, Sergio; MacManus, Kytt (2022): GHS-POP R2022A - GHS population grid multitemporal (1975-2030). European Commission, Joint Research Centre (JRC) [Dataset] doi: 10.2905/D6D86A90-4351-4508-99C1-CB074B022C4A
City-wide density (pop/km²): 5287.54
GDP per capita (INT $): Not configured
Population with access to fresh food market or supermarket: 53.84
Population with access to regularly running formal public transport (<20 mins): 74.49
Population with access to any public open space: 75.93
Population living in neighbourhoods above minimum density threshold for WHO physical activity target: 85.1
Population living in neighbourhoods above minimum connectivity threshold for WHO physical activity target: 96.9
Population living in neighbourhoods above the median walkability across the 25 cities*: 95.2
Metropolitan transport policy with health-focused actions  (Transport policy with health-focused actions p.6): {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Air pollution policies for transport AND land-use: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Requirements for public transport access to employment and services: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Employment distribution requirements: {'identified': '✘', 'aligns': '-', 'measurable': '-'}
Parking restrictions to discourage car use: {'identified': '✔', 'aligns': '✔', 'measurable': '✘'}
Minimum requirements for public open space access: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Street connectivity requirements: {'identified': '✘', 'aligns': '-', 'measurable': '-'}
Provision of pedestrian infrastructure AND targets for walking participation: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Provision of cycling infrastructure AND targets for cycling participation: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Housing density requirements: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Minimum requirements for public transport access AND targets for public transport use: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Information on government expenditure for different transport modes is available to the public.: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
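
The difference between the dictionary values and the exported text above appears to be rounding to two decimal places and thousands separators for whole numbers. A rough sketch of that style of formatting follows; `format_value` is a hypothetical name, and this is an approximation inferred from the output shown, not the code in this pull request:

```python
def format_value(value):
    """Hypothetical formatter approximating the exported number style."""
    if isinstance(value, bool):
        # bool is a subclass of int; do not add separators to True/False.
        return str(value)
    if isinstance(value, int):
        # Whole numbers get thousands separators, e.g. 333,051.
        return f'{value:,}'
    if isinstance(value, float):
        # Floats are rounded to at most two decimal places.
        return str(round(value, 2))
    return str(value)


raw = {
    'Total urban area (km²)': 62.98789962182564,
    'Total population': 333051,
    'City-wide density (pop/km²)': 5287.539384542298,
    'Gini Index': 'Not configured',
}
for key, value in raw.items():
    print(f'{key}: {format_value(value)}')
```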

The optional user statistics can be completed in the study region configuration file (feedback on user directions/formatting here also welcome):

#########
# optional_scorecard_context_statistics:
# ## The scorecard is an optional summary of the policy and spatial indicator results.
# ## It is prepared by a graphic designer using an additional selection of contextual
# ## information and indicator results, which a researcher can provide to the GOHSC team.
# ## To specify values for these fields, uncomment and complete the required fields for each.
# ## For more information, contact the GOHSC team at [email protected]
# Gini:
# # Country Gini Index as an estimate of income inequality
# # see https://data.worldbank.org/indicator/SI.POV.GINI
# value:
# year:
# source:
# HDI:
# # Country Human Development Index (HDI)
# # https://hdr.undp.org/data-center/human-development-index#/indicies/HDI
# value:
# year:
# source:
# GDP per capita:
# # City Gross Domestic Product (GDP) per capita as an estimate of economic development (international dollar $)
# value:
# year:
# source:
# ## The following data may be optionally uncommented and provided by users using official data.
# ## This may be preferable to using derived estimates.
# City area (km²):
# # Total study region or city area in square kilometres (km²)
# value:
# year:
# source:
# City population:
# # Total study region or city population
# value:
# year:
# source:

A user can easily 'uncomment' this in an editor such as VS Code or Notepad++ (i.e. remove the leading # marks; most editors have a shortcut to uncomment selected lines), then add entries as required. Some input validation is implemented for this, so if a value is supplied in the wrong format, the user should receive feedback on the error.
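
To illustrate how commented-out entries might surface as 'Not configured' in the summary, here is a small sketch. It assumes the configuration has already been parsed into a nested dictionary, with the optional section nested under `reporting` as the validation path later in this thread suggests. The function name `get_optional_statistic` is hypothetical, and the population value is the fictitious test figure used later in this conversation:

```python
def get_optional_statistic(config, name, field='value'):
    """Hypothetical lookup returning a configured entry or a placeholder."""
    reporting = config.get('reporting') or {}
    section = reporting.get('optional_scorecard_context_statistics') or {}
    entry = section.get(name) or {}
    value = entry.get(field)
    return 'Not configured' if value is None else value


# Section left commented out in the YAML, so the key is absent after parsing.
untouched = {'reporting': {}}

# Section uncommented with one entry completed (illustrative value only).
completed = {
    'reporting': {
        'optional_scorecard_context_statistics': {
            'City population': {'value': 380000, 'year': None, 'source': None},
        },
    },
}

print(get_optional_statistic(untouched, 'Gini'))
print(get_optional_statistic(completed, 'City population'))
print(get_optional_statistic(completed, 'City population', field='year'))
```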

Do you think this could meet your needs @eugenrb? I'll lodge this pull request now so the automated testing can be launched, and mark you as a reviewer; you're welcome to check out the scorecard_configuration branch and give it a go with your existing regions. Tests and any feedback are welcome.

carlhiggs and others added 9 commits July 4, 2025 10:47
…f .ghsci_version; this should make things more straight forward for docker compose files that are set to read env variables from .env; it means we don't have to do anything fancy to insert the variable there. Hopefully it will make development in a dev container easier in vscode.
…stics to example configuration file and JSON-schema file, with validation ranges for expected inputs. Unfortunately Python JSON Schema is unable to reliably validate floating point precision, so a requirement for the number of decimal places has not been specified (i.e. the option 'multipleOf' does not work as would be expected).
… and scores for Region objects that should make it easier to prepare scorecard info, but also can be used to simplify code elsewhere when retrieving these items
…ing and summarising specific groups of 1 or more policies, as required for scorecards)
…od r.get_scorecard_statistics that returns the required fields; now just to check, and to export in a meaningful format
…ted text with unicode characters and no quotation marks
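
The 'multipleOf' limitation mentioned in the commit notes above most likely stems from binary floating-point arithmetic in general rather than anything specific to this project. A small standard-library illustration follows; it does not call a JSON Schema validator, and the division only mimics the kind of check such a validator might perform:

```python
from decimal import Decimal

# In binary floating point, 0.3 is not stored as an exact multiple of 0.1,
# so a naive "is the quotient a whole number?" check fails.
quotient = 0.3 / 0.1
print(quotient)
print(quotient.is_integer())

# Exact decimal arithmetic on the same literals behaves as a person expects.
remainder = Decimal('0.3') % Decimal('0.1')
print(remainder == 0)
```

This is why a schema rule such as "at most one decimal place" is hard to express reliably once the YAML values have been parsed into floats.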
@carlhiggs carlhiggs requested a review from eugenrb July 8, 2025 00:37
@carlhiggs
Collaborator Author

The functional test workflow failed! I have implemented changes to the JSON schema, and a bit more tweaking is required for this to work before merging. However, it probably will work for your test purposes Eugen, if you still want to check it out.

…ile to address failed test in online run test_0_0_valid_yaml (interestingly, this test passed when invoked locally...)
@carlhiggs
Collaborator Author

There is more work to do to ensure the configuration of optional parameters works as intended and is robust; please hold off on reviewing until I let you know I've made updates Eugen. Sorry for prematurely looping you in here.

carlhiggs added 4 commits July 8, 2025 13:30
… occurs when a configuration file is loaded (previously, file was only checked with YAML lint, it is now actually evaluated against the schema); this implements type checking specifically to support #544 to ensure valid inputs are provided that allow parsing (eg numbers for population and area that can optionally be evaluated as a fraction for density)
…ld relax the test linter for comments, towards #544
…ld relax the test linter for comments, towards #544
@carlhiggs
Collaborator Author

carlhiggs commented Jul 8, 2025

The latest update is now much more robust, as is validation in general for configuration files (this ties in with #506).

Because users can now supply optional parameters, and a couple of these in particular (population and urban area) require numeric formatting to be able to successfully be evaluated as a proportion (population density), I had to implement stricter type checking on these. And because of the work done towards #506, I was able to do this by doing proper schema validation when loading a region. I had to run through and update our schema definition to provide more flexibility due to optional fields, and some flexibility with some input formats for dates, but this works now with our example along with a whole load of other region files I have to hand to test with.

Now, if a user's configuration file does not meet the required schema, a more or less clear warning about this is provided to the user. For example, this happens if they enter "380,000" instead of "380000" for population, where we have directed for the whole optional statistics section: "For values, please only use numbers and decimal points; do not use commas or other punctuation."

For example, with the above example, when attempting to load a region (in this case the example, which I malformatted for test purposes):

>>> r = ghsci.example()

Example study region loaded.  Loading the configured example region as a variable 'r' by running 'r = ghsci.example()' is equivalent to running 'r = ghsci.Region('example_ES_Las_Palmas_2023')' in the Python console.  To proceed with analysis using the 'r' region variable, one can enter 'r.analysis()'.  Once analysis has completed, once can then enter 'r.generate()' to generate resources.  For more information, run 'ghsci.help()'.

❌ Schema validation failed: '380,000' is not of type 'number'
   Failed at path: reporting -> optional_scorecard_context_statistics -> City population -> value
   Found value: '380,000' (type: str)
   Expected type: number
Schema validation failed for example_ES_Las_Palmas_2023.yml. Please fix the configuration errors before proceeding.
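
For readers curious about the kind of check involved, the sketch below mimics the message above with a plain type test. It is an assumption-laden illustration only: `check_number` is a hypothetical function, and the actual pull request validates against a JSON schema rather than using code like this:

```python
def check_number(value, path):
    """Hypothetical check: return None if numeric, else an error message."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return None
    return (
        f"{value!r} is not of type 'number'\n"
        f"   Failed at path: {' -> '.join(path)}\n"
        f"   Found value: {value!r} (type: {type(value).__name__})\n"
        f"   Expected type: number"
    )


path = [
    'reporting',
    'optional_scorecard_context_statistics',
    'City population',
    'value',
]
print(check_number('380,000', path))  # string with a comma: reported
print(check_number(380000, path))     # plain number: passes, prints None
```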

However, if the comma is removed, we can do the following (noting the optional statistics values used here for Las Palmas are completely fictitious, just for test purposes -- not true!):

>>> import ghsci
>>> r = ghsci.example()

Example study region loaded.  Loading the configured example region as a variable 'r' by running 'r = ghsci.example()' is equivalent to running 'r = ghsci.Region('example_ES_Las_Palmas_2023')' in the Python console.  To proceed with analysis using the 'r' region variable, one can enter 'r.analysis()'.  Once analysis has completed, once can then enter 'r.generate()' to generate resources.  For more information, run 'ghsci.help()'.

>>> r.get_scorecard_statistics(export = True)
'/home/ghsci/process/data/_study_region_outputs/example_ES_Las_Palmas_2023/example_ES_Las_Palmas_2023_scorecard_statistics.yml'

The output file example_ES_Las_Palmas_2023_scorecard_statistics.yml contains the following text:

City: Las Palmas de Gran Canaria
Country: Spain
Global region: Europe
Gini Index: Not configured
Gini source: Not configured
HDI Index: Not configured
HDI source: Not configured
Total urban area (km²): 62.99
Total population: 333,051
Total population source: Global Human Settlements urban centres: 2015 (EU JRC, 2019; Las Palmas de Gran Canaria only) under CC BY 4.0. Centro Nacional de Información Geográfica under CC-BY-4.0
City-wide density (pop/km²): 5287.54
GDP per capita (INT $): Not configured
Population with access to fresh food market or supermarket: 53.84
Population with access to regularly running formal public transport (<20 mins): 74.49
Population with access to any public open space: 75.93
Population living in neighbourhoods above minimum density threshold for WHO physical activity target: 85.1
Population living in neighbourhoods above minimum connectivity threshold for WHO physical activity target: 96.9
Population living in neighbourhoods above the median walkability across the 25 cities*: 95.2
Metropolitan transport policy with health-focused actions  (Transport policy with health-focused actions p.6): {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Air pollution policies for transport AND land-use: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Requirements for public transport access to employment and services: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Employment distribution requirements: {'identified': '✘', 'aligns': '-', 'measurable': '-'}
Parking restrictions to discourage car use: {'identified': '✔', 'aligns': '✔', 'measurable': '✘'}
Minimum requirements for public open space access: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Street connectivity requirements: {'identified': '✘', 'aligns': '-', 'measurable': '-'}
Provision of pedestrian infrastructure AND targets for walking participation: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Provision of cycling infrastructure AND targets for cycling participation: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Housing density requirements: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Minimum requirements for public transport access AND targets for public transport use: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Information on government expenditure for different transport modes is available to the public.: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}

The density value of 5287.54 is the result of dividing the unrounded statistics: 333051/62.98789962182564 = 5287.539384542298. This is worth noting because evaluating with the rounded numbers gives a slightly different result (333051/62.99 = 5287.36307350373). I'm just pre-empting the question of why there is a slight difference when evaluating with the rounded numbers that would appear on the scorecard.
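
The same arithmetic can be reproduced in a Python console, using only the figures quoted above:

```python
population = 333051
area_unrounded = 62.98789962182564
area_rounded = round(area_unrounded, 2)

# Density from the unrounded area, as reported in the statistics.
density_unrounded = population / area_unrounded

# Density recomputed from the rounded area that would appear on a scorecard.
density_from_rounded = population / area_rounded

print(round(density_unrounded, 2))
print(round(density_from_rounded, 2))
```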

The unrounded statistics can be viewed by invoking r.get_scorecard_statistics(), which by default doesn't export the formatted results.

>>> r.get_scorecard_statistics()
{'City': 'Las Palmas de Gran Canaria', 'Country': 'Spain', 'Global region': 'Europe', 'Gini Index': 'Not configured', 'Gini source': 'Not configured', 'HDI Index': 'Not configured', 'HDI source': 'Not configured', 'Total urban area (km²)': 62.98789962182564, 'Total population': 333051, 'Total population source': 'Global Human Settlements urban centres: 2015 (EU JRC, 2019; Las Palmas de Gran Canaria only) under CC BY 4.0. Centro Nacional de Información Geográfica under CC-BY-4.0', 'City-wide density (pop/km²)': 5287.539384542298, 'GDP per capita (INT $)': 'Not configured', 'Population with access to fresh food market or supermarket': 53.83959229813532, 'Population with access to regularly running formal public transport (<20 mins)': 74.4946008942626, 'Population with access to any public open space': 75.92734627136147, 'Population living in neighbourhoods above minimum density threshold for WHO physical activity target': 85.1, 'Population living in neighbourhoods above minimum connectivity threshold for WHO physical activity target': 96.9, 'Population living in neighbourhoods above the median walkability across the 25 cities*': 95.2, 'Metropolitan transport policy with health-focused actions\xa0 (Transport policy with health-focused actions p.6)': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Air pollution policies for transport AND land-use': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Requirements for public transport access to employment and services': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Employment distribution requirements': {'identified': '✘', 'aligns': '-', 'measurable': '-'}, 'Parking restrictions to discourage car use': {'identified': '✔', 'aligns': '✔', 'measurable': '✘'}, 'Minimum requirements for public open space access': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Street connectivity requirements': {'identified': '✘', 'aligns': '-', 'measurable': '-'}, 'Provision of pedestrian infrastructure AND 
targets for walking participation': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Provision of cycling infrastructure AND targets for cycling participation': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Housing density requirements': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Minimum requirements for public transport access AND targets for public transport use': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}, 'Information on government expenditure for different transport modes is available to the public.': {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}}

(Incidentally, the most recent test failure was due to a connection interruption when retrieving map tiles; the error is obscure though and may be confusing for our users, so I lodged an issue at #554. When the test run was retriggered by my renaming of the output file to include the region code name, it passed.)

@carlhiggs
Collaborator Author

@eugenrb this is good for your review and feedback now if you like!

@carlhiggs
Collaborator Author

I just updated the output to use the labels for policy indicators present in the scorecard --- I think this should be good now:

City: Las Palmas de Gran Canaria
Country: Spain
Global region: Europe
Gini Index: Not configured
Gini source: Not configured
HDI Index: Not configured
HDI source: Not configured
Total urban area (km²): 62.99
Total population: 333,051
Total population source: Global Human Settlements urban centres: 2015 (EU JRC, 2019; Las Palmas de Gran Canaria only) under CC BY 4.0. Centro Nacional de Información Geográfica under CC-BY-4.0
City-wide density (pop/km²): 5287.54
GDP per capita (INT $): Not configured
Population with access to fresh food market or supermarket: 53.84
Population with access to regularly running formal public transport (<20 mins): 74.49
Population with access to any public open space: 75.93
Population living in neighbourhoods above minimum density threshold for WHO physical activity target: 85.1
Population living in neighbourhoods above minimum connectivity threshold for WHO physical activity target: 96.9
Population living in neighbourhoods above the median walkability across the 25 cities*: 95.2
Metropolitan transport policy with health-focused actions: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Air pollution policies for transport and land-use planning: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Requirements for public transport access to employment and services: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Employment distribution requirements: {'identified': '✘', 'aligns': '-', 'measurable': '-'}
Parking restrictions to discourage car use: {'identified': '✔', 'aligns': '✔', 'measurable': '✘'}
Minimum public open space access requirements: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Street connectivity requirements: {'identified': '✘', 'aligns': '-', 'measurable': '-'}
Provision of pedestrian infrastructure and targets for walking participation: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Provision of cycling infrastructure and targets for cycling participation: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Housing density requirements: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Minimum requirements for public transport access and targets for public transport use: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
Publicly available information on government expenditure for different transport modes: {'identified': '✔', 'aligns': '✔', 'measurable': '✔'}
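
As one possible use of the nested policy results, the sketch below tallies the check marks per criterion. It uses a hand-copied subset of four of the policy rows above, so the counts apply to that subset only, and it is not part of the pull request:

```python
from collections import Counter

# Four of the policy rows listed above, copied by hand for illustration.
policies = {
    'Employment distribution requirements': {
        'identified': '✘', 'aligns': '-', 'measurable': '-',
    },
    'Parking restrictions to discourage car use': {
        'identified': '✔', 'aligns': '✔', 'measurable': '✘',
    },
    'Street connectivity requirements': {
        'identified': '✘', 'aligns': '-', 'measurable': '-',
    },
    'Housing density requirements': {
        'identified': '✔', 'aligns': '✔', 'measurable': '✔',
    },
}

# Count how often each symbol appears for each criterion.
tally = {
    criterion: Counter(result[criterion] for result in policies.values())
    for criterion in ('identified', 'aligns', 'measurable')
}
for criterion, counts in tally.items():
    print(criterion, dict(counts))
```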
