Commit 9874af4

Release 3.0
- Change in data source: Datadista to esCovid19data.
- Adaptation of the Spain dataitems; they are now calculated from the Autonomous Communities.
- Added dataitem "Accumulated lethality".
- Added vaccines dataitems: "Dose of vaccine delivered", "Dose of vaccine supplied", "Percentage of doses of vaccine supplied" and "Percentage of population vaccinated".
- Implemented attributes of temporal granularity, regional granularity and update frequency. Now, each data source is only refreshed following its update frequency.
- Change from ES-regions to ES-communities.
- A new Region config file for countries is added.
1 parent 4b5ecd5 commit 9874af4

24 files changed: 6,365 additions and 9,523 deletions.

lib/README.md

Lines changed: 60 additions & 17 deletions
@@ -43,16 +43,28 @@ A Data Item is a low-grain resource which codifies a specific piece of informati
4343

4444

4545
### Data Type
46-
The COnVIDa library considers two types of Data Items used to interpret and analyze them, namely:
46+
The COnVIDa library distinguishes two types of Data Items, which determine how they are interpreted and analyzed:
4747

48-
* **Temporal**: The data items are indexed by days, so they will show the daily values. In particular, _COVID19, Mobility, MoMo_ and _AEMET_ data items are temporal. For instance, if we select the COVID19 cases in Murcia from 21/02/2020 until 14/05/2020, the X axis will show all the days between those two dates, while Y axis will show the daily COVID19 cases in Murcia.
48+
* **Temporal**: The data items are indexed by time units (currently, only days are supported), so values are shown at that temporal frequency. In particular, _COVID19, Mobility, MoMo_ and _AEMET_ data items are temporal. For instance, if we select the COVID19 cases in Murcia from 21/02/2020 until 14/05/2020, the X axis will show all the periods between those two dates, while the Y axis will show the COVID19 cases in Murcia for each period.
49+
50+
* **Geographical**: The data items are indexed by region units. In particular, current _INE_ data items are geographical. It is worth mentioning that the user of this library could transform temporal data items to a geographical perspective by applying any kind of aggregation scheme. For instance, in the COnVIDa service, if we choose the analysis type by regions and select some temporal data items, then the service will apply descriptive statistical functions to those data items within the specified date ranges.
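
The temporal-to-geographical transformation described above can be illustrated with a minimal sketch. It uses only the Python standard library and does not call COnVIDa-lib; the function name `to_geographical`, the sample values and the chosen statistics are assumptions made for illustration, not the aggregation that the COnVIDa service actually applies.

```python
from datetime import date, timedelta
from statistics import mean, median

# Hypothetical daily series (TEMPORAL view): one value per region per day.
# The numbers are placeholders, not real COVID19 figures.
first_day = date(2020, 3, 1)
days = [first_day + timedelta(days=i) for i in range(4)]
daily_cases = {
    "Murcia": [4, 6, 8, 10],
    "Madrid": [40, 60, 50, 70],
}


def to_geographical(series_by_region, days, start_date, end_date):
    """Collapse each region's daily series within [start_date, end_date]
    into one row of descriptive statistics (GEOGRAPHICAL view)."""
    summary = {}
    for region, values in series_by_region.items():
        # Keep only the values whose day falls inside the requested range.
        selected = [v for d, v in zip(days, values) if start_date <= d <= end_date]
        summary[region] = {
            "mean": mean(selected),
            "median": median(selected),
            "min": min(selected),
            "max": max(selected),
        }
    return summary


summary = to_geographical(daily_cases, days, date(2020, 3, 1), date(2020, 3, 3))
print(summary["Murcia"])
```

The result is indexed by region rather than by day, which is what makes a temporal data item comparable with natively geographical ones.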
51+
52+
### Temporal Granularity
53+
The current release of COnVIDa library considers the following temporal units:
54+
55+
* **DAILY**: For temporal data sources, the data items should be presented by days. For creating new data sources to be directly integrated in the platform, developers should guarantee that granularity in the time series.
56+
57+
_More granularities can be supported in the future_
4958

5059

51-
* **Geographical**: The data items are indexed by regions and the data is aggregated with absolute values. In particular, current _INE_ data items are geographical. It is worth mentioning that the user of this library could transform temporal data items to a geographical perspective by applying any kind of aggregation scheme. For instance, in COnVIDa service, if we choose the analysis type by regions and select some temporal data items, then COnVIDa service will use the mean of those data items within the specified data ranges.
60+
### Regional Granularity
61+
The current release of COnVIDa library supports the following regional units:
5262

63+
* **COMMUNITY**: The data items can be presented per Spanish communities.
5364

54-
### Regions
55-
Regions are divisions of the territory that allow a more exhaustive and deeper collection and analysis. Currently, they are implemented as the Autonomous Regions in Spain, although the granularity (provinces, minicipalities, etc.) can be easily adapted. In this sense, _COnVIDa_ lib allows filtering the aforementioned data items by regions.
65+
* **PROVINCE**: The data items can be presented per Spanish provinces.
66+
67+
_More granularities can be supported in the future_
5668

5769

5870
## User guidelines
@@ -62,15 +74,36 @@ The [test lib notebook](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/
6274
#### [`Regions class`](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/regions.py)
6375
Implements the required information for Regions management
6476

65-
##### `get_country_codes()`
66-
Returns a list with the supported country codes. Right now, only 'ES' for Spanish regiones is available, although this is easily extensible to other countries.
77+
##### `get_regions(country_code='ES')`
78+
Returns a list with the names of the regions associated with a country code.
6779

80+
Parameters
81+
- country_code: str
82+
country code of the regions to retrieve.
83+
84+
##### `get_regions_by_type(type='c', country_code='ES')`
85+
Returns a list with the names of the regions of a specific type associated with a country code.
86+
87+
Parameters
88+
- type: str
89+
For the country selected, the regional granularity to get. For Spain: 'c' Community, 'p' Province.
90+
- country_code: str
91+
country of the regions
6892

69-
##### `get_regions(country_code='ES')`
70-
Returns a list with the names of the Spanish Autonomous Regions.
93+
94+
##### `get_regions_population(country_code='ES')`
95+
96+
Returns the number of citizens per region in a specific country
7197

7298
Parameters
73-
- country_code: string indicating the country of the regions. Right now, only 'ES' for Spanish regiones is available.
99+
- country_code: str
100+
Country code of the regions.
101+
102+
##### `get_country_codes()`
103+
Returns a dictionary with the supported countries as keys, and their codes as values.
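
To make the documented return shapes concrete, here is an illustrative stand-in for the `Regions` interface. It is not the library's implementation: the in-memory data, the region names and the population values are placeholders, and the assumption that `get_regions_population` returns a mapping from region name to population is ours, since the text above only says it returns the number of citizens per region.

```python
class Regions:
    """Illustrative stand-in mirroring the documented method names.
    This is NOT the COnVIDa-lib code; the data below is hypothetical."""

    # Placeholder data; the real library loads this from its config files.
    _COUNTRY_CODES = {"Spain": "ES"}
    _REGIONS = {
        "ES": {
            "c": {"Región de Murcia": 1000, "Comunidad de Madrid": 2000},
            "p": {"Murcia": 1000, "Madrid": 2000},
        }
    }

    @classmethod
    def get_country_codes(cls):
        # Supported countries as keys, their codes as values.
        return dict(cls._COUNTRY_CODES)

    @classmethod
    def get_regions(cls, country_code="ES"):
        # Names of all regions of the country, whatever their type.
        by_type = cls._REGIONS[country_code]
        return [name for regions in by_type.values() for name in regions]

    @classmethod
    def get_regions_by_type(cls, type="c", country_code="ES"):
        # 'c' selects communities and 'p' selects provinces (for Spain).
        return list(cls._REGIONS[country_code][type])

    @classmethod
    def get_regions_population(cls, country_code="ES"):
        # Assumption: a mapping from region name to number of citizens.
        populations = {}
        for regions in cls._REGIONS[country_code].values():
            populations.update(regions)
        return populations


print(Regions.get_country_codes())
print(Regions.get_regions_by_type(type="p"))
```

With the real library, the same calls would be made on the class from `regions.py`; the exact names and values returned depend on its configuration files.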
104+
105+
106+
74107

75108
***
76109

@@ -80,6 +113,10 @@ Provides an interface for the library user to avoid the use of low-level functio
80113
##### `get_data_types()`
81114
Returns the implemented DataTypes in string format.
82115

116+
##### `get_sources_info()`
117+
Prints and returns a dictionary with the metadata about the supported data sources
118+
119+
83120
##### `get_data_items_names(data_type=None, language='ES')`
84121
Returns a dictionary with data sources as keys, and an array of associated data item names as values.
85122

@@ -112,7 +149,7 @@ Provides an interface for the library user to avoid the use of low-level functio
112149

113150
Parameters
114151
- data_items: list of data item names. By default, 'all' are collected.
115-
- regions: list of region names. By default, 'ES' refers to all Spanish Autonomous Regions.
152+
- regions: list of region names. By default, 'ES' refers to all Spanish regions.
116153
- start_date: first day in pandas datetime to be considered in TEMPORAL data items. By default, None is established.
117154
- end_date: last day in pandas datetime to be considered in TEMPORAL data items. By default, None is established.
118155
- language: language of the returned data.
@@ -134,9 +171,12 @@ _COnVIDa-lib_ constitutes an object-oriented package ready to be extended. Consi
134171

135172
1. First of all, some elements should be defined regarding your new Data Source:
136173
* Name of the Data Source
137-
* Data Format of the resource (`JSON` or `CSV`)
138174
* Data Type of the Data Source (`TEMPORAL` or `GEOGRAPHICAL`)
175+
* Temporal Granularity of the Data Source (`DAILY`)
176+
* Regional Granularity of the Data Source (`COMMUNITIES` and/or `PROVINCES`)
139177
* Representation of the regions within the Data Source (_iso\_3166\_2_, _ine code_, ...)
178+
* Data Format of the resource (`JSON` or `CSV`)
179+
* Update Frequency of the data series (in days)
140180
* Information of each Data Item of the Data Source
141181
* Name (literally used by the Data Source)
142182
* Display Name (used to change the third-party nomenclature to a desired custom one)
@@ -145,9 +185,9 @@ _COnVIDa-lib_ constitutes an object-oriented package ready to be extended. Consi
145185

146186
2. Configure the aforementioned principal elements of your new Data Source:
147187

148-
* The name, data format, data type and region representation should be included in the [datasources configuration file](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/config/data-sources-config.json). With this aim, append a new entry in the JSON object with the data source name as a key, and a dictionary with the corresponding information regarding `DATA FORMAT`, `DATA TYPE` and `REGION REPRESENTATION` as values. If needed, specific config elements of your Data Source can be also included here (_for example, [AEMET data source](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/AEMETDataSource.py) defines its `API KEY` necessary for it to work_).
188+
* The name, data type, temporal and regional granularities, region representation, data format, and update frequency should be included in the [data sources configuration file](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/config/data-sources-config.json). With this aim, append a new entry in the JSON object with the data source name as a key, and a dictionary with the corresponding information regarding `DATA TYPE`, `TEMPORAL GRANULARITY`, `REGIONAL GRANULARITY`, `REGION REPRESENTATION`, `DATA FORMAT`, and `UPDATE FREQUENCY` as values. If needed, specific config elements of your Data Source can be also included here (_for example, [AEMET data source](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/AEMETDataSource.py) defines its `API KEY` necessary for it to work_).
149189

150-
* For each Spanish region, the representation used by your Data Source should be appended accordingly in the [regions configuration file](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/config/ES-regions.json) (in case it does not exist yet). Note that the key of the new entries to be added for each region should match with the aforementioned `REGION REPRESENTATION` attribute (defined in [datasources configuration file](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/config/data-sources-config.json)).
190+
* For each region, the representation used by your Data Source should be appended accordingly in the [regions configuration file](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/config/ES-regions.json) (in case it does not exist yet). Note that the key of the new entries to be added for each region should match with the aforementioned `REGION REPRESENTATION` attribute (defined in [data sources configuration file](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/config/data-sources-config.json)).
151191

152192
* The information of the Data Items offered by your Data Source should be included in a new configuration file `YourDataSourceName-config.json` in the [specific data source configuration folder](https://github.com/CyberDataLab/COnVIDa-lib/tree/master/lib/datasources/config/data_sources). As in the other configuration files residing in that folder (which may guide you in this procedure), each Data Item should constitute an entry. In particular, each entry is defined by the Data Item name (literally used by the Data Source) as the key and the properties `display_name`, `description` and `data_unit` as the values. The latter should include, in turn, translation in both Spanish and English (or any other language you may define). If needed, specific properties of your Data Items can be also included here (for example, the [Mobility data source](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/config/data_sources/MobilityDataSource-config.json) includes the `data_source` attribute to distinguish the resource where each Data Item comes from).
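
As a rough sketch of the configuration step above, the snippet below builds a new data source entry, checks that it carries the six attributes named in the text, and shows how an update frequency expressed in days could gate a refresh. The data source name `MyDataSource`, the exact key spellings and values, and the helpers `missing_keys` and `is_refresh_due` are assumptions for illustration; check the existing entries in the data sources configuration file for the real conventions.

```python
import json
from datetime import date

# Attribute names taken from the text above; the exact spelling used in the
# real configuration file is an assumption.
REQUIRED_KEYS = {
    "DATA TYPE",
    "TEMPORAL GRANULARITY",
    "REGIONAL GRANULARITY",
    "REGION REPRESENTATION",
    "DATA FORMAT",
    "UPDATE FREQUENCY",
}

# Hypothetical entry for a new data source.
new_entry = {
    "MyDataSource": {
        "DATA TYPE": "TEMPORAL",
        "TEMPORAL GRANULARITY": "DAILY",
        "REGIONAL GRANULARITY": "COMMUNITY",
        "REGION REPRESENTATION": "iso_3166_2",
        "DATA FORMAT": "CSV",
        "UPDATE FREQUENCY": 1,  # in days
    }
}


def missing_keys(source_config):
    """Return the required attributes absent from one data source entry."""
    return sorted(REQUIRED_KEYS - set(source_config))


def is_refresh_due(last_update, today, update_frequency_days):
    """True when at least `update_frequency_days` days have elapsed."""
    return (today - last_update).days >= update_frequency_days


config = new_entry["MyDataSource"]
print(json.dumps(new_entry, indent=2))
print(missing_keys(config))
print(is_refresh_due(date(2021, 1, 1), date(2021, 1, 2), config["UPDATE FREQUENCY"]))
```

Validating the entry before appending it to the JSON file catches a forgotten attribute early, rather than when the data source class first loads its configuration.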
153193

@@ -167,16 +207,19 @@ _COnVIDa-lib_ constitutes an object-oriented package ready to be extended. Consi
167207

168208
* Declare to `None` the following class attributes:
169209
```python
170-
DATA_FORMAT = None
171210
DATA_TYPE = None
211+
TEMPORAL_GRANULARITY = None
212+
REGIONAL_GRANULARITY = None
172213
REGION_REPRESENTATION = None
214+
DATA_FORMAT = None
215+
UPDATE_FREQUENCY = None
173216
DATA_ITEMS = None
174217
DATA_ITEMS_INFO = None
175218
```
176219
In the first execution of the class, these class attributes will load the values from the config files.
177220

178221

179-
* Define and fulfill the following functions:
222+
* Define and implement the following functions. Specifically, the function that processes partial data should apply the necessary transformations to return data compliant with the standard temporal and regional granularity:
180223

181224
```python
182225
def __init__(self, data_items=None, regions=None, start_date=None, end_date=None):
@@ -185,7 +228,7 @@ _COnVIDa-lib_ constitutes an object-oriented package ready to be extended. Consi
185228
186229
Parameters
187230
- data_items: list of data item names. By default, 'all' are collected.
188-
- regions: list of region names. By default, 'ES' refers to all Spanish provinces.
231+
- regions: list of region names. By default, 'ES' refers to Spanish regions.
189232
- start_date: first day in pandas datetime to be considered in TEMPORAL data items. By default, None is established. If the Data Source is a GEOGRAPHICAL data type, then it can be omitted.
190233
- end_date: last day in pandas datetime to be considered in TEMPORAL data items. By default, None is established. If the Data Source is a GEOGRAPHICAL data type, then it can be omitted.
191234
'''
