Commit 4b5ecd5

Updated lib
1 parent dd0a553 commit 4b5ecd5

24 files changed, with 9225 additions and 6067 deletions

lib/README.md

Lines changed: 17 additions & 60 deletions
@@ -43,28 +43,16 @@ A Data Item is a low-grain resource which codifies a specific piece of informati
4343

4444

4545
### Data Type
46-
COnVIDa library considers two types of Data Items used to interpret and analyze them, namely:
46+
The COnVIDa library distinguishes two types of Data Items, which determine how they are interpreted and analyzed:
4747

48-
* **Temporal**: The data items are indexed by time units (up to date, only days supported), so they will show in that temporal frequency. In particular, _COVID19, Mobility, MoMo_ and _AEMET_ data items are temporal. For instance, if we select the COVID19 cases in Murcia from 21/02/2020 until 14/05/2020, the X axis will show all the periods between those two dates, while Y axis will show the COVID19 cases in Murcia.
49-
50-
* **Geographical**: The data items are indexed by region units. In particular, current _INE_ data items are geographical. It is worth mentioning that the user of this library could transform temporal data items to a geographical perspective by applying any kind of aggregation scheme. For instance, in the COnVIDa service, if we choose the analysis type by regions and select some temporal data items, then the COnVIDa service will apply descriptive statistical functions to those data items within the specified date ranges.
51-
52-
### Temporal Granularity
53-
The current release of COnVIDa library considers the following temporal units:
54-
55-
* **DAILY**: For temporal data sources, the data items should be presented by days. For creating new data sources to be directly integrated in the platform, developers should guarantee that granularity in the time series.
56-
57-
_More granularities can be supported in the future_
48+
* **Temporal**: The data items are indexed by days, so they show daily values. In particular, _COVID19, Mobility, MoMo_ and _AEMET_ data items are temporal. For instance, if we select the COVID19 cases in Murcia from 21/02/2020 until 14/05/2020, the X axis will show all the days between those two dates, while the Y axis will show the daily COVID19 cases in Murcia.
5849

5950

60-
### Regional Granularity
61-
The current release of COnVIDa library supports the following regional units:
51+
* **Geographical**: The data items are indexed by regions and the data is aggregated as absolute values. In particular, current _INE_ data items are geographical. It is worth mentioning that the user of this library could transform temporal data items to a geographical perspective by applying any kind of aggregation scheme. For instance, in the COnVIDa service, if we choose the analysis type by regions and select some temporal data items, then the COnVIDa service will use the mean of those data items within the specified date ranges.
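
The temporal-to-geographical transformation described above can be sketched with pandas. This is only an illustration under stated assumptions: the frame layout (one column per region, one row per day), the function name `to_geographical` and the numbers are all hypothetical, and the library's own output format may differ.

```python
import pandas as pd

# Hypothetical daily values for two regions (made-up numbers, not real data).
days = pd.date_range("2020-02-21", periods=4, freq="D")
temporal = pd.DataFrame(
    {"Murcia": [1.0, 3.0, 5.0, 7.0], "Madrid": [10.0, 20.0, 30.0, 40.0]},
    index=days,
)


def to_geographical(df, start_date, end_date):
    """Collapse a day-indexed frame into one mean value per region."""
    # Label-based slicing on a DatetimeIndex includes both end points.
    window = df.loc[start_date:end_date]
    return window.mean(axis=0)


# Mean of the first three days for each region.
per_region = to_geographical(temporal, "2020-02-21", "2020-02-23")
print(per_region)
```

Any other aggregation (sum, median, maximum) could replace the mean here; the mean is used only because it is the scheme the text attributes to the COnVIDa service.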
6252

63-
* **COMMUNITY**: The data items can be presented per Spanish communities.
6453

65-
* **PROVINCE**: The data items can be presented per Spanish provinces.
66-
67-
_More granularities can be supported in the future_
54+
### Regions
55+
Regions are divisions of the territory that allow more exhaustive and deeper data collection and analysis. Currently, they are implemented as the Autonomous Regions in Spain, although the granularity (provinces, municipalities, etc.) can be easily adapted. In this sense, _COnVIDa-lib_ allows filtering the aforementioned data items by regions.
6856

6957

7058
## User guidelines
@@ -74,36 +62,15 @@ The [test lib notebook](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/
7462
#### [`Regions class`](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/regions.py)
7563
Implements the required information for Regions management
7664

77-
##### `get_regions(country_code='ES')`
78-
Returns a list with the names of the regions associated with a country code.
79-
80-
Parameters
81-
- country_code: str
82-
country code of the regions to retrieve.
83-
84-
##### `get_regions_by_type(cls, type='c', country_code='ES')`
85-
Returns a list with the names of the regions of a specific type associated with a country code.
86-
87-
Parameters
88-
- type: str
89-
For the country selected, the regional granularity to get. For Spain: 'c' Community, 'p' Province.
90-
- country_code: str
91-
country of the regions
92-
93-
94-
##### `get_regions_population(cls, country_code='ES'):`
95-
96-
Returns the number of citizens per region in a specific country
97-
98-
Parameters
99-
- country_code: str
100-
Country code of the regions.
101-
10265
##### `get_country_codes()`
103-
Returns a dictionary with the supported countries as keys, and their codes as values.
66+
Returns a list with the supported country codes. Right now, only 'ES' for Spanish regions is available, although this is easily extensible to other countries.
10467

10568

69+
##### `get_regions(country_code='ES')`
70+
Returns a list with the names of the Spanish Autonomous Regions.
10671

72+
Parameters
73+
- country_code: string indicating the country of the regions. Right now, only 'ES' for Spanish regions is available.
10774

10875
***
10976

@@ -113,10 +80,6 @@ Provides an interface for the library user to avoid the use of low-level functio
11380
##### `get_data_types()`
11481
Returns the implemented DataTypes in string format.
11582

116-
##### `get_sources_info()`
117-
Prints and returns a dictionary with the metadata about the supported data sources
118-
119-
12083
##### `get_data_items_names(data_type=None, language='ES')`
12184
Returns a dictionary with data sources as keys, and an array of associated data item names as values.
12285

@@ -149,7 +112,7 @@ Provides an interface for the library user to avoid the use of low-level functio
149112

150113
Parameters
151114
- data_items: list of data item names. By default, 'all' are collected.
152-
- regions: list of region names. By default, 'ES' refers to all Spanish regions.
115+
- regions: list of region names. By default, 'ES' refers to all Spanish Autonomous Regions.
153116
- start_date: first day in pandas datetime to be considered in TEMPORAL data items. By default, None is established.
154117
- end_date: last day in pandas datetime to be considered in TEMPORAL data items. By default, None is established.
155118
- language: language of the returned data.
@@ -171,12 +134,9 @@ _COnVIDa-lib_ constitutes an object-oriented package ready to be extended. Consi
171134

172135
1. First of all, some elements should be defined regarding your new Data Source:
173136
* Name of the Data Source
137+
* Data Format of the resource (`JSON` or `CSV`)
174138
* Data Type of the Data Source (`TEMPORAL` or `GEOGRAPHICAL`)
175-
* Temporal Granularity the Data Source (`DAILY`)
176-
* Regional Granularity the Data Source (`COMMUNITIES or/and PROVINCES`)
177139
* Representation of the regions within the Data Source (_iso\_3166\_2_, _ine code_, ...)
178-
* Data Format of the resource (`JSON` or `CSV`)
179-
* Update Frequency of the data series (in days)
180140
* Information of each Data Item of the Data Source
181141
* Name (literally used by the Data Source)
182142
* Display Name (used to change the third-party nomenclature to a desired custom one)
@@ -185,9 +145,9 @@ _COnVIDa-lib_ constitutes an object-oriented package ready to be extended. Consi
185145

186146
2. Configure the aforementioned principal elements of your new Data Source:
187147

188-
* The name, data type, temporal and regional granularities, region representation, data format, and update frequency should be included in the [data sources configuration file](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/config/data-sources-config.json). With this aim, append a new entry in the JSON object with the data source name as a key, and a dictionary with the corresponding information regarding `DATA TYPE`, `TEMPORAL GRANULARITY`, `REGIONAL GRANULARITY`, `REGION REPRESENTATION`, `DATA FORMAT`, and `UPDATE FREQUENCY` as values. If needed, specific config elements of your Data Source can be also included here (_for example, [AEMET data source](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/AEMETDataSource.py) defines its `API KEY` necessary for it to work_).
148+
* The name, data format, data type and region representation should be included in the [datasources configuration file](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/config/data-sources-config.json). With this aim, append a new entry in the JSON object with the data source name as a key, and a dictionary with the corresponding information regarding `DATA FORMAT`, `DATA TYPE` and `REGION REPRESENTATION` as values. If needed, specific config elements of your Data Source can be also included here (_for example, [AEMET data source](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/AEMETDataSource.py) defines its `API KEY` necessary for it to work_).
189149

190-
* For each region, the representation used by your Data Source should be appended accordingly in the [regions configuration file](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/config/ES-regions.json) (in case it does not exist yet). Note that the key of the new entries to be added for each region should match with the aforementioned `REGION REPRESENTATION` attribute (defined in [data sources configuration file](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/config/data-sources-config.json)).
150+
* For each Spanish region, the representation used by your Data Source should be appended accordingly in the [regions configuration file](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/config/ES-regions.json) (in case it does not exist yet). Note that the key of the new entries to be added for each region should match with the aforementioned `REGION REPRESENTATION` attribute (defined in [datasources configuration file](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/config/data-sources-config.json)).
191151

192152
* The information of the Data Items offered by your Data Source should be included in a new configuration file `YourDataSourceName-config.json` in the [specific data source configuration folder](https://github.com/CyberDataLab/COnVIDa-lib/tree/master/lib/datasources/config/data_sources). As in the other configuration files residing in that folder (which may guide you in this procedure), each Data Item should constitute an entry. In particular, each entry is defined by the Data Item name (literally used by the Data Source) as the key and the properties `display_name`, `description` and `data_unit` as the values. The latter should include, in turn, translation in both Spanish and English (or any other language you may define). If needed, specific properties of your Data Items can be also included here (for example, the [Mobility data source](https://github.com/CyberDataLab/COnVIDa-lib/blob/master/lib/datasources/config/data_sources/MobilityDataSource-config.json) includes the `data_source` attribute to distinguish the resource where each Data Item comes from).
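
As a rough sketch of the two configuration entries described above, the following builds them as Python dictionaries and prints the JSON that would be appended. The source name `MyDataSource`, the item name `daily_cases`, the `ES`/`EN` language keys and every value are hypothetical; only the key names `DATA FORMAT`, `DATA TYPE`, `REGION REPRESENTATION`, `display_name`, `description` and `data_unit` come from the text, and the exact nesting used by the real files should be checked against the existing configuration files.

```python
import json

# Hypothetical entry for the data sources configuration file.
data_source_entry = {
    "MyDataSource": {
        "DATA FORMAT": "JSON",
        "DATA TYPE": "TEMPORAL",
        "REGION REPRESENTATION": "iso_3166_2",
    }
}

# Hypothetical content of MyDataSource-config.json: one entry per Data Item,
# keyed by the name the Data Source itself uses. The language keys are assumed.
data_items_config = {
    "daily_cases": {
        "display_name": {"ES": "Casos diarios", "EN": "Daily cases"},
        "description": {"ES": "Casos nuevos por día", "EN": "New cases per day"},
        "data_unit": {"ES": "personas", "EN": "people"},
    }
}

REQUIRED = ("display_name", "description", "data_unit")


def missing_properties(items, required=REQUIRED):
    """Return {item name: [missing property names]} for incomplete entries."""
    problems = {}
    for name, properties in items.items():
        missing = [key for key in required if key not in properties]
        if missing:
            problems[name] = missing
    return problems


print(json.dumps(data_source_entry, indent=4))
print(json.dumps(data_items_config, indent=4, ensure_ascii=False))
print(missing_properties(data_items_config))
```

The small `missing_properties` helper is not part of the library; it is just one way to catch an incomplete Data Item entry before the file is committed.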
193153

@@ -207,19 +167,16 @@ _COnVIDa-lib_ constitutes an object-oriented package ready to be extended. Consi
207167

208168
* Declare to `None` the following class attributes:
209169
```python
170+
DATA_FORMAT = None
210171
DATA_TYPE = None
211-
TEMPORAL_GRANULARITY = None
212-
REGIONAL_GRANULARITY = None
213172
REGION_REPRESENTATION = None
214-
DATA_FORMAT = None
215-
UPDATE_FREQUENCY = None
216173
DATA_ITEMS = None
217174
DATA_ITEMS_INFO = None
218175
```
219176
In the first execution of the class, these class attributes will load the values from the config files.
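
One possible way to get this "load on first execution" behaviour is sketched below. It is not the library's actual implementation: the class name `MyDataSource`, the `CONFIG_PATH` attribute and the temporary config file are hypothetical, and only three of the class attributes are shown to keep the example short.

```python
import json
import os
import tempfile


class MyDataSource:
    # Declared as None; filled from the config file on first instantiation.
    DATA_FORMAT = None
    DATA_TYPE = None
    REGION_REPRESENTATION = None

    # Hypothetical attribute: where the data sources config file lives.
    CONFIG_PATH = None

    def __init__(self):
        cls = type(self)
        if cls.DATA_TYPE is None:  # only true on the first execution
            with open(cls.CONFIG_PATH, encoding="utf-8") as handle:
                entry = json.load(handle)[cls.__name__]
            cls.DATA_FORMAT = entry["DATA FORMAT"]
            cls.DATA_TYPE = entry["DATA TYPE"]
            cls.REGION_REPRESENTATION = entry["REGION REPRESENTATION"]


# Write a throwaway config file so the sketch runs on its own.
config = {
    "MyDataSource": {
        "DATA FORMAT": "CSV",
        "DATA TYPE": "GEOGRAPHICAL",
        "REGION REPRESENTATION": "ine_code",
    }
}
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    json.dump(config, tmp)

MyDataSource.CONFIG_PATH = tmp.name
first = MyDataSource()   # reads the file and sets the class attributes
os.remove(tmp.name)
second = MyDataSource()  # attributes already set, so the file is not read again
```

Because the values are stored on the class rather than on the instance, every later instance shares them without reopening the config file.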
220177

221178

222-
* Define and fulfill the following functions. Specifically, the function which processes partial data should apply the necessary transformations to return data compliant with standard temporal and regional granularity:
179+
* Define and fulfill the following functions:
223180

224181
```python
225182
def __init__(self, data_items=None, regions=None, start_date=None, end_date=None):
@@ -228,7 +185,7 @@ _COnVIDa-lib_ constitutes an object-oriented package ready to be extended. Consi
228185
229186
Parameters
230187
- data_items: list of data item names. By default, 'all' are collected.
231-
- regions: list of region names. By default, 'ES' refers to Spanish regions.
188+
- regions: list of region names. By default, 'ES' refers to all Spanish provinces.
232189
- start_date: first day in pandas datetime to be considered in TEMPORAL data items. By default, None is established. If the Data Source is a GEOGRAPHICAL data type, then it can be suppressed.
233190
- end_date: last day in pandas datetime to be considered in TEMPORAL data items. By default, None is established. If the Data Source is a GEOGRAPHICAL data type, then it can be suppressed.
234191
'''
