The [covid19-eu-zh/covid19-eu-data](https://github.com/covid19-eu-zh/covid19-eu-data) repository is an experiment in data scraping and aggregation using GitHub Actions.

## Structure

The structure of the project is as follows.

```
.
├── README.md
├── dataset     # where the data files live
├── documents   # where the raw data and files live
├── now.json    # ZEIT Now setup for a FaaS service
└── scripts     # scripts to download and aggregate data
```

### Scripts

Each country has its own Python script, so that each country can run on its own schedule. The scripts use shared classes from `utils.py` so that they all have a similar structure.

```
scripts
├── download_at.py
├── download_de.py
├── download_es.py
├── download_fr.py
├── download_nl.py
├── download_uk.py
├── requirements.txt
└── utils.py
```

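As a rough illustration of how a shared base class can give the per-country scripts the same shape, here is a minimal sketch. The class and method names (`BaseScraper`, `GermanyScraper`, `parse`, `run`) are hypothetical and not taken from the repository's `utils.py`, and the parser is a stand-in that works on an in-memory string instead of a real download.

```python
import csv
import io
from abc import ABC, abstractmethod


class BaseScraper(ABC):
    """Hypothetical base class; the real utils.py may look different."""

    country = ""  # two-letter country code, set by each subclass

    @abstractmethod
    def parse(self, raw):
        """Turn raw source text into a list of row dicts."""

    def run(self, raw):
        """Parse the raw text and tag every row with the country code."""
        rows = self.parse(raw)
        for row in rows:
            row["country"] = self.country
        return rows


class GermanyScraper(BaseScraper):
    country = "DE"

    def parse(self, raw):
        # Stand-in parser: assumes "state,cases" lines with a header row.
        reader = csv.DictReader(io.StringIO(raw))
        return [{"state": r["state"], "cases": int(r["cases"])} for r in reader]


# Made-up sample input, only for demonstration.
sample = "state,cases\nBayern,10\nBerlin,5\n"
rows = GermanyScraper().run(sample)
```

With this layout, each `download_xx.py` would only need to supply its own `parse` method, while the shared `run` logic stays in one place.
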
### Dataset

The dataset folder contains the full dataset for each country and the daily updates for each country.

```
dataset
├── covid-19-at.csv
├── covid-19-de.csv
├── covid-19-nl.csv
├── covid-19-uk.csv
└── daily
    ├── at
    ├── de
    ├── nl
    └── uk
```

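One way the daily files could be combined into a full per-country file is sketched below. This assumes that the full dataset is simply the concatenation of the daily CSV files and that all daily files share the same header; the function name `aggregate_daily` and the sample file names are made up for illustration and may not match what the repository's scripts do.

```python
import csv
import tempfile
from pathlib import Path


def aggregate_daily(daily_dir, output_file):
    """Concatenate all CSV files in daily_dir into output_file.

    Assumes every daily file shares the same header row.
    Returns the number of data rows written.
    """
    rows = []
    fieldnames = None
    for path in sorted(Path(daily_dir).glob("*.csv")):
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            if fieldnames is None:
                fieldnames = reader.fieldnames
            rows.extend(reader)
    if fieldnames is None:
        return 0  # no daily files found, nothing to write
    with Path(output_file).open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Small demonstration in a temporary directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    daily = Path(tmp) / "daily" / "de"
    daily.mkdir(parents=True)
    (daily / "2020-03-01.csv").write_text("state,cases\nBayern,10\n")
    (daily / "2020-03-02.csv").write_text("state,cases\nBayern,12\n")
    full = Path(tmp) / "covid-19-de.csv"
    count = aggregate_daily(daily, full)
    merged = full.read_text()
```

Sorting the file paths keeps the rows in a stable order, which helps keep the committed full file from changing when no new daily data arrives.
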
## GitHub Actions

We manage the pipelines using GitHub Actions. The full set of workflows can be found in [the original repository](https://github.com/covid19-eu-zh/covid19-eu-data/actions).

We use Germany as an example. The workflow for Germany has two triggers: a push to the master branch and a schedule. The job steps are:

1. Check out the repository;
2. Set up Python and install the Python requirements;
3. Run the Python script to download and aggregate data;
4. Push the data to the repository.

{% highlight yaml %}
name: CI Download DE SARS-COV-2 Cases from RKI

on:
  push:
    branches:
      - master
  schedule:
    - cron: '0 7/1 * * *'

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - name: Checkout current repo
      uses: actions/checkout@v2
    - name: Get current directory and files
      run: |
        pwd
        ls
    - uses: actions/setup-python@v1
      with:
        python-version: '3.7' # Version range or exact version of a Python version to use, using SemVer's version range syntax
        architecture: 'x64' # optional x64 or x86. Defaults to x64 if not specified
about.md

@@ -23,6 +23,7 @@ DataHerb do **not** take your data. The datasets are fully managed by the owners
DataHerb is an initiative for transparent data management in open data. To achieve transparency, we use a metadata-driven design. Every step is transparent and can be investigated.

- Contribute datasets: list your datasets on DataHerb in just two steps. Datasets that can be used to enhance machine learning datasets are preferred. [Tutorial]({{ site.baseurl }}/add)
- Write a short piece telling the story behind your dataset and submit it to [DataHerb Articles]({{ site.baseurl }}/articles).
- Use DataHerb in your projects.
- Spread the word.
- Help us build a better DataHerb. [GitHub Organization](https://github.com/dataherb); [Leave a comment](#comments)