Commit d041f26

committed
added articles
1 parent ecc0de1 commit d041f26

5 files changed

+114
-23
lines changed

_articles/covid-eu-cases.md

Lines changed: 110 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -10,13 +10,120 @@ comments: true
1010
types:
1111
- 'dataset'
1212
category:
13-
- epidemic
13+
- github
1414
tag:
15-
- 'Epidemic'
16-
summary: Automated data collection using GitHub Actions.
15+
- 'GitHub Actions'
16+
summary: covid19-eu-zh/covid19-eu-data is an automated COVID-19 confirmed cases data collection experiment using GitHub Actions.
1717
dataset:
1818
- id: covid19_eu_data
1919
references:
2020
- name: "Data Mining: Concepts and Techniques"
2121
link: https://www.amazon.com/Data-Mining-Concepts-Techniques-Management/dp/0123814790
2222
---
23+
24+
The [covid19-eu-zh/covid19-eu-data](https://github.com/covid19-eu-zh/covid19-eu-data) repository is an experiment in data scraping and aggregation using GitHub Actions.
25+
26+
## Structure
27+
28+
The structure of the project is as follows.
29+
30+
```
31+
.
32+
├── README.md
33+
├── dataset # where the data files live
34+
├── documents # where the raw data and files live
35+
├── now.json # ZEIT Now setup for a FaaS service
36+
└── scripts #scripts to download and aggregate data
37+
```
38+
39+
### Scripts
40+
41+
Each country has its own Python script, so that each country can be collected on its own schedule. The scripts use classes from `utils.py` so that they all share a similar structure.
42+
43+
```
44+
scripts
45+
├── download_at.py
46+
├── download_de.py
47+
├── download_es.py
48+
├── download_fr.py
49+
├── download_nl.py
50+
├── download_uk.py
51+
├── requirements.txt
52+
└── utils.py
53+
```
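
As a rough illustration of how a shared base class can keep per-country scripts similar, here is a minimal sketch. It is not taken from the repository: `CountryDownloader`, `DemoDownloader`, the column names, and the sample numbers are all hypothetical, and the real `utils.py` may be organized differently.

```python
import csv
import io
from abc import ABC, abstractmethod
from datetime import datetime, timezone


class CountryDownloader(ABC):
    """Hypothetical base class; the real utils.py may look different."""

    country = ""  # two-letter country code, e.g. "de"

    @abstractmethod
    def fetch(self):
        """Return a list of (region, cases) tuples from the national source."""

    def to_csv(self, when=None):
        """Serialize the fetched records using one shared column layout."""
        when = when or datetime.now(timezone.utc)
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        # Assumed column names, chosen only for this sketch.
        writer.writerow(["country", "region", "cases", "datetime"])
        for region, cases in self.fetch():
            writer.writerow([self.country, region, cases, when.isoformat()])
        return buffer.getvalue()


class DemoDownloader(CountryDownloader):
    """Stand-in subclass with made-up numbers instead of a real HTTP request."""

    country = "xx"

    def fetch(self):
        return [("Region A", 10), ("Region B", 25)]
```

With this layout, a per-country script only needs to implement `fetch`; the output format stays the same across countries.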
54+
55+
### Dataset
56+
57+
The dataset folder contains the full dataset and the daily updates for each country.
58+
59+
```
60+
dataset
61+
├── covid-19-at.csv
62+
├── covid-19-de.csv
63+
├── covid-19-nl.csv
64+
├── covid-19-uk.csv
65+
└── daily
66+
├── at
67+
├── de
68+
├── nl
69+
└── uk
70+
```
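
The relationship between the `daily` folders and the per-country files can be sketched as a simple concatenation. This is an assumption about how aggregation might work, not the repository's actual code: `aggregate_daily`, the date-based file names, and the two-column layout are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def aggregate_daily(daily_dir, output_file):
    """Concatenate every daily CSV in daily_dir into output_file.

    Returns the number of data rows written (header excluded).
    """
    rows = []
    header = None
    for path in sorted(Path(daily_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            header = header or file_header  # keep the first header seen
            rows.extend(reader)
    with Path(output_file).open("w", newline="") as handle:
        writer = csv.writer(handle)
        if header:
            writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


# Demo with made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    daily = Path(tmp) / "daily" / "xx"
    daily.mkdir(parents=True)
    (daily / "2020-03-19.csv").write_text("country,cases\nxx,10\n")
    (daily / "2020-03-20.csv").write_text("country,cases\nxx,25\n")
    target = Path(tmp) / "covid-19-xx.csv"
    count = aggregate_daily(daily, target)
    merged = target.read_text().splitlines()
```

Sorting the file names keeps the merged rows in chronological order as long as the daily files are named by date.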
71+
72+
## GitHub Actions
73+
74+
We manage the pipelines using GitHub Actions. The full set of workflows can be found in [the original repository](https://github.com/covid19-eu-zh/covid19-eu-data/actions).
75+
76+
We use Germany as an example. The workflow for Germany has two triggers: a push to the master branch and a schedule. The job steps are:
77+
78+
1. Checkout the repository;
79+
2. Set up Python and install the Python requirements;
80+
3. Run the Python script to download and aggregate data;
81+
4. Push data to repository.
82+
83+
{% highlight yaml %}
84+
name: CI Download DE SARS-COV-2 Cases from RKI
85+
86+
on:
87+
push:
88+
branches:
89+
- master
90+
schedule:
91+
- cron: '0 7/1 * * *'
92+
93+
jobs:
94+
build:
95+
96+
runs-on: ubuntu-latest
97+
98+
steps:
99+
- name: Checkout current repo
100+
uses: actions/checkout@v2
101+
- name: Get current directory and files
102+
run: |
103+
pwd
104+
ls
105+
- uses: actions/setup-python@v1
106+
with:
107+
python-version: '3.7' # Version range or exact version of a Python version to use, using SemVer's version range syntax
108+
architecture: 'x64' # optional x64 or x86. Defaults to x64 if not specified
109+
- name: Install Python Requirements
110+
run: |
111+
python --version
112+
pip install -r scripts/requirements.txt
113+
- name: Download Records
114+
run: |
115+
python scripts/download_de.py
116+
ls dataset/daily/de
117+
git config --local user.email "[email protected]"
118+
git config --local user.name "GitHub Action"
119+
git pull
120+
git status
121+
git add .
122+
git commit -m "Update DE Dataset" || echo "Nothing to update"
123+
git status
124+
- name: Push changes
125+
uses: ad-m/github-push-action@master
126+
with:
127+
repository: covid19-eu-zh/covid19-eu-data
128+
github_token: {% raw %}${{ secrets.GITHUB_TOKEN }}{% endraw %}
129+
{% endhighlight %}

_layouts/articles.html

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ <h1 class="post-title has-text-centered is-size-1" itemprop="name headline">{{ p
3636
<div class="is-divider" data-content="AUTHORS"></div>
3737
{% for author in page.authors %}
3838
{% assign author_db = site.data.authors[author.id] %}
39-
<div class="box">
39+
<div class="box is-size-7">
4040
<article class="media">
4141
<div class="media-content">
4242
<div class="content">

about.md

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ DataHerb do **not** take your data. The datasets are fully managed by the owners
2323
DataHerb is an initiative for transparent data management in open data. To achieve transparency, we use a metadata-driven design. Every step is transparent and can be investigated.
2424

2525
- Contribute datasets: list your datasets on DataHerb in just two steps. Datasets that can be used to enhance machine learning datasets are preferred. [Tutorial]({{ site.baseurl }}/add)
26+
- Write a short article about the story behind your dataset and submit it to [DataHerb Articles]({{ site.baseurl }}/articles).
2627
- Use DataHerb in your projects.
2728
- Spread the word.
2829
- Help us build a better DataHerb. [GitHub Organization](https://github.com/dataherb); [Leave a comment](#comments)

community/dataherb-python.md

Lines changed: 0 additions & 19 deletions
This file was deleted.

community/index.md

Lines changed: 2 additions & 0 deletions
@@ -6,3 +6,5 @@ comments: true
66
---
77

88
DataHerb is also a community for data sharing.
9+
10+
Join our Telegram channel: [DataHerb Telegram Channel](https://t.me/dataherb).
