
Commit 83d2632

update contribute and about
1 parent 29055bb commit 83d2632

15 files changed

+94
-180
lines changed

add/create-repo.md

+60-58
@@ -6,15 +6,47 @@ exclude: true
66
comments: true
77
---
88

9+
> * Overview: [List Your Dataset on DataHerb]({{site.base_url}}/add)
10+
> * Next step: [Add Your GitHub Repository Name to DataHerb]({{site.base_url}}/add/link-repo-with-dataherb)
911
10-
### Create your GitHub repository to host your data
12+
## Create your GitHub repository to host your data
1113

1214
> If you have general questions about GitHub, please [leave a comment](#comments) so we can improve this tutorial.
1315
16+
> If you prefer to learn from examples, simply copy everything from this repo [InterImm/dataset-planets-in-solar-system](https://github.com/InterImm/dataset-planets-in-solar-system) and adapt it.
17+
18+
### If you prefer to use the web interface of GitHub
19+
1420
1. Go to [github.com and click on the + on the top right](https://github.com/new)
1521
2. Create a repository for your data. [GitHub Help](https://help.github.com/en/github/getting-started-with-github/create-a-repo)
16-
3. Create a `.dataherb` folder in the root of your repository.
17-
4. Place your data files in a folder such as `dataset`.
22+
23+
<figure>
24+
<div>
25+
<img src="{{site.base_url}}/assets/videos/dataherb-demo-ufo-create-new-repo.gif" type="video/gif" />
26+
</div>
27+
</figure>
28+
29+
3. Create a folder to hold your data file. In this example, we will create a folder called `dataset`: click the `Create new file` button and type in `dataset/.githold`. This creates a folder called `dataset` and places a file called `.githold` inside it.
30+
<figure>
31+
<div>
32+
<video style="display:block; width:100%; height:auto;" autoplay controls loop="loop">
33+
<source src="{{site.base_url}}/assets/videos/dataherb-demo-ufo-upload-datafile-1.mp4" type="video/mp4" />
34+
</video>
35+
</div>
36+
</figure>
37+
38+
Upload your data file into this folder by clicking the `Upload files` button.
39+
40+
4. Create a `.dataherb` folder in the root of your repository. The folder structure should now be:
41+
42+
```
43+
.
44+
├── README.md
45+
├── .dataherb
46+
└── dataset
47+
   └── your_data_file
48+
```
49+
1850
5. Create a file `metadata.yml` in the `.dataherb` folder with the following content:
1951

2052
```
@@ -24,6 +56,17 @@ comments: true
2456
- name: [Name of the first contributor]
2557
data:
2658
- name: [name of your data file, optional]
59+
description: [description of your data file, optional]
60+
path: [path_to_your_data_file.csv]
61+
format: csv
62+
size: [size of your data file]
63+
fields:
64+
- name: [name of the first column]
65+
description: [description of the first column]
66+
- name: [name of the second column]
67+
description: [description of the second column]
68+
- name: [name of your second data file, optional]
69+
description: [description of your second data file, optional]
2770
path: [path_to_your_data_file.csv]
2871
format: csv
2972
size: [size of your data file]
@@ -32,67 +75,26 @@ comments: true
3275
description: [description of the first column]
3376
- name: [name of the second column]
3477
description: [description of the second column]
78+
license:
79+
- name: [Name of the license of the dataset]
80+
link: [Link to the license page]
3581
references:
3682
- name: [Name of the first reference]
3783
link: [https://link_to_your_first_reference]
3884
```
3985

40-
For example, the folder structure in this demo project `datumorphism/geonames-timezones` is
86+
> For an example, see the `metadata.yml` in this demo project: [InterImm/dataset-planets-in-solar-system](https://github.com/InterImm/dataset-planets-in-solar-system/blob/master/.dataherb/metadata.yml).
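
The folder layout and the `metadata.yml` placeholder described in the steps above could also be prepared locally before pushing. The following is a minimal sketch, not part of the commit and not part of the `dataherb` package: the function name `scaffold_dataherb_repo` and the constant `METADATA_SKELETON` are hypothetical, and the two-space YAML indentation is an assumption because the diff view does not preserve indentation.

```python
from pathlib import Path

# Placeholder skeleton; the bracketed values are meant to be replaced by hand.
# The indentation is an assumption: the diff view above does not show it.
METADATA_SKELETON = """\
name: [Name of your dataset]
description: [Describe your dataset here]
contributors:
  - name: [Name of the first contributor]
data:
  - name: [name of your data file, optional]
    path: [path_to_your_data_file.csv]
    format: csv
    size: [size of your data file]
"""


def scaffold_dataherb_repo(repo_root, data_dir="dataset"):
    """Create the data folder, the .dataherb folder, and a metadata.yml skeleton."""
    root = Path(repo_root)
    (root / data_dir).mkdir(parents=True, exist_ok=True)
    dataherb_dir = root / ".dataherb"
    dataherb_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = dataherb_dir / "metadata.yml"
    if not metadata_path.exists():  # never overwrite an existing file
        metadata_path.write_text(METADATA_SKELETON, encoding="utf-8")
    return metadata_path
```

Calling `scaffold_dataherb_repo(".")` from the root of a cloned repository would create both folders and the skeleton file. Git does not track empty folders, so the `dataset` folder only appears on GitHub once a file has been placed in it.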
4187
42-
```
43-
.
44-
├── README.md
45-
├── dataset
46-
│   ├── geonames_timezone.csv
47-
│   └── geonames_timezone.json
48-
└── .dataherb
49-
└── metadata.yml
50-
```
5188

52-
The `.dataherb/metadata.yml` file has the following content.
89+
### If you prefer to use the command line
5390

54-
```
55-
name: Geoname Timezones
56-
description: IANA Timezone IDs in different countries from Geonames
57-
contributors:
58-
- name: Datumorphism
59-
github: datumorphism
60-
data:
61-
- path: dataset/geonames_timezone.csv
62-
name: timezones in csv format
63-
format: csv
64-
size: 14K
65-
updated_at: "2020-02-12"
66-
fields:
67-
- name: country_code
68-
description: Alpha 2 country code
69-
- name: timezone_id
70-
description: IANA timezone id, www.iana.org
71-
- name: gmt_offset
72-
description: GMT offset in January 1st
73-
- name: dst_offset
74-
description: Daylight saving offset in July 1st
75-
- name: raw_offset
76-
description: Raw offset, independent of DST
77-
- path: dataset/geonames_timezone.json
78-
format: json
79-
size: 58K
80-
updated_at: "2020-02-12"
81-
fields:
82-
- name: country_code
83-
description: Alpha 2 country code
84-
- name: timezone_id
85-
description: IANA timezone id, www.iana.org
86-
- name: gmt_offset
87-
description: GMT offset in January 1st
88-
- name: dst_offset
89-
description: Daylight saving offset in July 1st
90-
- name: raw_offset
91-
description: Raw offset, independent of DST
92-
references:
93-
- name: Geonames Download Server
94-
link: https://download.geonames.org/export/dump/
95-
```
96-
6. Add, Commit, and Push the contents to GitHub.
91+
If you use git from the command line, the process is more or less the same. One advantage of the command line is that we have created a command line tool to help you generate the metadata.
92+
93+
1. Install the [dataherb python package](https://pypi.org/project/dataherb/): `pip install dataherb`.
94+
2. Place the data files in a folder like `datasets`.
95+
3. In the root folder of your repo, use the command `dataherb create` and follow the guidelines.
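
For readers who write `metadata.yml` by hand, the `data` section can be drafted from the CSV file itself. The sketch below is an illustration only and does not reproduce what `dataherb create` does: the helper name `draft_data_entry` is hypothetical, the YAML indentation is assumed, and the size is rounded to whole kilobytes to resemble values such as `14K`.

```python
import csv
from pathlib import Path


def draft_data_entry(csv_path):
    """Return a draft `data:` section for one CSV file as YAML text."""
    path = Path(csv_path)
    # Read only the header row to obtain the column names.
    with path.open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    # Assumed size format: whole kilobytes, at least 1K.
    size_kb = max(1, round(path.stat().st_size / 1024))
    lines = [
        "data:",
        f"  - name: {path.stem}",
        f"    path: {path.as_posix()}",
        "    format: csv",
        f"    size: {size_kb}K",
        "    fields:",
    ]
    for column in header:
        lines.append(f"      - name: {column}")
        lines.append("        description: [description of this column]")
    return "\n".join(lines) + "\n"
```

The path is written exactly as it was passed in, so running the helper from the repository root with a relative path such as `dataset/your_data_file.csv` keeps the `path` entry relative to the repository. The column descriptions remain placeholders to be filled in by hand.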
96+
97+
98+
## Next Step
9799

98-
> After creating the repository, you could add your dataset to DataHerb index very easily. Please read [this tutorial: Add Your GitHub Repository Name to DataHerb]({{site.base_url}}/add/add-repo-to-dataherb).
100+
> After creating the repository, you can link your dataset with the DataHerb index. Please read [the Next Step: Add Your GitHub Repository Name to DataHerb]({{site.base_url}}/add/link-repo-with-dataherb).

add/index.md

+17-116
@@ -6,133 +6,34 @@ exclude: true
66
comments: true
77
---
88

9-
Adding your dataset is free and easy. You manage your datasets, DataHerb makes it easy to be accessed.
9+
Adding your dataset is free and easy. Simply link your dataset with DataHerb and it will be indexed.
1010

1111
> Datasets that can be used to enhance machine learning datasets are the priorities at the moment. These datasets can be very helpful to the open data community as well as all data scientists/engineers.
1212
1313
## List Your Dataset on DataHerb
1414

1515
It only takes **two steps**:
1616

17-
1. [Create your GitHub repository to host your data.](#create-your-github-repository-to-host-your-data)
18-
2. [Add your GitHub repository name to DataHerb.](#add-your-github-repository-name-to-dataherb)
17+
1. [Create your GitHub repository to host your data.]({{site.base_url}}/add/create-repo)
18+
2. [Add your GitHub repository name to DataHerb.]({{site.base_url}}/add/link-repo-with-dataherb)
1919

2020
Everything else will be done automatically by GitHub Actions.
2121

22-
### Create your GitHub repository to host your data
22+
## What will happen
2323

24-
> If you have generic questions about GitHub, please [leave a comment](#comments) so we could improve this tutorial.
24+
After listing your dataset on DataHerb:
2525

26-
1. Go to [github.com and click on the + on the top right](https://github.com/new)
27-
2. Create a repository for your data. [GitHub Help](https://help.github.com/en/github/getting-started-with-github/create-a-repo)
28-
3. Create a `.dataherb` folder in the root of your repository.
29-
4. Place your data files in a folder such as `dataset`.
30-
5. Create a file `metadata.yml` in the `.dataherb` with the following content:
26+
1. A page will be generated;
27+
<figure>
28+
<div>
29+
<img src="{{site.base_url}}/assets/videos/dataherb-ufo-page.gif" type="video/gif" />
30+
</div>
31+
</figure>
3132

32-
```
33-
name: [Name of your dataset]
34-
description: [Describe your dataset here]
35-
contributors:
36-
- name: [Name of the first contributor]
37-
data:
38-
- name: [name of your data file, optional]
39-
path: [path_to_your_data_file.csv]
40-
format: csv
41-
size: [size of your data file]
42-
fields:
43-
- name: [name of the first column]
44-
description: [description of the first column]
45-
- name: [name of the second column]
46-
description: [description of the second column]
47-
references:
48-
- name: [Name of the first reference]
49-
link: [https://link_to_your_first_reference]
50-
```
33+
2. The dataset becomes easy to use. For example, one can copy and paste the Python code to load the data in Python. The following example uses the dataset in a Google spreadsheet.
5134

52-
For example, the folder structure in this demo project `datumorphism/geonames-timezones` is
53-
54-
```
55-
.
56-
├── README.md
57-
├── dataset
58-
│   ├── geonames_timezone.csv
59-
│   └── geonames_timezone.json
60-
└── .dataherb
61-
└── metadata.yml
62-
```
63-
64-
The `.dataherb/metadata.yml` file has the following content.
65-
66-
```
67-
name: Geoname Timezones
68-
description: IANA Timezone IDs in different countries from Geonames
69-
contributors:
70-
- name: Datumorphism
71-
github: datumorphism
72-
data:
73-
- path: dataset/geonames_timezone.csv
74-
name: timezones in csv format
75-
format: csv
76-
size: 14K
77-
updated_at: "2020-02-12"
78-
fields:
79-
- name: country_code
80-
description: Alpha 2 country code
81-
- name: timezone_id
82-
description: IANA timezone id, www.iana.org
83-
- name: gmt_offset
84-
description: GMT offset in January 1st
85-
- name: dst_offset
86-
description: Daylight saving offset in July 1st
87-
- name: raw_offset
88-
description: Raw offset, independent of DST
89-
- path: dataset/geonames_timezone.json
90-
format: json
91-
size: 58K
92-
updated_at: "2020-02-12"
93-
fields:
94-
- name: country_code
95-
description: Alpha 2 country code
96-
- name: timezone_id
97-
description: IANA timezone id, www.iana.org
98-
- name: gmt_offset
99-
description: GMT offset in January 1st
100-
- name: dst_offset
101-
description: Daylight saving offset in July 1st
102-
- name: raw_offset
103-
description: Raw offset, independent of DST
104-
references:
105-
- name: Geonames Download Server
106-
link: https://download.geonames.org/export/dump/
107-
```
108-
6. Add, Commit, and Push the contents to GitHub.
109-
110-
111-
### Add your GitHub repository name to DataHerb.
112-
113-
114-
1. Go to [DataHerb/dataherb-flora](https://github.com/DataHerb/dataherb-flora).
115-
2. Create a new file inside the folder `flora`: [Use this link and click on the *Create new file* button](https://github.com/DataHerb/dataherb-flora/tree/master/flora).
116-
117-
The file name should be descriptive. Do not use spaces in the file name. The file should contain
118-
119-
```
120-
- name: Name of the dataset
121-
repository: repository_owner/repository_name
122-
tags:
123-
- A tag your data
124-
- Another tag for your data
125-
```
126-
127-
For example, one could have a file with the name `ecb_currency_exchange.yml` and the following content.
128-
129-
```
130-
- name: geonames_timezone
131-
repository: datumorphism/geonames-timezones
132-
tags:
133-
- Geo
134-
```
135-
136-
3. Create a new Pull Request and tell us what you have added.
137-
138-
> We do not take your data. The data page will link to your GitHub repository and the corresponding data files.
35+
<figure>
36+
<div>
37+
<img src="{{site.base_url}}/assets/videos/dataherb-european-countries-spreadsheet.gif" type="video/gif" />
38+
</div>
39+
</figure>

add/add-repo-to-dataherb.md renamed to add/link-repo-with-dataherb.md

+17-6
@@ -1,14 +1,17 @@
11
---
22
layout: page
33
title: Add Your GitHub Repository Name to DataHerb
4-
permalink: /add/add-repo-to-dataherb
4+
permalink: /add/link-repo-with-dataherb
55
exclude: true
66
comments: true
77
---
88

9+
> * Overview: [List Your Dataset on DataHerb]({{site.base_url}}/add)
10+
> * Previous step: [Create a GitHub Repository to Host a Dataset]({{site.base_url}}/add/create-repo)
11+
912
> Before adding your GitHub repository to DataHerb, we require a `.dataherb` folder in your repository with a `metadata.yml` file inside it. Please read [this tutorial: Create a GitHub Repository to Host a Dataset]({{site.base_url}}/add/create-repo).
1013
11-
### Add your GitHub repository name to DataHerb
14+
## Add your GitHub repository name to DataHerb
1215

1316
1. Go to [DataHerb/dataherb-flora](https://github.com/DataHerb/dataherb-flora).
1417
2. Create a new file inside the folder `flora`: [Use this link and click on the *Create new file* button](https://github.com/DataHerb/dataherb-flora/tree/master/flora).
@@ -23,15 +26,23 @@ comments: true
2326
- Another tag for your data
2427
```
2528

26-
For example, one could have a file with the name `ecb_currency_exchange.yml` and the following content.
29+
For example, the dataset [InterImm/dataset-planets-in-solar-system](https://github.com/InterImm/dataset-planets-in-solar-system) is represented in the flora as [planets_in_solar_system.yml](https://github.com/DataHerb/dataherb-flora/blob/master/flora/planets_in_solar_system.yml) with the following content.
2730

2831
```
29-
- name: geonames_timezone
30-
repository: datumorphism/geonames-timezones
32+
- name: Planets in the Solar System
33+
repository: InterImm/dataset-planets-in-solar-system
3134
tags:
32-
- Geo
35+
- Astronomy
3336
```
3437

3538
3. Create a new Pull Request and tell us what you have added.
3639

40+
<figure>
41+
<div>
42+
<img src="{{site.base_url}}/assets/videos/dataherb-ufo-create-new-pr.gif" type="video/gif" />
43+
</div>
44+
</figure>
45+
3746
> We do not take your data. The data page will link to your GitHub repository and the corresponding data files.
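
The flora file described above is small enough to draft with a short script. This is a minimal sketch under stated assumptions: the helper names `flora_file_name` and `flora_entry` are hypothetical, the indentation of the YAML entry is assumed, and replacing every run of non-alphanumeric characters with an underscore is only one possible way to obtain a file name without spaces.

```python
import re


def flora_file_name(dataset_name):
    """Derive a descriptive file name without spaces from a dataset name."""
    slug = re.sub(r"[^a-z0-9]+", "_", dataset_name.lower()).strip("_")
    return f"{slug}.yml"


def flora_entry(name, repository, tags):
    """Render the content of a flora file as YAML text."""
    # The repository is expected in the form repository_owner/repository_name.
    if repository.count("/") != 1:
        raise ValueError("repository must look like owner/name")
    lines = [
        f"- name: {name}",
        f"  repository: {repository}",
        "  tags:",
    ]
    lines.extend(f"    - {tag}" for tag in tags)
    return "\n".join(lines) + "\n"
```

For instance, `flora_entry("Planets in the Solar System", "InterImm/dataset-planets-in-solar-system", ["Astronomy"])` renders an entry with the same fields as the example above. The derived file name is only a suggestion; any descriptive name without spaces should work.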
47+
48+

assets/videos/dataherb-ufo-page.gif

10.1 MB

assets/videos/dataherb-ufo-page.mp4

320 KB
Binary file not shown.
