add/create-repo.md
+60-58
Original file line number
Diff line number
Diff line change
@@ -6,15 +6,47 @@ exclude: true
6
6
comments: true
7
7
---
8
8
9
+
> * Overview: [List Your Dataset on DataHerb]({{site.base_url}}/add)
10
+
> * Next step: [Add Your GitHub Repository Name to DataHerb]({{site.base_url}}/add/link-repo-with-dataherb)
9
11
10
-
###Create your GitHub repository to host your data
12
+
## Create your GitHub repository to host your data
11
13
12
14
> If you have general questions about GitHub, please [leave a comment](#comments) so we can improve this tutorial.
13
15
16
+
> If you prefer to learn from examples, simply copy everything from this repo [InterImm/dataset-planets-in-solar-system](https://github.com/InterImm/dataset-planets-in-solar-system) and adapt it.
17
+
18
+
### If you prefer to use the web interface of GitHub
19
+
14
20
1. Go to [github.com and click on the + on the top right](https://github.com/new)
15
21
2. Create a repository for your data. [GitHub Help](https://help.github.com/en/github/getting-started-with-github/create-a-repo)
16
-
3. Create a `.dataherb` folder in the root of your repository.
17
-
4. Place your data files in a folder such as `dataset`.
3. Create a folder to hold your data file. In this example, we will create a folder called `dataset`. Click on the `Create new file` button and type in `dataset/.githold`. This will create a folder called `dataset` and place a file called `.githold` inside it.
Upload your data file into this folder by clicking on the `Upload files` button.
39
+
40
+
4. Create a `.dataherb` folder in the root of your repository. The folder structure should now be:
41
+
42
+
```
43
+
.
44
+
├── README.md
45
+
├── .dataherb
46
+
├── dataset
47
+
└── your_data_file
48
+
```
49
+
18
50
5. Create a file `metadata.yml` in the `.dataherb` with the following content:
19
51
20
52
```
@@ -24,6 +56,17 @@ comments: true
24
56
- name: [Name of the first contributor]
25
57
data:
26
58
- name: [name of your data file, optional]
59
+
description: [description of your data file, optional]
60
+
path: [path_to_your_data_file.csv]
61
+
format: csv
62
+
size: [size of your data file]
63
+
fields:
64
+
- name: [name of the first column]
65
+
description: [description of the first column]
66
+
- name: [name of the second column]
67
+
description: [description of the second column]
68
+
- name: [name of your second data file, optional]
69
+
description: [description of your second data file, optional]
27
70
path: [path_to_your_data_file.csv]
28
71
format: csv
29
72
size: [size of your data file]
@@ -32,67 +75,26 @@ comments: true
32
75
description: [description of the first column]
33
76
- name: [name of the second column]
34
77
description: [description of the second column]
78
+
license:
79
+
- name: [Name of the license of the dataset]
80
+
link: [Link to the license page]
35
81
references:
36
82
- name: [Name of the first reference]
37
83
link: [https://link_to_your_first_reference]
38
84
```
39
85
40
-
For example, the folder structure in this demo project `datumorphism/geonames-timezones` is
86
+
> As an example, one could use similar contents as in this demo project [InterImm/dataset-planets-in-solar-system](https://github.com/InterImm/dataset-planets-in-solar-system/blob/master/.dataherb/metadata.yml).
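The layout described in the steps above can also be scaffolded locally before pushing to GitHub. The following is a minimal sketch, not part of DataHerb itself: the function name `scaffold_dataset_repo`, the placeholder values, and the YAML indentation are assumptions made for illustration, while the key names follow the `metadata.yml` template from this tutorial.

```python
import tempfile
from pathlib import Path

# Hypothetical skeleton: key names follow the metadata.yml template in this
# tutorial; the indentation and all bracketed values are placeholders.
METADATA_TEMPLATE = """\
name: [Name of the dataset]
description: [Description of the dataset]
contributors:
  - name: [Name of the first contributor]
data:
  - name: [name of your data file, optional]
    path: [path_to_your_data_file.csv]
    format: csv
    size: [size of your data file]
    fields:
      - name: [name of the first column]
        description: [description of the first column]
license:
  - name: [Name of the license of the dataset]
    link: [Link to the license page]
references:
  - name: [Name of the first reference]
    link: [https://link_to_your_first_reference]
"""


def scaffold_dataset_repo(root: Path, data_dir: str = "dataset") -> Path:
    """Create the data folder and .dataherb/metadata.yml; return the metadata path."""
    (root / data_dir).mkdir(parents=True, exist_ok=True)
    meta_dir = root / ".dataherb"
    meta_dir.mkdir(parents=True, exist_ok=True)
    meta_path = meta_dir / "metadata.yml"
    if not meta_path.exists():
        # Never overwrite metadata that has already been filled in.
        meta_path.write_text(METADATA_TEMPLATE, encoding="utf-8")
    return meta_path


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing is written to the repo.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_dataset_repo(Path(tmp))
        print("Created", created.relative_to(tmp))
```

After running it against your own repository root, replace every bracketed placeholder in `.dataherb/metadata.yml` by hand, then add, commit, and push as usual.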
41
87
42
-
```
43
-
.
44
-
├── README.md
45
-
├── dataset
46
-
│ ├── geonames_timezone.csv
47
-
│ └── geonames_timezone.json
48
-
└── .dataherb
49
-
└── metadata.yml
50
-
```
51
88
52
-
The `.dataherb/metadata.yml` file has the following content.
89
+
### If you prefer to use the command line
53
90
54
-
```
55
-
name: Geoname Timezones
56
-
description: IANA Timezone IDs in different countries from Geonames
57
-
contributors:
58
-
- name: Datumorphism
59
-
github: datumorphism
60
-
data:
61
-
- path: dataset/geonames_timezone.csv
62
-
name: timezones in csv format
63
-
format: csv
64
-
size: 14K
65
-
updated_at: "2020-02-12"
66
-
fields:
67
-
- name: country_code
68
-
description: Alpha 2 country code
69
-
- name: timezone_id
70
-
description: IANA timezone id, www.iana.org
71
-
- name: gmt_offset
72
-
description: GMT offset in January 1st
73
-
- name: dst_offset
74
-
description: Day light saving offset in July 1st
75
-
- name: raw_offset
76
-
description: Raw offset, independant of DST
77
-
- path: dataset/geonames_timezone.json
78
-
format: json
79
-
size: 58K
80
-
updated_at: "2020-02-12"
81
-
fields:
82
-
- name: country_code
83
-
description: Alpha 2 country code
84
-
- name: timezone_id
85
-
description: IANA timezone id, www.iana.org
86
-
- name: gmt_offset
87
-
description: GMT offset in January 1st
88
-
- name: dst_offset
89
-
description: Day light saving offset in July 1st
90
-
- name: raw_offset
91
-
description: Raw offset, independant of DST
92
-
references:
93
-
- name: Geonames Download Server
94
-
link: https://download.geonames.org/export/dump/
95
-
```
96
-
6. Add, Commit, and Push the contents to GitHub.
91
+
If you use the command line for git, the process is more or less the same. One advantage of the command line is that we have created a command-line tool to help you generate the metadata.
92
+
93
+
1. Install the [dataherb python package](https://pypi.org/project/dataherb/): `pip install dataherb`.
94
+
2. Place the data files in a folder like `datasets`.
95
+
3. In the root folder of your repo, run the command `dataherb create` and follow the guidelines.
96
+
97
+
98
+
## Next Step
97
99
98
-
> After creating the repository, you could add your dataset to DataHerb index very easily. Please read [this tutorial: Add Your GitHub Repository Name to DataHerb]({{site.base_url}}/add/add-repo-to-dataherb).
100
+
> After creating the repository, you can link your dataset with the DataHerb index very easily. Please read [the Next Step: Add Your GitHub Repository Name to DataHerb]({{site.base_url}}/add/link-repo-with-dataherb).
add/index.md
+17-116
Original file line number
Diff line number
Diff line change
@@ -6,133 +6,34 @@ exclude: true
6
6
comments: true
7
7
---
8
8
9
-
Adding your dataset is free and easy. You manage your datasets, DataHerb makes it easy to be accessed.
9
+
Adding your dataset is free and easy. Simply link your datasets with DataHerb and your dataset will be indexed.
10
10
11
11
> Datasets that can be used to enhance machine learning datasets are the priority at the moment. These datasets can be very helpful to the open data community as well as to data scientists and engineers.
12
12
13
13
## List Your Dataset on DataHerb
14
14
15
15
It only takes **two steps**:
16
16
17
-
1.[Create your GitHub repository to host your data.](#create-your-github-repository-to-host-your-data)
18
-
2.[Add your GitHub repository name to DataHerb.](#add-your-github-repository-name-to-dataherb)
17
+
1. [Create your GitHub repository to host your data.]({{site.base_url}}/add/create-repo)
18
+
2. [Add your GitHub repository name to DataHerb.]({{site.base_url}}/add/link-repo-with-dataherb)
19
19
20
20
Everything else will be done automatically by GitHub Actions.
21
21
22
-
### Create your GitHub repository to host your data
22
+
## What will happen
23
23
24
-
> If you have generic questions about GitHub, please [leave a comment](#comments) so we could improve this tutorial.
24
+
After listing your dataset on DataHerb:
25
25
26
-
1. Go to [github.com and click on the + on the top right](https://github.com/new)
27
-
2. Create a repository for your data. [GitHub Help](https://help.github.com/en/github/getting-started-with-github/create-a-repo)
28
-
3. Create a `.dataherb` folder in the root of your repository.
29
-
4. Place your data files in a folder such as `dataset`.
30
-
5. Create a file `metadata.yml` in the `.dataherb` with the following content:
2. One could use the dataset easily. For example, one could copy and paste the Python code to load the data in Python. The following is an example of using the dataset in a Google spreadsheet.
51
34
52
-
For example, the folder structure in this demo project `datumorphism/geonames-timezones` is
53
-
54
-
```
55
-
.
56
-
├── README.md
57
-
├── dataset
58
-
│ ├── geonames_timezone.csv
59
-
│ └── geonames_timezone.json
60
-
└── .dataherb
61
-
└── metadata.yml
62
-
```
63
-
64
-
The `.dataherb/metadata.yml` file has the following content.
65
-
66
-
```
67
-
name: Geoname Timezones
68
-
description: IANA Timezone IDs in different countries from Geonames
69
-
contributors:
70
-
- name: Datumorphism
71
-
github: datumorphism
72
-
data:
73
-
- path: dataset/geonames_timezone.csv
74
-
name: timezones in csv format
75
-
format: csv
76
-
size: 14K
77
-
updated_at: "2020-02-12"
78
-
fields:
79
-
- name: country_code
80
-
description: Alpha 2 country code
81
-
- name: timezone_id
82
-
description: IANA timezone id, www.iana.org
83
-
- name: gmt_offset
84
-
description: GMT offset in January 1st
85
-
- name: dst_offset
86
-
description: Day light saving offset in July 1st
87
-
- name: raw_offset
88
-
description: Raw offset, independant of DST
89
-
- path: dataset/geonames_timezone.json
90
-
format: json
91
-
size: 58K
92
-
updated_at: "2020-02-12"
93
-
fields:
94
-
- name: country_code
95
-
description: Alpha 2 country code
96
-
- name: timezone_id
97
-
description: IANA timezone id, www.iana.org
98
-
- name: gmt_offset
99
-
description: GMT offset in January 1st
100
-
- name: dst_offset
101
-
description: Day light saving offset in July 1st
102
-
- name: raw_offset
103
-
description: Raw offset, independant of DST
104
-
references:
105
-
- name: Geonames Download Server
106
-
link: https://download.geonames.org/export/dump/
107
-
```
108
-
6. Add, Commit, and Push the contents to GitHub.
109
-
110
-
111
-
### Add your GitHub repository name to DataHerb.
112
-
113
-
114
-
1. Go to [DataHerb/dataherb-flora](https://github.com/DataHerb/dataherb-flora).
115
-
2. Create a new file inside the folder `flora`: [Use this link and click on the *Create new file* button](https://github.com/DataHerb/dataherb-flora/tree/master/flora).
116
-
117
-
The file name should be descriptive. Do not use spaces in the file name. The file should contain
118
-
119
-
```
120
-
- name: Name of the dataset
121
-
repository: repository_owner/repository_name
122
-
tags:
123
-
- A tag your data
124
-
- Another tag for your data
125
-
```
126
-
127
-
For example, one could have a file with the name `ecb_currency_exchange.yml` and the following content.
128
-
129
-
```
130
-
- name: geonames_timezone
131
-
repository: datumorphism/geonames-timezones
132
-
tags:
133
-
- Geo
134
-
```
135
-
136
-
3. Create a new Pull Request and tell us what you have added.
137
-
138
-
> We do not take your data. The data page will link to your GitHub repository and the corresponding data files.
add/link-repo-with-dataherb.md
+17-6
Original file line number
Diff line number
Diff line change
@@ -1,14 +1,17 @@
1
1
---
2
2
layout: page
3
3
title: Add Your GitHub Repository Name to DataHerb
4
-
permalink: /add/add-repo-to-dataherb
4
+
permalink: /add/link-repo-with-dataherb
5
5
exclude: true
6
6
comments: true
7
7
---
8
8
9
+
> * Overview: [List Your Dataset on DataHerb]({{site.base_url}}/add)
10
+
> * Previous step: [Create a GitHub Repository to Host a Dataset]({{site.base_url}}/add/create-repo)
11
+
9
12
> Before adding your GitHub repository to DataHerb, we require a `.dataherb` folder in the root of your repository and a `metadata.yml` file inside that folder. Please read [this tutorial: Create a GitHub Repository to Host a Dataset]({{site.base_url}}/add/create-repo).
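The prerequisite above can be checked locally before opening a pull request. This is a small sketch under stated assumptions: the helper name `missing_dataherb_files` is hypothetical, and it only checks that the folder and file exist, not that the metadata content is valid.

```python
from pathlib import Path
from typing import List


def missing_dataherb_files(repo_root: Path) -> List[str]:
    """Return the required DataHerb paths that are absent under repo_root."""
    missing = []
    if not (repo_root / ".dataherb").is_dir():
        missing.append(".dataherb/")
    if not (repo_root / ".dataherb" / "metadata.yml").is_file():
        missing.append(".dataherb/metadata.yml")
    return missing


if __name__ == "__main__":
    # Run from the root of a local clone of your dataset repository.
    problems = missing_dataherb_files(Path("."))
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Repository layout looks ready to link.")
```

An empty result means both the `.dataherb` folder and its `metadata.yml` are present.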
10
13
11
-
###Add your GitHub repository name to DataHerb
14
+
## Add your GitHub repository name to DataHerb
12
15
13
16
1. Go to [DataHerb/dataherb-flora](https://github.com/DataHerb/dataherb-flora).
14
17
2. Create a new file inside the folder `flora`: [Use this link and click on the *Create new file* button](https://github.com/DataHerb/dataherb-flora/tree/master/flora).
@@ -23,15 +26,23 @@ comments: true
23
26
- Another tag for your data
24
27
```
25
28
26
-
For example, one could have a file with the name `ecb_currency_exchange.yml` and the following content.
29
+
For example, the dataset [InterImm/dataset-planets-in-solar-system](https://github.com/InterImm/dataset-planets-in-solar-system) is represented in the flora as [planets_in_solar_system.yml](https://github.com/DataHerb/dataherb-flora/blob/master/flora/planets_in_solar_system.yml) with the following content.