Darwin Core Archive Event Core
BioCollect hosts a large amount of data stored in an internal format. This wiki page describes how that data can be transformed into the Darwin Core Archive (DwCA) event core format. The data is structured as follows: a citizen science project has multiple surveys (internally called project activities), and each survey has multiple site visits (internally called activities).
BioCollect generates one archive per project. Each archive includes the following files:
- eml.xml - Project metadata is added here
- meta.xml - Describes the content of csv files
- Event.csv - All site visits (activities) are recorded here. Project activities (surveys) are also added here, with eventType `Survey`.
- MeasurementOrFact.csv - Contains measurements or facts from site visits.
- Media.csv - Contains images from site visits
- Occurrence.csv - Contains species occurrences.
The BioCollect DwCA creator cannot infer field mappings by itself, so an admin has to help it generate the DwCA correctly. This is done on the form template of a survey. Each dataModel you want to include in the DwCA has to be annotated with the property `dwcAttribute`, whose value is the DwC field it maps to. This reuses the existing attributes used for record creation. An annotated example of a dataModel is given below.
{
"dataType": "text",
"name": "author",
"dwcAttribute": "recordedBy",
"description": "The name of the person submitting this record",
"validate": "required"
}
Here, the value entered in the `author` field is assigned to the DwC field `recordedBy`.
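To make the mapping concrete, here is a minimal Python sketch of what the annotation achieves: a field carrying a `dwcAttribute` is exported under that DwC term, and an unannotated field is skipped. It is not BioCollect's implementation; the function `map_to_dwc`, the unannotated `notes` field, and the submitted values are hypothetical.

```python
from typing import Any


def map_to_dwc(data_model: list[dict[str, Any]], record: dict[str, Any]) -> dict[str, Any]:
    """Return {DwC term: value} for every annotated field that has a value in record."""
    dwc_row: dict[str, Any] = {}
    for field in data_model:
        term = field.get("dwcAttribute")
        if term is None:
            # Fields without a dwcAttribute annotation are not exported.
            continue
        if field["name"] in record:
            dwc_row[term] = record[field["name"]]
    return dwc_row


# The annotated dataModel from above, plus a hypothetical unannotated field.
data_model = [
    {
        "dataType": "text",
        "name": "author",
        "dwcAttribute": "recordedBy",
        "description": "The name of the person submitting this record",
        "validate": "required",
    },
    {"dataType": "text", "name": "notes"},
]

# Hypothetical values submitted on a site visit.
record = {"author": "Jane Citizen", "notes": "Windy morning"}

print(map_to_dwc(data_model, record))  # {'recordedBy': 'Jane Citizen'}
```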
Similarly, a measurement or fact is added by assigning `"dwcAttribute": "measurementValue"`. An example is given below. As you can see, all the values associated with a measurement or fact are added to the dataModel.
{
"dataType": "number",
"name": "spiValue",
"dwcAttribute": "measurementValue",
"measurementUnit": "SPI",
"measurementUnitID": "http://qudt.org/vocab/quantitykind/SPI",
"measurementType": "number",
"measurementTypeID": "http://qudt.org/vocab/quantitykind/Number",
"measurementAccuracy": "0.1",
"description": "Calculated stream pollution index (SPI)"
}
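The sketch below illustrates, under assumptions, how such a dataModel could become one MeasurementOrFact row: the stored value becomes `measurementValue`, and the measurement terms declared on the dataModel are copied alongside it. The helper `to_measurement_or_fact`, the `MOF_TERMS` subset, and the stored value 4.2 are hypothetical; BioCollect's own code may differ.

```python
from typing import Any

# DwC MeasurementOrFact terms copied from the dataModel when present.
# This subset mirrors the example above; it is an assumption, not a full list.
MOF_TERMS = (
    "measurementType",
    "measurementTypeID",
    "measurementUnit",
    "measurementUnitID",
    "measurementAccuracy",
)


def to_measurement_or_fact(field: dict[str, Any], value: Any) -> dict[str, Any]:
    """Build one MeasurementOrFact row from an annotated dataModel and its stored value."""
    if field.get("dwcAttribute") != "measurementValue":
        raise ValueError(f"{field['name']} is not annotated as a measurement or fact")
    row: dict[str, Any] = {"measurementValue": value}
    for term in MOF_TERMS:
        if term in field:
            row[term] = field[term]
    return row


spi_field = {
    "dataType": "number",
    "name": "spiValue",
    "dwcAttribute": "measurementValue",
    "measurementUnit": "SPI",
    "measurementUnitID": "http://qudt.org/vocab/quantitykind/SPI",
    "measurementType": "number",
    "measurementTypeID": "http://qudt.org/vocab/quantitykind/Number",
    "measurementAccuracy": "0.1",
    "description": "Calculated stream pollution index (SPI)",
}

row = to_measurement_or_fact(spi_field, 4.2)  # 4.2 is a hypothetical stored value
print(row["measurementValue"], row["measurementUnit"])  # 4.2 SPI
```

Form-only keys such as `name` and `description` stay out of the row; only DwC terms are carried over.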
In BioCollect, you can have a table of measurement or fact values. Sometimes it is desirable to programmatically create a measurement type for each of these values. The name is generated with the Spring Expression Language (SpEL). Below is an example of one such case.
{
"dataType": "list",
"name": "dominantPlantSpeciesPreIntervention",
"columns": [
{
"dataType": "species",
"description": "The dominant plant species on the site at the time of commencement of the intervention works. [LIST UP TO 4 SPECIES PER STRATUM]",
"name": "dominantSpeciesPreIntervention",
"dwcAttribute": "scientificName"
},
{
"dataType": "text",
"description": "The vegetation stratum occupied by the species in its mature state.",
"name": "dominantSpeciesPreInterventionStratum",
"constraints": [
"Canopy",
"Midstory",
"Ground stratum"
],
"dwcAttribute": "measurementValue",
"measurementUnit": "unitless",
"measurementType": "['dominantSpeciesPreIntervention']['scientificName'] + ' - Stratum'"
}
]
}
For a table with the following values:
Dominant Species | Stratum |
---|---|
Acacia dealbata | Canopy |
Eucalyptus tumida | Midstory |
The MeasurementOrFact (MOF) table will look like this:
Measurement Type | Measurement Value | ... |
---|---|---|
Acacia dealbata - Stratum | Canopy | ... |
Eucalyptus tumida - Stratum | Midstory | ... |
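The following Python sketch mimics what the SpEL expression produces for the two rows above. It does not evaluate SpEL; it hard-codes the equivalent Python expression, and it assumes each species cell is stored as an object with a `scientificName` key, as the expression implies. The function name `stratum_measurements` is hypothetical.

```python
from typing import Any


def stratum_measurements(rows: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Emit one MeasurementOrFact row per table row of dominantPlantSpeciesPreIntervention."""
    mof_rows = []
    for row in rows:
        # Python equivalent of the SpEL expression
        # ['dominantSpeciesPreIntervention']['scientificName'] + ' - Stratum'
        species = row["dominantSpeciesPreIntervention"]
        mof_rows.append(
            {
                "measurementType": species["scientificName"] + " - Stratum",
                "measurementValue": row["dominantSpeciesPreInterventionStratum"],
                "measurementUnit": "unitless",
            }
        )
    return mof_rows


# The two table rows above; species values are assumed to be stored as
# objects carrying a scientificName key.
rows = [
    {
        "dominantSpeciesPreIntervention": {"scientificName": "Acacia dealbata"},
        "dominantSpeciesPreInterventionStratum": "Canopy",
    },
    {
        "dominantSpeciesPreIntervention": {"scientificName": "Eucalyptus tumida"},
        "dominantSpeciesPreInterventionStratum": "Midstory",
    },
]

for mof in stratum_measurements(rows):
    print(mof["measurementType"], "|", mof["measurementValue"])
# Acacia dealbata - Stratum | Canopy
# Eucalyptus tumida - Stratum | Midstory
```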
When stored data requires some transformation, a `dwcExpression` attribute can be added to the data model. For example, if you want `individualCount` computed from multiple fields, you might do the following. This again uses the Spring Expression Language.
{
"dataType": "list",
"name": "recruitment-sapling-and-seedling-count",
"isObject": true,
"columns": [
{
"dataType": "number",
"name": "juvenile_count",
"decimalPlaces": 0,
"dwcExpression": "(['juvenile_count'] == null ? 0 : ['juvenile_count']) + (['seedling_count'] == null ? 0 : ['seedling_count']) + (['sapling_count'] == null ? 0 :['sapling_count'])",
"dwcAttribute": "individualCount"
},
{
"dataType": "number",
"name": "seedling_count",
"decimalPlaces": 0
},
{
"dataType": "species",
"name": "species",
"dwcAttribute": "scientificName"
},
{
"dataType": "number",
"name": "sapling_count",
"decimalPlaces": 0
}
]
}
For a table with the following values:
Species | Juvenile count | Seedling count | Sapling count |
---|---|---|---|
Acacia dealbata | 1 | 0 | 5 |
Eucalyptus tumida | 7 | 8 | 10 |
The Occurrence table will look like this:
Scientific name | Individual count | ... |
---|---|---|
Acacia dealbata | 6 | ... |
Eucalyptus tumida | 25 | ... |
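As a check of the arithmetic, this Python sketch reproduces the `dwcExpression` above: each count that is null (missing) contributes 0, and the rest are summed. It is a hand translation, not SpEL evaluation; the helper names are hypothetical, and species values are simplified to plain names.

```python
from typing import Any, Optional


def _or_zero(value: Optional[int]) -> int:
    # Mirrors the SpEL ternary (x == null ? 0 : x).
    return 0 if value is None else value


def individual_count(row: dict[str, Any]) -> int:
    """Python equivalent of the dwcExpression on juvenile_count."""
    # dict.get returns None for a missing key, much as SpEL's map indexer yields null.
    return (
        _or_zero(row.get("juvenile_count"))
        + _or_zero(row.get("seedling_count"))
        + _or_zero(row.get("sapling_count"))
    )


rows = [
    {"species": "Acacia dealbata", "juvenile_count": 1, "seedling_count": 0, "sapling_count": 5},
    {"species": "Eucalyptus tumida", "juvenile_count": 7, "seedling_count": 8, "sapling_count": 10},
]
for row in rows:
    print(row["species"], individual_count(row))
# Acacia dealbata 6
# Eucalyptus tumida 25
```

The null guards matter because a surveyor may leave some count columns empty; without them a single blank cell would break the sum.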
DwCA files have to be accessed via ecodata, BioCollect's back-end system, using the following APIs.
Step 1: obtain a JWT access token
POST https://auth.ala.org.au/cas/oidc/oidcAccessToken
Content-Type: application/x-www-form-urlencoded
Accept: application/json
grant_type=client_credentials
&scope=ecodata/read_prod
&client_id=...
&client_secret=...
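A minimal Python sketch of Step 1 using only the standard library. It builds the same form-encoded POST; `fetch_access_token` assumes the endpoint returns a standard OAuth 2.0 JSON body with an `access_token` field, which is not stated above. The function names are hypothetical; supply your own client credentials.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://auth.ala.org.au/cas/oidc/oidcAccessToken"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the client-credentials POST shown above (form-encoded body)."""
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "scope": "ecodata/read_prod",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def fetch_access_token(client_id: str, client_secret: str) -> str:
    """Send the request and return the token; assumes a standard OAuth 2.0 JSON response."""
    request = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]
```

Usage: `token = fetch_access_token(client_id, client_secret)`, then send `Authorization: Bearer <token>` on the ecodata calls.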
Step 2: list the harvestable data resources
URL: https://ecodata.ala.org.au/ws/record/listHarvestDataResource?max=10&offset=0&sort=asc
Header:
Authorization : Bearer <JWT token obtained from Step 1>
It returns a response like the one below. Use the value of the `archiveURL` property to generate and download the archive file.
{
"total": 71,
"list": [
{
"projectId": "17a7871e-15cd-43a3-b349-1161778b0aed",
"name": "Superb Parrot Monitoring project",
"dataResourceId": "dr5017",
"dataProviderId": "dp3534",
"status": "active",
"alaHarvest": true,
"archiveURL": "https://ecodata.ala.org.au/ws/project/17a7871e-15cd-43a3-b349-1161778b0aed/archive"
},
………
]
}
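The sketch below covers the listing call: it sends the authenticated request and collects every `archiveURL`. With `total` at 71 and `max=10`, several pages are needed; `iter_archive_urls` assumes `max` and `offset` page through results in the usual way, which you should confirm against ecodata. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Iterator

LIST_URL = "https://ecodata.ala.org.au/ws/record/listHarvestDataResource"


def build_list_request(token: str, max_results: int = 10, offset: int = 0) -> urllib.request.Request:
    """GET one page of harvestable data resources, authenticated with the JWT token."""
    query = urllib.parse.urlencode({"max": max_results, "offset": offset, "sort": "asc"})
    return urllib.request.Request(
        f"{LIST_URL}?{query}", headers={"Authorization": f"Bearer {token}"}
    )


def extract_archive_urls(payload: dict[str, Any]) -> list[str]:
    """Pull the archiveURL of every project in one response page."""
    return [item["archiveURL"] for item in payload["list"] if "archiveURL" in item]


def iter_archive_urls(token: str, page_size: int = 10) -> Iterator[str]:
    """Page through all results, assuming max/offset behave as ordinary paging."""
    offset = 0
    while True:
        request = build_list_request(token, page_size, offset)
        with urllib.request.urlopen(request, timeout=60) as response:
            payload = json.load(response)
        yield from extract_archive_urls(payload)
        offset += page_size
        # Stop when the server returns an empty page or all items have been seen.
        if not payload["list"] or offset >= payload["total"]:
            break
```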
Step 3: download the archive
URL: https://ecodata.ala.org.au/ws/project/17a7871e-15cd-43a3-b349-1161778b0aed/archive
Header:
Authorization : Bearer <JWT token obtained from Step 1>
Note: creating the archive can take several minutes, depending on the number of activities in the project. A later phase of development is expected to make this faster.
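A hedged sketch of the download step: fetch the archive with a generous timeout, given the note above, and list its members to confirm files such as eml.xml and meta.xml are present. It assumes the archive is served as a zip file, the standard DwCA packaging; the function names and the 900-second default are hypothetical choices.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def build_archive_request(archive_url: str, token: str) -> urllib.request.Request:
    """GET request for an archiveURL taken from the listing response."""
    return urllib.request.Request(archive_url, headers={"Authorization": f"Bearer {token}"})


def download_archive(archive_url: str, token: str, dest: Path, timeout: float = 900) -> Path:
    """Stream the archive to dest.

    timeout is a per-socket-operation limit, set generously because the server
    may spend several minutes building the archive before it starts responding.
    """
    request = build_archive_request(archive_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def list_archive_files(path: Path) -> list[str]:
    """Return the member names of a downloaded archive (a DwCA is a zip file)."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```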