
2 Data Model Staging

Saif Shabou edited this page May 17, 2021 · 2 revisions

Staging data model

Data model

Staging data is the first step in standardizing GHG emissions data. Each dataset is stored in a single file of n lines, where each line is one emission object serialized as JSON. The goal is to store as much information as possible in a normalized schema while remaining flexible when data sources provide specific information.

Mapping information for each data source can be recorded in this spreadsheet: https://docs.google.com/spreadsheets/d/1CnTpHjZZZepgJ1o1VuQUN61ZaLLRtM1OhZzJU9HCPaY/edit#gid=2038606787

An emission object is composed of 4 main components:

  • data_source: Information about the data source, such as name, link, description... Attributes specific to a data source may be stored in the properties object inside data_source.
  • geo_component: Information describing the geographical entity the emission refers to, such as name, ISO code, geo-scale... Data-source-specific attributes related to the geographical entity may be stored in the properties object inside geo_component.
  • date: The date of the reported emission. For yearly emissions, set the day and month to the first of January.
  • emission: Information describing the emission characteristics: gas, sector, unit, value.
[
	{
		"data_source": {
			"name": "gcp",
			"link": "url",
			"properties": {
				"description": "This is a short description of data source",
				"provider": "gcp"
			}
		},
		"geo_component": {
			"scale": "Country",
			"name": "france",
			"identifier": {
				"id": "FRA",
				"type": "alpha3"
			},
			"properties": {
				"data_source_code": "FRA"
			}
		},
		"date": "2011-01-01",
		"emission": {
			"gas": "co2",
			"value": 624.0,
			"unit": {
				"unit_used": "Mt co2eq"
			},
			"sector": {
				"sector_origin_name": "Coal",
				"sector_mapped_name": "fossil_emissions_coal"
			}
		}
	},
	{
		"data_source": {
			"name": "gcp",
			"link": "url"
		},
		"geo_component": {
			"scale": "Country",
			"name": "france",
			"identifier": {
				"id": "FRA",
				"type": "alpha3"
			}
		},
		"date": "2012-01-01",
		"emission": {
			"gas": "co2",
			"value": 624.0,
			"unit": {
				"unit_used": "Mt co2eq"
			},
			"sector": {
				"sector_origin_name": "Coal",
				"sector_mapped_name": "fossil_emissions_coal"
			}
		}
	}
]
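The one-object-per-line layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's actual loader: the helper names `make_emission`, `to_lines` and `from_lines` are hypothetical, the file is assumed to follow the JSON Lines convention (one compact JSON document per line), and the scale is written in lower case.

```python
import json
from datetime import date


def make_emission(source, iso3, year, gas, value, unit, sector_origin):
    """Build one staging emission object (hypothetical helper)."""
    return {
        "data_source": {"name": source},
        "geo_component": {
            "scale": "country",
            "identifier": {"id": iso3, "type": "alpha3"},
        },
        # Yearly emissions are dated the first of January.
        "date": date(year, 1, 1).isoformat(),
        "emission": {
            "gas": gas,
            "value": value,
            "unit": {"unit_used": unit},
            "sector": {"sector_origin_name": sector_origin},
        },
    }


def to_lines(emissions):
    """Serialize emission objects, one compact JSON document per line."""
    return "\n".join(json.dumps(e) for e in emissions) + "\n"


def from_lines(text):
    """Parse the text back into a list of emission objects."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Two yearly records for the same country and sector.
records = [
    make_emission("gcp", "FRA", year, "co2", 624.0, "Mt co2eq", "Coal")
    for year in (2011, 2012)
]
text = to_lines(records)
```

Writing `text` to a file gives a staging file with one line per emission object, and `from_lines` reads it back unchanged.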
| Field | Obligatory | Type | Description | Values |
|---|---|---|---|---|
| data_source | Yes | json | Object containing information related to the data source | |
| data_source.name | Yes | string | The name of the data source as defined in the repository structure (e.g. "wri", "cdp"...) | wri, gcp, cdp |
| data_source.link | No | string | HTTP link to the data source | |
| data_source.properties | No | json | Object containing data-source-specific properties | |
| data_source.properties.scenario | No | | The scenario used for filling empty emission values | |
| data_source.properties.description | No | | Short description of the data source | |
| geo_component | Yes | json | Object containing information related to the geographical entity | |
| geo_component.scale | Yes | string | Spatial resolution of the geocomponent, chosen from a defined list | country-group; country; city; grid |
| geo_component.name | No | string | The name of the geocomponent in lower case | france; italy |
| geo_component.identifier | Yes | json | Object containing the principal identifier of the geocomponent | |
| geo_component.identifier.id | Yes | string | The geocomponent identifier used in the dataset (for example: FRA, FR, france) | |
| geo_component.identifier.type | Yes | string | The type of identifier | alpha3; alpha2; name |
| geo_component.properties | No | json | Object containing geocomponent-specific properties | |
| geo_component.properties.datasource_code | No | string | The identifier of the geocomponent in the data source's reference system | Account Number |
| date | Yes | date | The date of emissions reporting | |
| emission | Yes | json | Object containing emission value information | |
| emission.sector | Yes (unless it is a city emission) | | | |
| emission.sector.sector_origin_name | Yes (unless it is a city emission) | string | The sector name as mentioned in the raw data source | |
| emission.sector.sector_mapped_name | No | enum | Sector name based on the sector modalities mapping table | |
| emission.scope | Yes (only for city emission) | | | |
| emission.scope.scope_origin_name | Yes (only for city emission) | string | The scope name as mentioned in the raw data source | |
| emission.scope.scope_mapped_name | No | enum | Scope name based on the sector modalities mapping table | |
| emission.gas | Yes | enum | Gas name based on the gas modalities table | |
| emission.value | Yes | numeric | Value of the gas emission quantity as provided by the data source | |
| emission.unit | Yes | | | |
| emission.unit.unit_used | Yes | string | Unit used for quantifying GHG emissions | |
| emission.unit.gwp_report_reference | No | string | Scope name based on the sector modalities mapping table | |
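The obligatory fields listed above lend themselves to a simple check before a record is written to a staging file. The sketch below is one possible way to do it, not the project's validator: `get_path` and `validate` are hypothetical names, dates are assumed to be ISO strings (YYYY-MM-DD) as in the example objects, and the scale is compared case-insensitively because the example uses "Country" while the defined list is lower case.

```python
from datetime import date

ID_TYPES = {"alpha3", "alpha2", "name"}
SCALES = {"country-group", "country", "city", "grid"}


def get_path(obj, path):
    """Return the value at a dotted path, or None if any key is missing."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    required = [
        "data_source.name",
        "geo_component.scale",
        "geo_component.identifier.id",
        "geo_component.identifier.type",
        "date",
        "emission.gas",
        "emission.value",
        "emission.unit.unit_used",
    ]
    raw_scale = get_path(record, "geo_component.scale")
    scale = str(raw_scale).lower()
    # City emissions report a scope; every other scale reports a sector.
    if scale == "city":
        required.append("emission.scope.scope_origin_name")
    else:
        required.append("emission.sector.sector_origin_name")

    errors = [f"missing {p}" for p in required if get_path(record, p) is None]

    if raw_scale is not None and scale not in SCALES:
        errors.append(f"unknown scale: {raw_scale}")
    id_type = get_path(record, "geo_component.identifier.type")
    if id_type is not None and id_type not in ID_TYPES:
        errors.append(f"unknown identifier type: {id_type}")
    value = get_path(record, "emission.value")
    if value is not None and (
        isinstance(value, bool) or not isinstance(value, (int, float))
    ):
        errors.append("emission.value is not numeric")
    raw_date = get_path(record, "date")
    if raw_date is not None:
        try:
            date.fromisoformat(raw_date)
        except (TypeError, ValueError):
            errors.append(f"invalid date: {raw_date}")
    return errors


sample = {
    "data_source": {"name": "gcp"},
    "geo_component": {
        "scale": "Country",
        "identifier": {"id": "FRA", "type": "alpha3"},
    },
    "date": "2011-01-01",
    "emission": {
        "gas": "co2",
        "value": 624.0,
        "unit": {"unit_used": "Mt co2eq"},
        "sector": {"sector_origin_name": "Coal"},
    },
}
problems = validate(sample)
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a data source's mapping at once.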