Data preparation and cleanup
Alberto Cottica edited this page May 18, 2016 · 1 revision
Data were downloaded in CSV format from the European Commission's open data portal at the end of April 2016.
Preliminary cleanup has been performed by Stefano Durì using Kettle. Specifically, we:
- Deduplicated organisation names and assigned each one a unique ID. The data contain a field for organisations' PICs (Participant Identification Codes), but it is empty in many records, so unique IDs were generated where no PIC was available.
- Created three files: the first with information about projects; the second with information about organisations; and the third containing the edges, i.e. a list of {organisation: project} pairs.
The above was performed on both FP7 and H2020 data.
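
For readers who want to reproduce the idea without Kettle, the steps above can be sketched in Python. This is an illustration only, not the transformation that was actually run: the column names (`project_id`, `project_title`, `org_name`, `pic`), the `GEN-` prefix for generated IDs, the sample rows, and the normalisation rule (lower-casing and collapsing whitespace) are all assumptions made for the example.

```python
import csv
import io
import itertools


def normalise(name):
    """Collapse case and whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def build_tables(rows):
    """Split participation rows into projects, organisations and edges.

    Each row is a dict with the hypothetical keys
    project_id, project_title, org_name and pic (pic may be empty).
    """
    rows = list(rows)

    # First pass: remember a PIC for every organisation that has one
    # in at least one record.
    pics = {}
    for row in rows:
        pic = row["pic"].strip()
        if pic:
            pics.setdefault(normalise(row["org_name"]), pic)

    projects = {}        # project_id -> title
    organisations = {}   # normalised name -> {"org_id", "name"}
    edges = set()        # (org_id, project_id) pairs
    generated = itertools.count(1)

    # Second pass: assign IDs and collect the three tables.
    for row in rows:
        projects[row["project_id"]] = row["project_title"]
        key = normalise(row["org_name"])
        if key not in organisations:
            if key in pics:
                org_id = pics[key]                      # reuse the PIC
            else:
                org_id = f"GEN-{next(generated):05d}"   # made-up ID scheme
            organisations[key] = {"org_id": org_id, "name": row["org_name"].strip()}
        edges.add((organisations[key]["org_id"], row["project_id"]))

    return projects, list(organisations.values()), sorted(edges)


# Made-up sample data; the PIC value is not a real one.
SAMPLE = """project_id,project_title,org_name,pic
P1,Alpha,Example University,123456789
P1,Alpha,Acme  Research,
P2,Beta,EXAMPLE UNIVERSITY,
P2,Beta,acme research,
"""

projects, organisations, edges = build_tables(csv.DictReader(io.StringIO(SAMPLE)))
```

Each of the three returned tables can then be written to its own CSV file with `csv.writer`. Real organisation names usually need more careful matching than this normalisation rule provides (abbreviations, translations, legal suffixes), so the deduplication step here should be read as a placeholder.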