
Data preparation and cleanup

Alberto Cottica edited this page May 18, 2016 · 1 revision

The data were downloaded in CSV format from the European Commission's open data portal at the end of April 2016.

Preliminary cleanup was performed by Stefano Durì using Kettle. Specifically, we:

  • Deduplicated organisation names and assigned each organisation a unique ID. The data contain a field for each organisation's PIC (Participant Identification Code), but it is empty in many records, so we generated unique IDs where no PIC was available.
  • Created three files: the first with information about projects; the second with information about organisations; and the third containing the edges, i.e. a list of {organisation: project} pairs.

These steps were applied to both the FP7 and the H2020 data.
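
The cleanup itself was done in Kettle, so the following is not the actual workflow. It is only a minimal Python sketch of the same two steps, under stated assumptions: the column names (`project_id`, `project_title`, `org_name`, `pic`), the `GEN` prefix for generated IDs, and the rule that names match after lowercasing and collapsing whitespace are all hypothetical and chosen for illustration.

```python
import csv
import io

# Hypothetical input: one row per (project, participant) pair.
# Column names are assumptions, not the portal's actual headers.
SAMPLE = """project_id,project_title,org_name,pic
P1,Alpha,University of Bologna,999993953
P1,Alpha,ACME  Ltd,
P2,Beta,university of bologna,999993953
P2,Beta,Acme Ltd,
"""


def normalise(name):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def build_tables(rows):
    rows = list(rows)

    # First pass: remember a PIC for each normalised name, if any record has one.
    pic_by_name = {}
    for row in rows:
        pic = row["pic"].strip()
        if pic:
            pic_by_name.setdefault(normalise(row["org_name"]), pic)

    projects = {}   # project_id -> title
    orgs = {}       # org_id -> first spelling seen
    generated = {}  # normalised name -> generated ID (used when no PIC exists)
    edges = set()   # (org_id, project_id) pairs

    # Second pass: assign IDs and fill the three tables.
    for row in rows:
        name = normalise(row["org_name"])
        if name in pic_by_name:
            org_id = pic_by_name[name]
        else:
            org_id = generated.setdefault(name, "GEN{:04d}".format(len(generated) + 1))
        projects[row["project_id"]] = row["project_title"]
        orgs.setdefault(org_id, " ".join(row["org_name"].split()))
        edges.add((org_id, row["project_id"]))

    return projects, orgs, sorted(edges)


projects, orgs, edges = build_tables(csv.DictReader(io.StringIO(SAMPLE)))
```

On the sample rows, the two spellings of each organisation collapse into one entry, the organisation without a PIC receives a generated ID, and `edges` holds one pair per participation. Real organisation names are likely to need more careful matching than this normalisation provides.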
