Tips on data collection
KDL encourages the use of open data formats and standards.
The size and nature of the data, its diversity, provenance and licensing, the compatibility of the input format with our software stack and existing workflows, and the degree of certainty, fuzziness, and interpretation in the data are all factors that affect the analysis KDL conducts and our recommendations on data collection. During feasibility assessment (see F2: Feasibility guidance), as well as after a project has started, in-depth conversations with partners will inform data collection.
However, if a project hasn’t started yet, there are some general rules to follow when creating a dataset. While each research context requires bespoke analysis, here are some minimal tips KDL recommends:
- Separate any fields you would like to analyse (e.g. if surnames are a unit of analysis, record them in fields or columns distinct from first names). As a general rule, use one value per field; if there are multiple values, place them in separate fields or in the comment field.
- Be consistent with the format of the values in each field (e.g. dates). Use a clear convention for ranges of uncertainty and apply it consistently. The same applies to the distinction between missing and ineligible values.
- Minimise editorial interventions at the collection stage unless agreed (e.g. don’t modernise names if the analysis might later need to trace back to the original place or person names; instead, provide separate fields for modernised forms, to be filled in during data collection or at a later stage as appropriate).
- If entity identification is relevant, create an identifier field to start matching and to facilitate merging (e.g. to record whether people with the same name are the same person or different people).
- Use notes and comments fields to record uncertainty or any observations on data collection that could help refine the data model later on (e.g. names that could possibly be merged or separated).
- Make sure all records link with enough precision to their location (the locus of occurrence) in the source material, if relevant.
- If the plan is to analyse the dataset by creating maps, record geo-coordinates for known places where feasible.
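
As an illustration only, the tips above could be turned into a record layout and a few automated checks. The following is a minimal sketch, not a KDL tool: the field names, the `unknown` and `n/a` markers, the `earliest/latest` date-range convention, and the sample records are all hypothetical choices made for this example, and a real project would agree its own conventions with its partners.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass, fields
from typing import List

# Hypothetical conventions, chosen for this sketch only.
MISSING = "unknown"      # the source does not give the value
NOT_APPLICABLE = "n/a"   # the field does not apply to this record

# One date convention for every record: YYYY, YYYY-MM or YYYY-MM-DD,
# with uncertainty expressed as a range "earliest/latest".
DATE_PART = r"\d{4}(-\d{2}(-\d{2})?)?"
DATE_RE = re.compile(rf"{DATE_PART}(/{DATE_PART})?")


@dataclass
class Record:
    record_id: str
    first_name_original: str   # as written in the source
    surname_original: str      # separate field: surnames are a unit of analysis
    surname_modernised: str    # left empty at collection, filled in later
    person_id: str             # shared by records judged to be the same person
    date: str
    place_original: str
    latitude: str              # decimal degrees, empty if the place is unknown
    longitude: str
    source_locus: str          # where in the source material the record occurs
    notes: str                 # uncertainty, candidates for merging or splitting


def check_record(rec: Record) -> List[str]:
    """Return a list of convention violations for one record."""
    problems = []
    if rec.date not in (MISSING, NOT_APPLICABLE) and not DATE_RE.fullmatch(rec.date):
        problems.append(f"{rec.record_id}: date {rec.date!r} breaks the date convention")
    if not rec.source_locus:
        problems.append(f"{rec.record_id}: no source locus recorded")
    if bool(rec.latitude) != bool(rec.longitude):
        problems.append(f"{rec.record_id}: latitude and longitude must be given together")
    return problems


def to_csv(records: List[Record]) -> str:
    """Serialise records to CSV (an open format), one value per column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Record)])
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buffer.getvalue()


# Invented sample records, for illustration only.
records = [
    Record("r001", "Jhon", "Smyth", "", "p001", "1640/1645", "Yorke",
           "53.96", "-1.08", "MS 12, fol. 3v", "possibly the same person as r002"),
    Record("r002", "John", "Smith", "", "p001", "c. 1642", "Yorke",
           "", "", "", ""),
]

for rec in records:
    for problem in check_record(rec):
        print(problem)
print(to_csv(records))
```

In this sketch the second record is flagged twice: its date uses free text ("c. 1642") rather than the agreed range convention, and it has no source locus. The original spellings are kept untouched, with the modernised surname left empty for a later stage.
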
See also:
Software Development Life Cycle. King's Digital Lab. 2025