a collection of documentation, tools and processes for generating, approving & releasing synthetic data in UCLH

SAFEHR-data/uclh-synthetic-data

Generating synthetic data in UCLH

Outline

This repository is a first step towards a home for the processes and tools used to generate and release synthetic hospital data at UCLH. It is a work in progress and will change. This documentation will also be synced to Slab, where it will look prettier.

What are synthetic data?

Synthetic data are artificially generated and do not refer to any particular individual or event. Releasing synthetic data instead of real data reduces the risk of disclosing information about individual patients. See these good introductory resources about synthetic data.

Not all synthetic data are the same. They range from low fidelity data, which are only broadly similar to real data, to high fidelity data, which resemble real data more closely.

What can synthetic data be used for?

Different fidelities of synthetic data have different uses. Lower fidelity synthetic data can, counter-intuitively, be more useful because they have a lower risk of releasing sensitive information and can be made more openly available.

These are some of the uses of synthetic data:

  1. Code development
  2. Training & teaching
  3. Data discovery
  4. Federation (doing studies on our own real data using code that others have developed on our synthetic data)

Generating synthetic data in UCLH

At UCLH we are developing tools and processes to generate and release synthetic data safely. Initially, UCLH will release only low fidelity synthetic data.

In short, this will involve:

  • summarising attributes of real data
  • using these to generate synthetic data
  • ensuring that the synthetic data contain no sensitive information
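The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the UCLH tooling: the function names (`summarise`, `generate`, `shares_no_rows`), the `min_count` threshold, the choice of a normal distribution for numeric columns, and the toy rows are all assumptions made for this example. Each column is summarised independently, so relationships between columns are deliberately not preserved, which is one simple way to keep fidelity low.

```python
import random
from collections import Counter
from statistics import mean, stdev


def summarise(rows, min_count=5):
    """Step 1: summarise each column on its own (no links between columns are kept)."""
    summary = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric column: keep only the mean and standard deviation.
            summary[col] = {"type": "numeric", "mean": mean(values), "sd": stdev(values)}
        else:
            # Categorical column: keep counts, dropping rare categories
            # (min_count is a hypothetical threshold chosen for this sketch).
            counts = Counter(values)
            kept = {k: n for k, n in counts.items() if n >= min_count}
            summary[col] = {"type": "categorical", "counts": kept}
    return summary


def generate(summary, n, seed=None):
    """Step 2: sample n synthetic rows from the per-column summaries."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for col, s in summary.items():
            if s["type"] == "numeric":
                row[col] = round(rng.gauss(s["mean"], s["sd"]), 1)
            elif s["counts"]:
                cats = list(s["counts"])
                weights = list(s["counts"].values())
                row[col] = rng.choices(cats, weights=weights)[0]
            else:
                # Every category was too rare to keep.
                row[col] = None
        rows.append(row)
    return rows


def shares_no_rows(real_rows, synthetic_rows):
    """Step 3 (one simple check only): no synthetic row is identical to a real row."""
    real = {tuple(sorted(r.items())) for r in real_rows}
    return all(tuple(sorted(r.items())) not in real for r in synthetic_rows)


# Invented toy rows standing in for real data; these are not UCLH data.
toy_rows = [{"age": 30 + i, "ward": "A" if i % 2 else "B"} for i in range(20)]
toy_summary = summarise(toy_rows)
toy_synthetic = generate(toy_summary, n=10, seed=1)
print(toy_synthetic[0], shares_no_rows(toy_rows, toy_synthetic))
```

An exact-match check like `shares_no_rows` is only a starting point. It would not, on its own, show that synthetic data contain no sensitive information.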

Follow this link for more detail about the developing plan for the process of generating synthetic data in UCLH.
