Background

The pharmaceutical ecosystem is seeing rapid growth of open-source R tools and AI-enabled applications across clinical development, analysis, and regulatory submissions.

Objective benchmarking and evaluation of these tools are currently constrained by the lack of high-quality, publicly available clinical trial datasets.

Existing public datasets (e.g., CDISC Pilot 1) are:

- Outdated and not aligned with current CDISC standards or industry practices
- Limited in scale and complexity
- Insufficient for evaluating modern workflows, including AI-assisted analysis, automation, and end-to-end submissions

There is a clear need for modern, realistic, and reusable synthetic clinical trial data that can support:

- Tool demonstration
- Method evaluation
- Community development and education

Development phase 1

Use CDISC Pilot 1 data as reference

- Original SDTM: JSON version
- Original XPT versions: see the Pilot 5 repo
- CSR: https://github.com/cdisc-org/sdtm-adam-pilot-project/blob/master/updated-pilot-submission-package/900172/m5/53-clin-stud-rep/535-rep-effic-safety-stud/5351-stud-rep-contr/cdiscpilot01/cdiscpilot01.pdf
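
To illustrate what "using reference data" could look like in practice, here is a minimal sketch in Python. It is not the method used in this repository. The rows in `REFERENCE_DM`, the function `simulate_dm`, and the study identifier `SYNTH01` are all hypothetical and invented for this example; they are not taken from CDISC Pilot 1. The sketch draws each variable independently from its empirical distribution in a small demographics-style reference table.

```python
import random
from collections import Counter

# Illustrative stand-in for a reference demographics (DM-like) extract.
# These rows are invented for the sketch; they are NOT CDISC Pilot 1 data.
REFERENCE_DM = [
    {"ARM": "Placebo", "SEX": "F", "AGE": 63},
    {"ARM": "Placebo", "SEX": "M", "AGE": 71},
    {"ARM": "Low Dose", "SEX": "F", "AGE": 77},
    {"ARM": "Low Dose", "SEX": "M", "AGE": 68},
    {"ARM": "High Dose", "SEX": "F", "AGE": 81},
    {"ARM": "High Dose", "SEX": "M", "AGE": 74},
]


def simulate_dm(reference, n, seed=0, study_id="SYNTH01"):
    """Return n synthetic subject records.

    Each variable is sampled independently from the values observed in
    the reference rows, so marginal distributions are roughly preserved
    but correlations between variables are not.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    columns = {key: [row[key] for row in reference] for key in reference[0]}
    subjects = []
    for i in range(1, n + 1):
        record = {"STUDYID": study_id, "USUBJID": f"{study_id}-{i:04d}"}
        for name, values in columns.items():
            record[name] = rng.choice(values)
        subjects.append(record)
    return subjects


synthetic = simulate_dm(REFERENCE_DM, n=100, seed=42)
print(Counter(s["ARM"] for s in synthetic))
```

Independent sampling is the simplest possible choice and deliberately ignores relationships between variables (for example, age by arm). A realistic simulation would need to model those relationships and the downstream domains as well.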

RConsortium/submissions-pilot7-synthetic-data
