The pharmaceutical ecosystem is seeing rapid growth of open-source R tools and AI-enabled applications across clinical development, analysis, and regulatory submissions.
Objective benchmarking and evaluation of these tools are currently constrained by the lack of high-quality, publicly available clinical trial datasets.
Existing public datasets (e.g., CDISC Pilot 1) are:
- Outdated and not aligned with current CDISC standards or industry practices
- Limited in scale and complexity
- Insufficient for evaluating modern workflows, including AI-assisted analysis, automation, and end-to-end submissions
There is a clear need for modern, realistic, and reusable synthetic clinical trial data that can support:
- Tool demonstration
- Method evaluation
- Community development and education
Use the CDISC Pilot 1 data as the reference:
- Original SDTM: JSON version
- Original XPT versions: see the Pilot 5 repo
- Clinical study report (CSR): https://github.com/cdisc-org/sdtm-adam-pilot-project/blob/master/updated-pilot-submission-package/900172/m5/53-clin-stud-rep/535-rep-effic-safety-stud/5351-stud-rep-contr/cdiscpilot01/cdiscpilot01.pdf