This repository contains datasets for the Automated Systematic Review project. It is used to collect, preprocess and share systematic review datasets.
The datasets are alphabetically ordered.
Reference | Topic | Sample Size | Inclusion rate | Link | License |
---|---|---|---|---|---|
Cohen et al., 2006 | ACE Inhibitors | 2544 | 1.61% | source | NA |
Cohen et al., 2006 | ADHD | 851 | 2.35% | source | NA |
Cohen et al., 2006 | Antihistamines | 310 | 5.16% | source | NA |
Cohen et al., 2006 | Atypical Antipsychotics | 1120 | 13.04% | source | NA |
Cohen et al., 2006 | Beta Blockers | 2072 | 2.03% | source | NA |
Cohen et al., 2006 | Calcium Channel Blockers | 1218 | 8.21% | source | NA |
Cohen et al., 2006 | Estrogens | 368 | 21.74% | source | NA |
Cohen et al., 2006 | NSAIDS | 393 | 10.43% | source | NA |
Cohen et al., 2006 | Opioids | 1915 | 0.78% | source | NA |
Cohen et al., 2006 | Oral Hypoglycemics | 503 | 27.04% | source | NA |
Cohen et al., 2006 | Proton Pump Inhibitors | 1333 | 3.83% | source | NA |
Cohen et al., 2006 | Skeletal Muscle Relaxants | 1643 | 0.55% | source | NA |
Cohen et al., 2006 | Statins | 3465 | 2.45% | source | NA |
Cohen et al., 2006 | Triptans | 671 | 3.58% | source | NA |
Cohen et al., 2006 | Urinary Incontinence | 327 | 12.23% | source | NA |
Van de Schoot et al., 2018 | PTSD | 5783 | 0.66% | source | |
Wahono, 2015 | Software Defect Detection | 7002 | 0.89% | source | Creative Commons Attribution 4.0 International |
Hall et al., 2012 | Software Fault Prediction | 8911 | 1.17% | source | Creative Commons Attribution 4.0 International |
Radjenović et al., 2013 | Software Fault Prediction | 6000 | 0.80% | source | Creative Commons Attribution 4.0 International |
Kitchenham et al., 2010 | Software Engineering | 1704 | 2.58% | source | Creative Commons Attribution 4.0 International |
Bannach-Brown et al., 2019 | Animal Model of Depression | 1993 | 14.0% | source | Creative Commons Attribution 4.0 International |
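Assuming the inclusion percentages in the table are the share of records labelled as included (included records divided by sample size), the approximate number of included records can be backed out with a single multiplication. A minimal sketch, using three rows copied from the table; the helper name `approx_included` is hypothetical, and the counts are only approximate because the percentages are rounded.

```python
def approx_included(sample_size: int, inclusion_pct: float) -> int:
    """Approximate count of included records from a rounded inclusion percentage.

    Assumes inclusion_pct = included / sample_size * 100.
    """
    return round(sample_size * inclusion_pct / 100)


# (topic, sample size, inclusion %) copied from the table
rows = [
    ("PTSD", 5783, 0.66),
    ("Statins", 3465, 2.45),
    ("Animal Model of Depression", 1993, 14.0),
]

for topic, n, pct in rows:
    print(f"{topic}: ~{approx_included(n, pct)} of {n} records included")
```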
The folder `datasets/` has a subfolder for each systematic review dataset. Each of these subfolders is a small project containing code and a `README.md`. The scripts in each dataset folder create a subfolder named `output/` with the result of the data collection.
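As a sketch of this layout, the hypothetical helper below (not part of the repository) walks the subfolders of `datasets/` and reports whether each one has a `README.md` and whether its scripts have already produced an `output/` folder.

```python
from pathlib import Path


def describe_datasets(root: Path) -> dict[str, dict[str, bool]]:
    """Hypothetical helper: summarise each dataset subfolder under root."""
    summary = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[sub.name] = {
            "has_readme": (sub / "README.md").is_file(),
            # output/ only exists once the dataset's scripts have been run
            "has_output": (sub / "output").is_dir(),
        }
    return summary


if __name__ == "__main__":
    root = Path("datasets")
    if root.is_dir():  # only run when called from the repository root
        for name, info in describe_datasets(root).items():
            print(name, info)
```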
The [Automated Systematic Review](https://github.com/msdslab/automated-systematic-review) software accepts several file formats, such as RIS and CSV. The datasets in this project are stored in one of these formats.
RIS files are used by digital libraries such as IEEE Xplore, Scopus and ScienceDirect, and the citation managers Mendeley and EndNote support the RIS format as well. For simulation, we use an additional RIS tag, `LI` (Label included).
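To illustrate, the sample below is a minimal RIS record carrying the extra `LI` tag, followed by a small stdlib-only reader that groups tag values per record. This is a hypothetical sketch, not the parser used by the Automated Systematic Review software, and it assumes `LI  - 1` marks an included record and `LI  - 0` an excluded one.

```python
import re

# A RIS line is a two-character tag, two spaces, a hyphen and the value,
# e.g. "TI  - Some title". "ER" closes a record.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")


def read_ris(text: str) -> list[dict[str, list[str]]]:
    """Group RIS tag values per record (sketch: continuation lines are skipped,
    and a record without a closing ER line is ignored)."""
    records, current = [], {}
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.rstrip())
        if not match:
            continue
        tag, value = match.groups()
        if tag == "ER":  # end of reference
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


RIS_SAMPLE = """TY  - JOUR
TI  - An example title
AU  - Doe, J.
AB  - An example abstract.
LI  - 1
ER  -
"""

records = read_ris(RIS_SAMPLE)
# Assumption: LI = 1 means "included"
included = [r for r in records if r.get("LI") == ["1"]]
print(f"{len(included)} of {len(records)} records included")
```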
For CSV files, the software accepts a set of predetermined column names, in line with the fields used in RIS files. The most commonly used ones are `id`, `authors`, `date`, `title`, `keywords` and `abstract`. To indicate labelling decisions, use a column named `included` or `label_included`.
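A minimal sketch of reading such a CSV with Python's `csv` module and computing the percentage of included records. It assumes the label column is named either `included` or `label_included` and holds `1` for included and `0` for excluded; the helper name and the inline data are made up for illustration.

```python
import csv
import io

LABEL_COLUMNS = ("included", "label_included")  # either name may be used


def inclusion_rate(csv_text: str) -> float:
    """Percentage of rows labelled 1 in the first label column found (sketch)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    label_col = next((c for c in LABEL_COLUMNS if c in fieldnames), None)
    if label_col is None:
        raise ValueError("no 'included' or 'label_included' column found")
    labels = [int(row[label_col]) for row in reader]
    if not labels:
        raise ValueError("CSV contains no records")
    return 100 * sum(labels) / len(labels)


CSV_SAMPLE = """id,title,abstract,label_included
1,First paper,Some abstract,1
2,Second paper,Another abstract,0
3,Third paper,Yet another abstract,0
4,Fourth paper,A final abstract,0
"""

print(f"{inclusion_rate(CSV_SAMPLE):.2f}%")  # 25.00%
```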
In general, the following column names are recognized (based on https://pypi.org/project/RISparser/):
- `first_authors`
- `secondary_authors`
- `tertiary_authors`
- `subsidiary_authors`
- `abstract`
- `author_address`
- `accession_number`
- `authors`
- `custom1`
- `custom2`
- `custom3`
- `custom4`
- `custom5`
- `custom6`
- `custom7`
- `custom8`
- `caption`
- `call_number`
- `place_published`
- `date`
- `name_of_database`
- `doi`
- `database_provider`
- `end_page`
- `end_of_reference`
- `edition`
- `id`
- `number`
- `alternate_title1`
- `alternate_title2`
- `alternate_title3`
- `journal_name`
- `keywords`
- `file_attachments1`
- `file_attachments2`
- `figure`
- `language`
- `label`
- `note`
- `type_of_work`
- `notes`
- `number_of_Volumes`
- `original_publication`
- `publisher`
- `year`
- `reviewed_item`
- `research_notes`
- `reprint_edition`
- `version`
- `issn`
- `start_page`
- `short_title`
- `primary_title`
- `secondary_title`
- `tertiary_title`
- `translated_author`
- `title`
- `translated_title`
- `type_of_reference`
- `unknown_tag`
- `url`
- `volume`
- `publication_year`
- `access_date`
The custom column name, which is not part of the RIS mapping above, is `label_included`.
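The column names above correspond to RIS tags; as far as we know, RISparser maps, for example, `TI` to `title`, `AB` to `abstract`, `AU` to `authors` and `KW` to `keywords`. The sketch below hard-codes a small, assumed subset of that mapping, plus `LI` to `label_included`, to turn a parsed RIS record into a CSV row with recognized column names. It is an illustration, not the conversion code used by this project; `TAG_TO_COLUMN`, `record_to_row` and `rows_to_csv` are hypothetical names.

```python
import csv
import io

# Assumed subset of the RIS-tag-to-column mapping; extend as needed.
TAG_TO_COLUMN = {
    "ID": "id",
    "AU": "authors",
    "PY": "year",
    "TI": "title",
    "KW": "keywords",
    "AB": "abstract",
    "LI": "label_included",  # project-specific tag for labelling decisions
}


def record_to_row(record: dict[str, list[str]]) -> dict[str, str]:
    """Keep only mapped tags; tags outside the subset (e.g. TY) are dropped."""
    row = {}
    for tag, values in record.items():
        column = TAG_TO_COLUMN.get(tag)
        if column is not None:
            # join repeated tags (e.g. several AU lines) into one cell
            row[column] = "; ".join(values)
    return row


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    out = io.StringIO()
    # missing columns are written as empty cells (DictWriter's default restval)
    writer = csv.DictWriter(out, fieldnames=list(TAG_TO_COLUMN.values()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


record = {
    "TY": ["JOUR"],
    "TI": ["An example title"],
    "AU": ["Doe, J.", "Roe, R."],
    "LI": ["1"],
}
print(rows_to_csv([record_to_row(record)]))
```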
Contact details can be found at the Automated Systematic Review project page.