(If you don't really know what to look for when reviewing a PR)
When browsing the pull requests, watch out for the Ready for review
tag; here reviews are needed. If you come across Reviews to address
there has already been at least one review, and some changes have to be done to the code. You could still review, or wait until this label disappears, or you address the suggestions yourself. In the latter case, do communicate this on slack and in the pull request comments, to avoid several people getting in each other's way.
At least 2 independent reviews are required before merging a PR.
-
Try and follow the instructions as they are to run the pipeline (using the test data in
tests/test_data
) (Most workflows should have configs and sample-sheets set up to use this data anyway). Request improvement of the README if you can't run the method workflow with the given instructions. The workflow should run through on the test data without any errors, else please report those errors. However, if it has been documented in the README already that/why the method doesn't work on the test data, and sufficient proof that it works on larger datasets is given (ideally with links to that data), you can still approve. Note that "method doesn't work with ensembl/provided file format/etc." is NOT a valid reason. Those issues have to be adressed. -
Note anything else that’s unclear or missing, for example concerning
- Format of required input data
- Layout of sample sheet and what the columns mean (there should be an example)
- Output files produced (names, what they contain)
- links to the original repo/README/publication of the participant
Refer to the specifications.
- Should be uploaded to biocontainers, not on personal accounts (see method workflow README on Containers)
-
Stages of pipeline should (ideally) do one thing
- E.g. separate scripts to run method and to generate APAeval output files
-
No hardcoding of parameters or output filenames (make them adjustable via a config file)
-
Sample sheet should contain sample specific info, pipeline specific info should go in config
-
Custom annotation files should be generated by the workflow and not be downloaded (e.g. from UCSC Genome Browser, method’s Github repo etc.)
- If ‘gene model’ or BED12 format required, use the ‘GTF to BED12’ utility in the utils/ folder
-
Output files from test data shouldn’t be committed with the repository
-
Make sure (custom) code is clearly documented with comments explaining functions etc.