PR review guide for method workflows

(If you don't really know what to look for when reviewing a PR)

When browsing the pull requests, watch out for the Ready for review tag; here reviews are needed. If you come across Reviews to address there has already been at least one review, and some changes have to be done to the code. You could still review, or wait until this label disappears, or you address the suggestions yourself. In the latter case, do communicate this on slack and in the pull request comments, to avoid several people getting in each other's way.
At least 2 independent reviews are required before merging a PR.

Start with the README of the method workflow

Try and follow the instructions as they are to run the pipeline (using the test data in tests/test_data) (Most workflows should have configs and sample-sheets set up to use this data anyway). Request improvement of the README if you can't run the method workflow with the given instructions. The workflow should run through on the test data without any errors, else please report those errors. However, if it has been documented in the README already that/why the method doesn't work on the test data, and sufficient proof that it works on larger datasets is given (ideally with links to that data), you can still approve. Note that "method doesn't work with ensembl/provided file format/etc." is NOT a valid reason. Those issues have to be adressed.
Note anything else that’s unclear or missing, for example concerning
- Format of required input data
- Layout of sample sheet and what the columns mean (there should be an example)
- Output files produced (names, what they contain)
- links to the original repo/README/publication of the participant

Check the output data matches our specifications

Refer to the specifications.

Check that steps to run methods/pre-processing are wrapped in Dockerfiles

Should be uploaded to biocontainers, not on personal accounts (see method workflow README on Containers)

Code organisation & general comments

Stages of pipeline should (ideally) do one thing
- E.g. separate scripts to run method and to generate APAeval output files
No hardcoding of parameters or output filenames (make them adjustable via a config file)
Sample sheet should contain sample specific info, pipeline specific info should go in config
Custom annotation files should be generated by the workflow and not be downloaded (e.g. from UCSC Genome Browser, method’s Github repo etc.)
- If ‘gene model’ or BED12 format required, use the ‘GTF to BED12’ utility in the utils/ folder
Output files from test data shouldn’t be committed with the repository
Make sure (custom) code is clearly documented with comments explaining functions etc.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!