Skip to content

Conversation

@alkaZeltser
Copy link
Collaborator

This PR prepares the package for v3.0.0 release & for publication on CRAN.

  • Increment version in DESCRIPTION and NEWS
  • Address minor R CMD Check NOTE
  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added the changes included in this pull request to NEWS under the next release version or unreleased, and updated the date.

  • I have updated the version number in metadata.yaml and DESCRIPTION.

  • Both R CMD build and R CMD check run successfully.

@alkaZeltser
Copy link
Collaborator Author

@dan-knight as discussed in person, R CMD check returns a NOTE that will be difficult to fix but possible to justify to the CRAN team:

─  checking R code for possible problems ... [11s/12s] NOTE (12.2s)
   combine.vcf.with.pgs: no visible binding for global variable ‘ID’
   combine.vcf.with.pgs: no visible global function definition for ‘.’
   combine.vcf.with.pgs: no visible binding for global variable ‘Indiv’
   combine.vcf.with.pgs: no visible binding for global variable ‘CHROM’
   combine.vcf.with.pgs: no visible binding for global variable ‘POS’
   combine.vcf.with.pgs: no visible binding for global variable ‘V1’
   combine.vcf.with.pgs: no visible binding for global variable ‘new.ID’
   combine.vcf.with.pgs: no visible global function definition for ‘:=’
   Undefined global functions or variables:
     . := CHROM ID Indiv POS V1 new.ID

This is triggered by a small chunk of code in combine.vcf.with.pgs which implements data.table using the corresponding syntax. R CMD check incorrectly suggests that there are unbound variables being used. In reality, it is just data.table syntax which uses variable names without quotes.

        # Split VCF$ID column into separate rows for each rsID (multiple rsIDs are separated by ;)
        # most efficient way to do this is to use the data.table package
        if (any(grepl(';', vcf.data$ID))) {
            data.table::setDT(vcf.data);
            split.rsid.vcf.data <- merge(
                x = vcf.data,
                # split only entries with multiple rsIDs, save in new column, and merge back with the original data
                y = vcf.data[grepl(';', ID), unique(unlist(strsplit(as.character(ID), ';', fixed = TRUE))), by = .(Indiv, CHROM, POS)
                    ][,.(new.ID = V1, Indiv, CHROM, POS)],
                by = c('CHROM', 'POS', 'Indiv'),
                all = TRUE
                );
            # replace entries with multiple rsIDs with the new, split, rsID
            split.rsid.vcf.data <- split.rsid.vcf.data[!is.na(new.ID), ID := new.ID][, new.ID := NULL];
            } else {
            split.rsid.vcf.data <- vcf.data;
            }

The data.table implementation of this particular functionality (conditionally splitting values into new rows) is the most efficient and straightforward that I could identify. I don't think it's worth spending time re-coding this in base R. Since this will be the first version of our package to be uploaded to CRAN, we will need be undergoing manual review and will need to convince the CRAN team to allow this.

Copy link

@rachelmadang rachelmadang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alkaZeltser alkaZeltser merged commit ccc803e into main Dec 6, 2024
6 checks passed
@alkaZeltser alkaZeltser deleted the nzeltser-release-3.0.0 branch December 6, 2024 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants