Skip to content

Conversation

@ppaxisa
Copy link
Member

@ppaxisa ppaxisa commented Mar 13, 2025

I rewrote the group mutate to match previous remarks discussed in #109 , notably avoid breaking changes when S4 columns are present in metadata. I've written a new version that uses plyranges methods when S4 columns are present and prints a message to inform user of potential speed up if coercion to S3:

mutate.GroupedGenomicRanges <- function(.data, ...) {
  
  dots <- set_dots_named(...)
  check_colnames(names(dots))
  core_cols <- names(dots) %in% c("start", "end", "width", "seqnames", "strand")
  
  # if any S4 columns in mcols use plyranges group mutate
  if(any(sapply(mcols(.data), isS4))) {
    message("metadata contains S4 columns, using plyranges group operations.\nYou can speed up group operations by coercing metadata columns to S3")
    return(mutate_grp(.data, dots))
  }
  # PPA grouped mutate speedup, 2025
  # if mutating core columns, call plyranges group for those
  if(any(core_cols)) {
    .data <- mutate_grp(.data, dots[core_cols])
  }
  # if only core columns in call, return object
  if(all(core_cols)) {
    return(.data)
  } else {
    # use vanilla dplyr group operations on metadata
    mutate_mcols_grp(.data, dots[!core_cols])
  }
}

For more details, the changes are done in a new branch here

I also added some tests to check that S4 columns are preserved if present. This version passes checks:

==> devtools::check(document = FALSE)

══ Building ════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Setting env vars:CFLAGS    : -Wall -pedantic -fdiagnostics-color=alwaysCXXFLAGS  : -Wall -pedantic -fdiagnostics-color=alwaysCXX11FLAGS: -Wall -pedantic -fdiagnostics-color=alwaysCXX14FLAGS: -Wall -pedantic -fdiagnostics-color=alwaysCXX17FLAGS: -Wall -pedantic -fdiagnostics-color=alwaysCXX20FLAGS: -Wall -pedantic -fdiagnostics-color=always
── R CMD build ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
✔  checking for file/Volumes/workSD1/Pierre-Paul/CRCT/PPA_work/project/plyranges/DESCRIPTION...preparingplyranges: (421ms)
✔  checking DESCRIPTION meta-information ...installing the package to build vignettescreating vignettes (56.4s)
─  checking for LF line-endings in source and make files and shell scripts (886ms)
─  checking for empty or unneeded directories
   Removed empty directoryplyranges/pkgdown’
─  buildingplyranges_1.23.2.tar.gz’
   
══ Checking ════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Setting env vars:_R_CHECK_CRAN_INCOMING_REMOTE_               : FALSE_R_CHECK_CRAN_INCOMING_                      : FALSE_R_CHECK_FORCE_SUGGESTS_                     : FALSE_R_CHECK_PACKAGES_USED_IGNORE_UNUSED_IMPORTS_: FALSENOT_CRAN                                     : true
── R CMD check ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
─  using log directory/Volumes/workSD1/Pierre-Paul/CRCT/PPA_work/project/plyranges.Rcheck’ (360ms)
─  using R version 4.4.1 (2024-06-14)
─  using platform: x86_64-apple-darwin20R was compiled by
       Apple clang version 14.0.0 (clang-1400.0.29.202)
       GNU Fortran (GCC) 12.2.0running under: macOS Ventura 13.6.9using session charset: UTF-8using options--no-manual --as-cran’ (621ms)
✔  checking for fileplyranges/DESCRIPTION’
─  checking extension type ... Packagethis is packageplyrangesversion1.23.2’
─  package encoding: UTF-8checking package namespace information ...checking package dependencies (2.2s)
✔  checking if this is a source packagechecking if there is a namespacechecking for executable files (2.4s)
✔  checking for hidden files and directorieschecking for portable file names ...checking for sufficient/correct file permissionschecking whether packageplyrangescan be installed ... [36s/37s] OK (36.6s)
✔  checking installed package size ...checking package directory ...
N  checking for future file timestamps ...
   unable to verify current timecheckingbuilddirectorychecking DESCRIPTION meta-information ...checking top-level fileschecking for left-over files ...checking index information (360ms)
✔  checking package subdirectories (1.2s)
✔  checking code files for non-ASCII characters ...checking R files for syntax errors ...checking whether the package can be loaded (10s)
✔  checking whether the package can be loaded with stated dependencies (10s)
✔  checking whether the package can be unloaded cleanly (10s)
✔  checking whether the namespace can be loaded with stated dependencies (9.7s)
✔  checking whether the namespace can be unloaded cleanly (9.8s)
─  checking loading without being on the library search path ... [11s/11s] OK (11.3s)
✔  checking dependencies in R code (19.5s)
✔  checking S3 generic/method consistency (10s)
✔  checking replacement functions (10.4s)
✔  checking foreign function calls (10.6s)
─  checking R code for possible problems ... [48s/49s] OK (48.8s)
N  checking Rd files (606ms)
   checkRd: (-1) group_by-ranges.Rd:57: Lost braces in \itemize; meant \describe ?
   checkRd: (-1) group_by-ranges.Rd:58: Lost braces in \itemize; meant \describe ?
   checkRd: (-1) ranges-anchor.Rd:58: Lost braces in \itemize; meant \describe ?
   checkRd: (-1) ranges-anchor.Rd:59: Lost braces in \itemize; meant \describe ?
   checkRd: (-1) ranges-anchor.Rd:60: Lost braces in \itemize; meant \describe ?
   checkRd: (-1) ranges-anchor.Rd:61-62: Lost braces in \itemize; meant \describe ?
   checkRd: (-1) ranges-anchor.Rd:63-64: Lost braces in \itemize; meant \describe ?
✔  checking Rd metadata ...checking Rd line widths ...checking Rd cross-references (392ms)
✔  checking for missing documentation entries (10.1s)
✔  checking for code/documentation mismatches (28.8s)
✔  checking Rd \usage sections (9.8s)
✔  checking Rd contents ...checking for unstated dependencies in examples (391ms)
✔  checking installed files frominst/doc’ (368ms)
✔  checking files invignettes’ (370ms)
─  checking examples ... [47s/48s] OK (48.2s)
✔  checking for unstated dependencies intests...checking tests (508ms)
    [42s/42s] OKthat.R* checking for unstated dependencies in vignettes ... OK
   * checking package vignettes ... OK
   * checking re-building of vignette outputs ... [18s/19s] OK
   * checking for non-standard things in the check directory ... OK
   * checking for detritus in the temp directory ... OK
   * DONE
   
   Status: 2 NOTEs
   See/Volumes/workSD1/Pierre-Paul/CRCT/PPA_work/project/plyranges.Rcheck/00check.logfor details.
   
── R CMD check results ─────────────────────────────────────────────────────────────────────────────── plyranges 1.23.2 ────
Duration: 6m 7.5schecking for future file timestamps ... NOTE
  unable to verify current timechecking Rd files ... NOTE
  checkRd: (-1) group_by-ranges.Rd:57: Lost braces in \itemize; meant \describe ?
  checkRd: (-1) group_by-ranges.Rd:58: Lost braces in \itemize; meant \describe ?
  checkRd: (-1) ranges-anchor.Rd:58: Lost braces in \itemize; meant \describe ?
  checkRd: (-1) ranges-anchor.Rd:59: Lost braces in \itemize; meant \describe ?
  checkRd: (-1) ranges-anchor.Rd:60: Lost braces in \itemize; meant \describe ?
  checkRd: (-1) ranges-anchor.Rd:61-62: Lost braces in \itemize; meant \describe ?
  checkRd: (-1) ranges-anchor.Rd:63-64: Lost braces in \itemize; meant \describe ?

0 errors| 0 warnings| 2 notesR CMD check succeeded

Let me know what you think!

@sa-lee sa-lee self-requested a review March 13, 2025 21:58
Copy link
Collaborator

@sa-lee sa-lee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, and very compact @ppaxisa. Happy for this to be merged if you are @mikelove.

@mikelove
Copy link
Member

Thanks -- let me check it out this week. I'll aim to merge by Friday at latest

@mikelove mikelove merged commit e2d42a7 into tidyomics:devel Mar 18, 2025
1 check passed
@mikelove
Copy link
Member

Thanks @ppaxisa, a big speedup!

I'm thinking of modifying the message from

metadata contains S4 columns, using plyranges group operations.
You can speed up group operations by coercing metadata columns to S3

to something like:

metadata contains S4 columns, using lapply group operations.
removing S4 columns will speed up group operations.

Thoughts?

I'm not sure S3 is accurate here.

@ppaxisa
Copy link
Member Author

ppaxisa commented Mar 18, 2025

Yes S3 is not really accurate actually, I did not think too much about the message content tbh. The edited version looks good to me. Super happy that we get to merge :)

mikelove added a commit that referenced this pull request Mar 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants