feat: add pull method, alongside tests and updated documentation #113

sa-lee · 2025-06-24T04:59:21Z

Adding pull method based on request on zulip: #tidiness_in_bioc > `pull` method

Currently the default behaviour is to allow selection of any part of a Ranges object, such as start, end, etc. Happy to discuss whether that is appropriate or not.

ppaxisa · 2025-06-24T14:41:34Z

looks good, although tbh overscope and tidyeval are a bit like black magic to me ^^. But parsing through this helps me understand it better. I think it's appropriate to be able to select any columns, regardless of whether it's a core or metadata column.

I've found the tiniest typo in the NEWS.md: # playranges 1.27.6; "playranges" instead of "plyranges".

Let's see what @mikelove thinks. He had mentioned the idea of relying on DFplyr methods, but I think this is not ripe yet in terms of adding a new dependency to plyranges and using DFplyr (we could discuss that separately from the pull implementation)

mikelove · 2025-06-24T14:59:38Z

> He had mentioned the idea of relying on DFplyr methods

No, I agree with you that we should avoid a dependency.

This looks great!

> x <- data.frame(seqnames=1, start=1:10, width=1, foo=letters[1:10]) |> as_granges()
> x
GRanges object with 10 ranges and 1 metadata column:
       seqnames    ranges strand |         foo
          <Rle> <IRanges>  <Rle> | <character>
   [1]        1         1      * |           a
   [2]        1         2      * |           b
   [3]        1         3      * |           c
...
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> x |> pull(foo)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
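The default discussed above (any column can be pulled, whether core or metadata) can be sketched as follows. This assumes a plyranges build that includes the `pull` method from this PR. Only the `foo` result appears in the thread, so no output is claimed for the core columns.

```r
library(plyranges)  # assumption: a version that ships the pull method from this PR

# Same toy object as in the comment above
x <- data.frame(seqnames = 1, start = 1:10, width = 1, foo = letters[1:10]) |>
  as_granges()

meta_col  <- x |> pull(foo)    # metadata column, as shown in the thread
start_col <- x |> pull(start)  # core column; permitted under the current default
width_col <- x |> pull(width)  # another core column
```

Each call is expected to return a plain vector rather than a GRanges object, which mirrors what `dplyr::pull()` does for data frames.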

sa-lee · 2025-06-25T00:39:43Z

This does make one other big change: I moved dplyr to Depends in the DESCRIPTION file, following @jonocarroll's approach in DFplyr. This means plyranges no longer re-exports any of these tidyverse functions, so you now call dplyr::mutate(gr, ...) instead of plyranges::mutate(gr, ...). This may break some downstream dependencies, which we will need to look into a bit more. This is probably the way it should always have been done, but I understand it might cause a bit of inconvenience. Not sure if there's a better way of doing it.

mikelove · 2025-06-25T04:37:31Z

So does this resolve the namespace conflicts then?

> this may break some downstream dependencies

which are you thinking about here?

sa-lee · 2025-06-25T05:42:41Z

Yes I think this does resolve the conflict issue, except in the case of between, n and n_distinct as they aren't generics.

Any package that invokes one of the core dplyr verbs using plyranges:: or importFrom(plyranges, dplyr_verb) will break, as these are no longer exported. For example, nullranges will break with this merge, but the fix is swapping plyranges:: with dplyr::.
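A minimal sketch of the downstream fix described above, assuming a plyranges build that includes this PR. The object `gr` and the `score` column are made up for illustration and are not taken from nullranges.

```r
library(plyranges)  # assumption: a version where dplyr verbs are no longer re-exported

# Hypothetical toy ranges, for illustration only
gr <- data.frame(seqnames = "chr1", start = 1:3, width = 1) |>
  as_granges()

# Before this PR a downstream package might have written:
#   gr2 <- plyranges::mutate(gr, score = start * 2)
# After this PR the verb is qualified with dplyr instead; the call still
# dispatches to the plyranges method because gr is a GRanges object.
gr2 <- dplyr::mutate(gr, score = start * 2)
```

The same swap would apply to an `importFrom(plyranges, mutate)` line in a downstream NAMESPACE, which would become `importFrom(dplyr, mutate)`.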

mikelove · 2025-06-25T11:19:49Z

Ok that's fine with me. I can go update nullranges

jonocarroll · 2025-06-25T23:14:00Z

Just for clarity on the DFplyr approach; I explicitly import only the necessary generics from dplyr https://github.com/jonocarroll/DFplyr/blob/4e8ba1b294c4e21fd4e04ac37c94b712738fcb2e/R/DFplyr-package.R#L8-L11
then add my own methods for these which means users can just call the generic and it gets dispatched to the DFplyr method if the object is a DataFrame and falls back to dplyr's dispatching if it's something else. This is perhaps a little more conservative vs importing all of dplyr which may clobber other functions I'm not messing with, e.g. lag. One big conflict that does come to mind is rename for which I don't want to disturb the S4Vectors generic, but DFplyr offers the dplyr-like NSE syntax so I export a rename2 (though in retrospect, it should just be a method on the S4Vectors generic).
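The dispatch behaviour described here can be illustrated with a small base-R sketch. All names below (`my_pull`, `toy_ranges`) are hypothetical stand-ins, not DFplyr or plyranges code: `my_pull` plays the role of a generic owned by another package, and the `toy_ranges` method plays the role of the package-specific method.

```r
# Stand-in for a generic owned by another package (hypothetical name)
my_pull <- function(.data, ...) UseMethod("my_pull")

# Fallback used for any class without its own method
my_pull.default <- function(.data, var, ...) .data[[var]]

# Package-specific method: looks in core columns first, then metadata
my_pull.toy_ranges <- function(.data, var, ...) {
  if (var %in% names(.data$core)) .data$core[[var]] else .data$meta[[var]]
}

toy <- structure(
  list(core = list(start = 1:3, end = 4:6),
       meta = list(foo = c("a", "b", "c"))),
  class = "toy_ranges"
)

res_core <- my_pull(toy, "start")                 # dispatches to the toy_ranges method
res_meta <- my_pull(toy, "foo")                   # same method, metadata branch
res_df   <- my_pull(data.frame(foo = 1:2), "foo") # no data.frame method, so default is used
```

Users call only the generic; which method runs depends on the class of the first argument, so objects of other classes are unaffected by the new method.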

sa-lee · 2025-06-25T23:51:44Z

> Just for clarity on the DFplyr approach; I explicitly import only the necessary generics from dplyr https://github.com/jonocarroll/DFplyr/blob/4e8ba1b294c4e21fd4e04ac37c94b712738fcb2e/R/DFplyr-package.R#L8-L11 then add my own methods for these which means users can just call the generic and it gets dispatched to the DFplyr method if the object is a DataFrame and falls back to dplyr's dispatching if it's something else. This is perhaps a little more conservative vs importing all of dplyr which may clobber other functions I'm not messing with, e.g. lag. One big conflict that does come to mind is rename for which I don't want to disturb the S4Vectors generic, but DFplyr offers the dplyr-like NSE syntax so I export a rename2 (though in retrospect, it should just be a method on the S4Vectors generic).

I only import the generics I need as well but to clarify for my understanding - doesn't putting a package in Depends import and attach all of that package on load? I have seen some of the other tidyverse packages do some onLoad trickery but that also seems hacky.

Regardless, my initial approach of re-exporting was pretty clunky and this works for now.
So I will merge this PR. Thanks for your input everyone.

jonocarroll · 2025-06-25T23:55:11Z

NAMESPACE

 exportClasses(FileOperator)
 exportClasses(GroupedGenomicRanges)
 exportClasses(GroupedIntegerRanges)
+import(dplyr)


I don't have this line in DFplyr; I am a little fuzzy on exactly how DESCRIPTION Depends translates to NAMESPACE, but I don't think it necessarily does at all.
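For reference, the two NAMESPACE styles being contrasted look roughly like this. It is a fragment, not runnable code, and the particular generics listed are illustrative rather than the actual plyranges import list. As far as R's packaging rules go, listing a package under Depends in DESCRIPTION attaches it to the user's search path when the depending package is attached, while NAMESPACE imports are declared separately.

```r
# NAMESPACE fragment (illustrative)

# Selective: only the named generics enter the package namespace
importFrom(dplyr, mutate)
importFrom(dplyr, pull)

# Blanket: every dplyr export enters the package namespace
# (this is the line flagged in the diff above)
# import(dplyr)
```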

sa-lee · Jun 26, 2025

Hmm, that shouldn't be there. I can't remember adding a #' @import dplyr tag in the documentation, but sure enough it is there. I will make a hotfix. Thanks for noticing.

jonocarroll · Jun 26, 2025

In case you can't find it:

plyranges/R/dplyr-mutate.R

Line 36 in ef55306

#' @import dplyr

mikelove · Jun 26, 2025

It looks like it came in with the grouped speed-up, but I agree we want to be selective.

ppaxisa · Jun 26, 2025

My bad, that was me. What's the best approach in general: using importFrom so that you only import the functions you need, or using the <package>:: prefix?
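The two options in this question can be written side by side. This is only a sketch of the syntax, not a recommendation from the thread; the helper names `count_unique_a` and `count_unique_b` are hypothetical, and both forms assume dplyr is declared in the package's DESCRIPTION.

```r
# Option 1: selective import via a roxygen2 tag.
# roxygen2 turns the tag into `importFrom(dplyr, n_distinct)` in NAMESPACE,
# so the bare name resolves inside the package.
#' @importFrom dplyr n_distinct
count_unique_a <- function(x) n_distinct(x)

# Option 2: no import; every call is qualified with the package prefix.
count_unique_b <- function(x) dplyr::n_distinct(x)
```

Outside a package build, option 1 only works if dplyr is attached, whereas option 2 works wherever dplyr is installed.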

sa-lee added 5 commits June 24, 2025 14:55

feat: add pull method, alongside tests and updated documentation

e41037f

refactor: fix doc issues and R CMD CHECK notes

3622845

refactor: fix tests with moving dplyr to Depends

a60797b

fix: update imports

65a1462

fix: tests, remove dplyr warning

dc3e157

fix typo, update note about package dependency updates

154e5c3

sa-lee merged commit ef55306 into tidyomics:devel Jun 25, 2025
1 check passed

jonocarroll reviewed Jun 25, 2025

sa-lee mentioned this pull request Jun 26, 2025

update pkgdown yml and remove full dplyr import #114

Merged

4 participants