@sa-lee
Collaborator

sa-lee commented Jun 24, 2025

Adding a `pull` method, based on a request on Zulip: #tidiness_in_bioc > `pull` method

Currently the default behaviour is to allow selection of any part of a Ranges object, such as start, end, etc. Happy to discuss whether that is appropriate or not.
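
A minimal sketch of that default, assuming a plyranges build that includes this PR (the `gr` object and its `score` column are made up for illustration): pulling a core part and pulling a metadata column would use the same call.

```r
# Assumes a plyranges version that includes the pull method from this PR,
# with dplyr attached via Depends so that `pull` is found.
library(plyranges)

# Toy ranges; `score` is a hypothetical metadata column.
gr <- data.frame(
  seqnames = "chr1",
  start = 1:5,
  width = 2,
  score = c(0.1, 0.5, 0.2, 0.9, 0.4)
) |>
  as_granges()

starts <- pull(gr, start)  # a core part of the ranges
scores <- pull(gr, score)  # a metadata column
```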

@ppaxisa
Member

ppaxisa commented Jun 24, 2025

Looks good, although tbh overscope and tidyeval are a bit like black magic to me ^^. But parsing through this helps me understand it better. I think it's appropriate to be able to select any column, regardless of whether it's a core or metadata column.

I've found the tiniest typo in the NEWS.md: # playranges 1.27.6; "playranges" instead of "plyranges".

Let's see what @mikelove thinks. He had mentioned the idea of relying on DFplyr methods, but I think it is too early to add DFplyr as a new dependency of plyranges (we could discuss that separately from the pull implementation).

@mikelove
Member

> He had mentioned the idea of relying on DFplyr methods

No, I agree with you that we should avoid a dependency.

This looks great!

> x <- data.frame(seqnames=1, start=1:10, width=1, foo=letters[1:10]) |> as_granges()
> x
GRanges object with 10 ranges and 1 metadata column:
       seqnames    ranges strand |         foo
          <Rle> <IRanges>  <Rle> | <character>
   [1]        1         1      * |           a
   [2]        1         2      * |           b
   [3]        1         3      * |           c
...
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> x |> pull(foo)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

@sa-lee
Collaborator Author

sa-lee commented Jun 25, 2025

This does make one other big change: I moved dplyr to Depends in the DESCRIPTION file, following @jonocarroll's approach in DFplyr. This means plyranges no longer re-exports any of these tidyverse functions, so you use dplyr::mutate(gr, ...) instead of plyranges::mutate(gr, ...). This may break some downstream dependencies, which we will need to look into a bit more. This is probably the way it should always have been done, but I understand it might cause a bit of inconvenience. Not sure if there's a better way of doing it.

@mikelove
Member

So does this resolve the namespace conflicts then?

> this may break some downstream dependencies

which are you thinking about here?

@sa-lee
Collaborator Author

sa-lee commented Jun 25, 2025

Yes, I think this does resolve the conflict issue, except in the case of between, n and n_distinct, as they aren't generics.

Any package that invokes one of the core dplyr verbs using plyranges:: or importFrom plyranges dplyr_verb will break, as these are no longer exported. For example, nullranges will break with this merge, but the fix is swapping plyranges:: with dplyr::.
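
A sketch of what that swap might look like in downstream code. The `gr` object and the `score` and `score2` columns are hypothetical and not taken from nullranges.

```r
library(plyranges)

# Toy ranges with a made-up metadata column.
gr <- data.frame(seqnames = "chr1", start = 1:3, width = 5, score = c(1, 2, 3)) |>
  as_granges()

# Before this merge, downstream code might have called the re-export:
#   plyranges::mutate(gr, score2 = score * 2)
# After it, the same call goes through the dplyr generic, which should
# dispatch to the plyranges method for Ranges objects:
gr2 <- dplyr::mutate(gr, score2 = score * 2)
```

The same pattern would apply to an importFrom directive: the verb is imported from dplyr rather than from plyranges.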

@mikelove
Member

Ok that's fine with me. I can go update nullranges

@jonocarroll

Just for clarity on the DFplyr approach: I explicitly import only the necessary generics from dplyr (https://github.com/jonocarroll/DFplyr/blob/4e8ba1b294c4e21fd4e04ac37c94b712738fcb2e/R/DFplyr-package.R#L8-L11), then add my own methods for these. This means users can just call the generic: it gets dispatched to the DFplyr method if the object is a DataFrame and falls back to dplyr's dispatching if it's something else. This is perhaps a little more conservative than importing all of dplyr, which may clobber other functions I'm not messing with, e.g. lag. One big conflict that does come to mind is rename, for which I don't want to disturb the S4Vectors generic; DFplyr offers the dplyr-like NSE syntax, so I export a rename2 (though in retrospect, it should just be a method on the S4Vectors generic).
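
The dispatch-with-fallback idea can be sketched in base R alone. This is a toy, not DFplyr's code: `first_col` stands in for a generic imported from another package, and `toy_frame` is a hypothetical class playing the role of DataFrame.

```r
# Stand-in for a generic that would normally be imported from another
# package (e.g. a dplyr verb). The name is hypothetical.
first_col <- function(x, ...) UseMethod("first_col")

# Fallback, playing the role of the upstream package's own dispatch.
first_col.default <- function(x, ...) x[[1]]

# Method added by "our" package for "our" class only.
first_col.toy_frame <- function(x, ...) unclass(x)[[1]]

tf <- structure(list(a = 1:3, b = letters[1:3]), class = "toy_frame")

from_method <- first_col(tf)                      # uses first_col.toy_frame
from_default <- first_col(data.frame(z = 4:6))    # falls back to the default
```

Because only the method is added, calls on any other class behave as they did before the package was loaded.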

@sa-lee
Collaborator Author

sa-lee commented Jun 25, 2025

> Just for clarity on the DFplyr approach: I explicitly import only the necessary generics from dplyr (https://github.com/jonocarroll/DFplyr/blob/4e8ba1b294c4e21fd4e04ac37c94b712738fcb2e/R/DFplyr-package.R#L8-L11), then add my own methods for these. This means users can just call the generic: it gets dispatched to the DFplyr method if the object is a DataFrame and falls back to dplyr's dispatching if it's something else. This is perhaps a little more conservative than importing all of dplyr, which may clobber other functions I'm not messing with, e.g. lag. One big conflict that does come to mind is rename, for which I don't want to disturb the S4Vectors generic; DFplyr offers the dplyr-like NSE syntax, so I export a rename2 (though in retrospect, it should just be a method on the S4Vectors generic).

I only import the generics I need as well, but to clarify for my understanding: doesn't putting a package in Depends import and attach all of that package on load? I have seen some of the other tidyverse packages do some onLoad trickery, but that also seems hacky.

Regardless, my initial approach of re-exporting was pretty clunky and this works for now.
So I will merge this PR. Thanks for your input everyone.

sa-lee merged commit ef55306 into tidyomics:devel on Jun 25, 2025
1 check passed
exportClasses(FileOperator)
exportClasses(GroupedGenomicRanges)
exportClasses(GroupedIntegerRanges)
import(dplyr)

I don't have this line in DFplyr; I am a little fuzzy on exactly how DESCRIPTION Depends translates to NAMESPACE, but I don't think it necessarily does at all.
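
For reference, a small base R sketch of the distinction in question. Loading a namespace is what Imports and importFrom rely on. Attaching is what library() does, and what Depends triggers when the depending package is attached. The `splines` package is only a stand-in here, chosen because it ships with R but is not attached in a default session.

```r
# Loading a namespace does not put the package on the search path:
loadNamespace("splines")
loaded_only <- isNamespaceLoaded("splines") &&
  !("package:splines" %in% search())

# Attaching adds it to the search path, so its exports become visible
# to the user by bare name:
library(splines)
attached <- "package:splines" %in% search()
```

Neither step writes anything to a package's NAMESPACE file; an import(dplyr) line there has to come from an explicit directive or a roxygen tag.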

Collaborator Author

Hmm, that shouldn't be there. I can't remember adding a #' @import dplyr tag in the documentation, but sure enough it is there. I will make a hotfix. Thanks for noticing.

#' @import dplyr
in case you can't find it

Member

It looks like it came in with the grouped speed-up, but I agree we want to be selective.

Member

My bad, that was me. What's the best approach in general: using importFrom so that you only import the functions you need, or using the <package>:: prefix?
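
A sketch of the two styles side by side, as they might appear in a package's R/ source. The `add_flag_*` helpers are hypothetical, and which style is preferable is a judgment call rather than something settled in this thread.

```r
# Style 1: fully qualified call. No NAMESPACE import is needed, but
# dplyr must still be declared in DESCRIPTION.
add_flag_prefixed <- function(df) {
  dplyr::mutate(df, flag = TRUE)
}

# Style 2: selective import. roxygen2 turns the tag below into an
# importFrom(dplyr, mutate) directive in NAMESPACE, so the bare name
# resolves inside the package.
#' @importFrom dplyr mutate
add_flag_imported <- function(df) {
  mutate(df, flag = TRUE)
}
```

Both avoid a blanket import(dplyr), so neither pulls unrelated dplyr functions such as lag into the package namespace.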
