dplyr verbs remove attributes of classed columns #7787

@DanChaltiel

Hi,

This issue is based on #7630, which was closed.

The solution proposed there was to use haven::labelled(), which preserves the label but drops the column's original class.
This might be OK for simple, atomic classes like glue, but unfortunately not for more complex ones.

For instance, if you have a difftime or a survival::Surv() column, you can lose a lot of functionality.
For a difftime, you end up with a bare numeric vector, with no trace of a unit.
For a Surv, you even get an error, as the underlying two-column matrix is flattened by vec_data() into a plain vector with more elements than the tibble has rows (4 values for 2 rows in the reprex below).
I can find more examples if needed.

This looks like a regression: I did not experience it with previous versions of dplyr, although I cannot tell which version of dplyr or vctrs introduced it.
Unfortunately, it breaks things all over my codebase, as labels are an essential feature for data reporting.

Is there another solution besides the one proposed in #7630? (I'm willing to dig into vctrs to implement a better labelled class if needed.)
Otherwise, would it be possible to restore attributes after using vctrs inside dplyr verbs?
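
To illustrate the second option, here is a minimal sketch of what restoring attributes could look like. It assumes labels are stored in a plain `label` attribute set with base R. `get_labels()` and `set_labels()` are hypothetical helpers, not part of dplyr or vctrs, and dropping the attribute by hand only stands in for an operation that loses it.

```r
# Hypothetical helpers (not part of dplyr or vctrs): save and restore
# the "label" attribute of each column of a data frame.
get_labels <- function(df) {
  lapply(df, attr, which = "label", exact = TRUE)
}

set_labels <- function(df, labels) {
  for (nm in intersect(names(df), names(labels))) {
    if (!is.null(labels[[nm]])) {
      attr(df[[nm]], "label") <- labels[[nm]]
    }
  }
  df
}

# A difftime column carrying a label as a plain attribute
df <- data.frame(id = 1:2)
df$c <- as.difftime(c(0, 0), units = "days")
attr(df$c, "label") <- "C"

saved <- get_labels(df)

# Stand-in for an operation that drops the label (assumption for the demo)
attr(df$c, "label") <- NULL

df <- set_labels(df, saved)

class(df$c)          # still a difftime
units(df$c)          # unit is kept
attr(df$c, "label")  # label is back
```

Because the label is an ordinary attribute here, the column keeps its class and its units, which is exactly what gets lost when the column is wrapped in haven::labelled().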

Thank you very much for considering this issue and for your work on this package.

Reprex

library(dplyr)
library(purrr)
library(haven)

test = tibble(
  c = Sys.Date()-Sys.Date(),
  d = survival::Surv(5:6, event=1:0)
)
test
#> # A tibble: 2 × 2
#>   c           d
#>   <drtn> <Surv>
#> 1 0 days     5 
#> 2 0 days     6+

str(test)
#> tibble [2 × 2] (S3: tbl_df/tbl/data.frame)
#>  $ c: 'difftime' num [1:2] 0 0
#>   ..- attr(*, "units")= chr "days"
#>  $ d: 'Surv' num [1:2, 1:2] 5  6+
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "time" "status"
#>   ..- attr(*, "type")= chr "right"

test$c = labelled(test$c, label="C")
test
#> # A tibble: 2 × 2
#>   c              d
#>   <dbl+lbl> <Surv>
#> 1 0             5 
#> 2 0             6+

str(test)
#> tibble [2 × 2] (S3: tbl_df/tbl/data.frame)
#>  $ c: dbl+lbl [1:2] 0, 0
#>    ..@ label: chr "C"
#>  $ d: 'Surv' num [1:2, 1:2] 5  6+
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "time" "status"
#>   ..- attr(*, "type")= chr "right"

Created on 2026-01-04 with reprex v2.1.1

Error for the survival object:

labelled(test$d, label="D")
#> <labelled<double>[4]>: D
#> [1] 5 6 1 0

test$d = labelled(test$d, label="D")
#> Error in `$<-`:
#> ! Assigned data `labelled(test$d, label = "D")` must be compatible with
#>   existing data.
#> ✖ Existing data has 2 rows.
#> ✖ Assigned data has 4 rows.
#> ℹ Only vectors of size 1 are recycled.
#> Caused by error in `vectbl_recycle_rhs_rows()`:
#> ! Can't recycle input of size 4 to size 2.
