New function type_convert(df, schema) #286

@peterdesmet

For the etn R package, it would be useful to have a function that takes a dataframe + schema, and casts the columns of that dataframe to the types defined in the schema, thereby guaranteeing that the types are the expected ones. This function likely has wider use, which is why it is suggested here.

It is similar to:

  • readr::type_convert(), which by default guesses appropriate types for character columns, but also accepts explicit col_types.
  • frictionless::read_resource(), which takes a CSV + schema. We can reuse the internal cols() function.

library(tibble)
library(readr)
library(frictionless)

# df where both columns are character; the last one holds boolean values as "1"/"0"
(df <- tibble::tribble(
  ~txt,~txt_bool,
  "a","1",
  "b","0"
))
#> # A tibble: 2 × 2
#>   txt   txt_bool
#>   <chr> <chr>   
#> 1 a     1       
#> 2 b     0

# create a schema from df and set the type of the last field to boolean
schema <- create_schema(df)
schema$fields[[2]]$type <- "boolean"
schema
#> $fields
#> $fields[[1]]
#> $fields[[1]]$name
#> [1] "txt"
#> 
#> $fields[[1]]$type
#> [1] "string"
#> 
#> 
#> $fields[[2]]
#> $fields[[2]]$name
#> [1] "txt_bool"
#> 
#> $fields[[2]]$type
#> [1] "boolean"

# readr::type_convert() guesses dbl for last column
readr::type_convert(df)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   txt = col_character(),
#>   txt_bool = col_double()
#> )
#> # A tibble: 2 × 2
#>   txt   txt_bool
#>   <chr>    <dbl>
#> 1 a            1
#> 2 b            0

# readr::type_convert() with provided col_types uses correct lgl for last column
readr::type_convert(df, col_types = frictionless:::cols(schema))
#> # A tibble: 2 × 2
#>   txt   txt_bool
#>   <chr> <lgl>   
#> 1 a     TRUE    
#> 2 b     FALSE

Created on 2025-10-07 with reprex v2.1.1

Todo

  • Create function
  • Return error if df and schema have a mismatch in columns (this already exists in add_resource())
  • Return error if columns cannot be converted to suggested type (this might already be covered in readr::type_convert())
  • Use function in add_resource()
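
A minimal sketch of how the first three todo items could fit together, assuming only readr is available. The names type_convert_sketch() and field_to_collector() are hypothetical, and the type mapping covers just a few Table Schema types; a real implementation would presumably reuse the internal cols() function and the existing column check in add_resource() instead. Note that readr::type_convert() only touches character columns, so the sketch builds collectors for those columns only.

```r
library(readr)

# Hypothetical helper: map a small subset of Table Schema field types to
# readr collectors. The package's internal cols() would replace this.
field_to_collector <- function(field) {
  type <- if (is.null(field$type)) "string" else field$type
  switch(
    type,
    string = readr::col_character(),
    boolean = readr::col_logical(),
    integer = readr::col_integer(),
    number = readr::col_double(),
    date = readr::col_date(),
    readr::col_guess() # fallback for types not covered in this sketch
  )
}

# Hypothetical function: cast the character columns of df to the schema types
type_convert_sketch <- function(df, schema) {
  field_names <- vapply(schema$fields, function(f) f$name, character(1))

  # Todo item 2: error if df and schema have a mismatch in columns
  if (!identical(names(df), field_names)) {
    stop(
      "Column names of `df` must match the field names in `schema`.\n",
      "df: ", paste(names(df), collapse = ", "), "\n",
      "schema: ", paste(field_names, collapse = ", "),
      call. = FALSE
    )
  }

  # readr::type_convert() only converts character columns,
  # so only pass collectors for those
  is_chr <- vapply(df, is.character, logical(1))
  collectors <- lapply(schema$fields[is_chr], field_to_collector)
  names(collectors) <- field_names[is_chr]
  col_types <- do.call(readr::cols, collectors)

  # Parse failures become NA (with a warning); they are turned into an
  # error below, so the warning is suppressed here
  converted <- suppressWarnings(
    readr::type_convert(df, col_types = col_types)
  )

  # Todo item 3: error if values could not be converted to the schema type.
  # Values that readr treats as missing by default ("" and "NA") are ignored.
  for (name in field_names[is_chr]) {
    was_missing <- is.na(df[[name]]) | df[[name]] %in% c("", "NA")
    if (any(is.na(converted[[name]]) & !was_missing)) {
      stop(
        "Column `", name, "` contains values that cannot be converted ",
        "to the type defined in `schema`.",
        call. = FALSE
      )
    }
  }

  converted
}
```

With the df and schema from the reprex above, type_convert_sketch(df, schema) should return txt_bool as a logical column. A schema that lacks one of the columns, or a value such as "maybe" in txt_bool, should raise an error.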

Labels: function:add/remove (Functions add_resource(), remove_resource())