faux-naïf (/ˌfoʊ.naɪˈif/): a person who pretends to be simple or innocent
fauxnaif: an R package for simplifying data by pretending values are NA
fauxnaif provides an extension to dplyr::na_if(). Unlike dplyr’s na_if(), na_if_in() lets you specify multiple values to be replaced with NA in a single function call. fauxnaif also includes a complementary function, na_if_not(), which lets you specify the values to keep; everything else becomes NA.
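For intuition, the core idea can be approximated in base R with %in%. The helpers na_if_any() and na_if_none() below are hypothetical and exist only for this sketch. They are not part of fauxnaif and are not how na_if_in() and na_if_not() are actually implemented (the real functions also accept formulas, for example).

```r
# Hypothetical helpers for illustration only; not fauxnaif's implementation.

# Replace every element that matches one of `values` with NA.
na_if_any <- function(x, values) {
  x[x %in% values] <- NA
  x
}

# The complement: replace every element that is NOT in `keep` with NA.
na_if_none <- function(x, keep) {
  x[!(x %in% keep)] <- NA
  x
}

x <- c(1, 2, -99, 4, 999, 6)
na_if_any(x, c(-99, 999))
#> [1]  1  2 NA  4 NA  6
na_if_none(x, 1:10)
#> [1]  1  2 NA  4 NA  6
```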
You can install fauxnaif from CRAN:
install.packages("fauxnaif")
Or the development version from GitHub:
# install.packages("remotes")
remotes::install_github("rossellhayes/fauxnaif")
library(dplyr)
library(fauxnaif)
Let’s say we want to remove an unwanted negative value from a vector of numbers:
-1:10
#> [1] -1 0 1 2 3 4 5 6 7 8 9 10
We can replace -1…
… explicitly:
na_if_in(-1:10, -1)
#> [1] NA 0 1 2 3 4 5 6 7 8 9 10
… by specifying values to keep:
na_if_not(-1:10, 0:10)
#> [1] NA 0 1 2 3 4 5 6 7 8 9 10
… using a formula:
na_if_in(-1:10, ~ . < 0)
#> [1] NA 0 1 2 3 4 5 6 7 8 9 10
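A one-sided formula such as ~ . < 0 can be read as a predicate on the vector, with . standing for the input. The sketch below assumes that reading: it evaluates the right-hand side of each formula with . bound to the vector and drops elements for which any formula is TRUE. formula_to_predicate() and na_if_formula() are hypothetical names, and fauxnaif's own formula handling may work differently.

```r
# Hypothetical helper: turn `~ expr` into function(x) that evaluates expr
# with `.` bound to x. An assumption for illustration, not fauxnaif's code.
formula_to_predicate <- function(f) {
  expr <- f[[2]]          # right-hand side of a one-sided formula
  env  <- environment(f)  # environment where the formula was created
  function(x) eval(expr, list(. = x), env)
}

# Replace elements for which ANY of the supplied formulas is TRUE.
na_if_formula <- function(x, ...) {
  preds <- lapply(list(...), formula_to_predicate)
  hit <- Reduce(`|`, lapply(preds, function(p) p(x)))
  x[which(hit)] <- NA     # which() ignores NA results from the predicates
  x
}

na_if_formula(-1:10, ~ . < 0)
#>  [1] NA  0  1  2  3  4  5  6  7  8  9 10
na_if_formula(c(57, 49, 557, 2, 64), ~ . < 18, ~ . > 120)
#> [1] 57 49 NA NA 64
```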
Now let’s say we have a character vector containing some unwanted values:
messy_string <- c("abc", "", "def", "NA", "ghi", 42, "jkl", "NULL", "mno")
We can replace unwanted values…
… one at a time:
na_if_in(messy_string, "")
#> [1] "abc" NA "def" "NA" "ghi" "42" "jkl" "NULL" "mno"
… or all at once:
na_if_in(messy_string, "", "NA", "NULL", 1:100)
#> [1] "abc" NA "def" NA "ghi" NA "jkl" NA "mno"
na_if_in(messy_string, c("", "NA", "NULL", 1:100))
#> [1] "abc" NA "def" NA "ghi" NA "jkl" NA "mno"
na_if_in(messy_string, list("", "NA", "NULL", 1:100))
#> [1] "abc" NA "def" NA "ghi" NA "jkl" NA "mno"
… or using a clever formula:
grepl("[a-z]{3,}", messy_string)
#> [1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
na_if_not(messy_string, ~ grepl("[a-z]{3,}", .))
#> [1] "abc" NA "def" NA "ghi" NA "jkl" NA "mno"
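Passing the values as separate arguments, as one vector, or as a list gives the same result. One plausible way to get that behaviour is to flatten everything with unlist() before matching, as in the hypothetical na_if_values() below (an assumption, not the package's code). Note that 42 is already the string "42" inside messy_string, because c() coerces mixed input to character. The last lines show a base-R counterpart of the regex example using `is.na<-`.

```r
# Hypothetical helper: accept values as separate arguments, one vector,
# or a list; unlist() flattens all three into a single vector.
na_if_values <- function(x, ...) {
  values <- unlist(list(...))  # recursive, so nested lists are flattened too
  x[x %in% values] <- NA       # %in% compares after coercing to a common type
  x
}

messy_string <- c("abc", "", "def", "NA", "ghi", 42, "jkl", "NULL", "mno")

a <- na_if_values(messy_string, "", "NA", "NULL", 1:100)
b <- na_if_values(messy_string, c("", "NA", "NULL", 1:100))
d <- na_if_values(messy_string, list("", "NA", "NULL", 1:100))
identical(a, b) && identical(b, d)
#> [1] TRUE

# Base-R counterpart of the regex "keep" rule: `is.na<-` sets the
# selected elements to NA.
kept <- messy_string
is.na(kept) <- !grepl("[a-z]{3,}", kept)
identical(kept, a)
#> [1] TRUE
```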
fauxnaif also includes faux_census, a small example data frame with some implausible entries:
faux_census
#> # A tibble: 5 × 4
#>   state    age  income gender
#>   <chr>  <dbl>   <dbl> <chr>
#> 1 TX        57 9999999 Gender is a social construct
#> 2 Canada    49  149000 Male
#> 3 NY       557   90750 f
#> 4 LA         2   61000 Male
#> 5 TN        64 9999999 M
na_if_in() and na_if_not() are particularly useful inside dplyr::mutate():
faux_census %>%
  mutate(
    income = na_if_in(income, 9999999),
    age = na_if_in(age, ~ . < 18, ~ . > 120),
    state = na_if_not(state, ~ grepl("^[A-Z]{2,}$", .)),
    gender = na_if_in(gender, ~ nchar(.) > 20)
  )
#> # A tibble: 5 × 4
#>   state   age income gender
#>   <chr> <dbl>  <dbl> <chr>
#> 1 TX       57     NA <NA>
#> 2 <NA>     49 149000 Male
#> 3 NY       NA  90750 f
#> 4 LA       NA  61000 Male
#> 5 TN       64     NA M
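For comparison, here is roughly the same cleanup written in plain base R, without dplyr or fauxnaif. The data frame census is a hand-made copy of the faux_census values printed above, so the sketch runs on its own.

```r
# Hand-made copy of the faux_census values printed above.
census <- data.frame(
  state  = c("TX", "Canada", "NY", "LA", "TN"),
  age    = c(57, 49, 557, 2, 64),
  income = c(9999999, 149000, 90750, 61000, 9999999),
  gender = c("Gender is a social construct", "Male", "f", "Male", "M"),
  stringsAsFactors = FALSE
)

# Each assignment mirrors one argument of the mutate() call above.
census$income[census$income == 9999999] <- NA
census$age[census$age < 18 | census$age > 120] <- NA
census$state[!grepl("^[A-Z]{2,}$", census$state)] <- NA
census$gender[nchar(census$gender) > 20] <- NA

census
```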
Or you can use dplyr::across() on data frames:
faux_census %>%
  mutate(
    across(age, function(x) na_if_in(x, ~ . < 18, ~ . > 120)),
    across(state, function(x) na_if_not(x, ~ grepl("^[A-Z]{2,}$", .))),
    across(where(is.character), function(x) na_if_in(x, ~ nchar(.) > 20)),
    across(everything(), function(x) na_if_in(x, 9999999))
  )
#> # A tibble: 5 × 4
#>   state   age income gender
#>   <chr> <dbl>  <dbl> <chr>
#> 1 TX       57     NA <NA>
#> 2 <NA>     49 149000 Male
#> 3 NY       NA  90750 f
#> 4 LA       NA  61000 Male
#> 5 TN       64     NA M
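The across() calls apply one rule to many columns at once. A rough base-R analogue uses lapply() over the columns of a data frame. Here census is again a hand-made copy of the faux_census values shown above, and only the everything() and where(is.character) rules are reproduced.

```r
# Hand-made copy of the faux_census values shown above.
census <- data.frame(
  state  = c("TX", "Canada", "NY", "LA", "TN"),
  age    = c(57, 49, 557, 2, 64),
  income = c(9999999, 149000, 90750, 61000, 9999999),
  gender = c("Gender is a social construct", "Male", "f", "Male", "M"),
  stringsAsFactors = FALSE
)

# Like across(everything(), ...): apply one rule to every column.
# In character columns the number is compared as the string "9999999",
# which matches nothing there.
census[] <- lapply(census, function(col) {
  col[col %in% 9999999] <- NA
  col
})

# Like across(where(is.character), ...): apply a rule to character columns only.
is_chr <- vapply(census, is.character, logical(1))
census[is_chr] <- lapply(census[is_chr], function(col) {
  col[which(nchar(col) > 20)] <- NA
  col
})

census
```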
Hex sticker fonts are Bodoni* by indestructible type* and Source Code Pro by Adobe.
Image adapted from icon made by Freepik from flaticon.com.
Please note that fauxnaif is released with a Contributor Code of Conduct.