Performance linter: all(!is.na()) -> !anyNA() #2874

@etiennebacher

all(!is.na(x)) has to check every element of x and allocates intermediate logical vectors along the way, whereas !anyNA(x) can stop at the first NA it finds, so it performs better. (Edit: I originally wondered whether it would be harder to read, cf https://lintr.r-lib.org/dev/reference/outer_negation_linter.html, but I got confused, my bad.)

Some benchmarks on integer and character vectors:

n <- 1e7

##### Integers

no_na <- sample(1:100, n, replace = TRUE)
some_na <- sample(c(1:100, NA), n, replace = TRUE)
only_na <- rep(NA, n) # note: this is a logical vector; rep(NA_integer_, n) would be integer

bench::mark(
  all(!is.na(no_na)),
  !anyNA(no_na)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 all(!is.na(no_na))  38.96ms   40.7ms      23.6    76.3MB     23.6
#> 2 !anyNA(no_na)        2.72ms    2.8ms     346.         0B      0

bench::mark(
  all(!is.na(some_na)),
  !anyNA(some_na)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                min   median  `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>      <dbl> <bch:byt>    <dbl>
#> 1 all(!is.na(some_na))   30.3ms   33.1ms       28.6    76.3MB     28.6
#> 2 !anyNA(some_na)             0    100ns 14724688.         0B      0

bench::mark(
  all(!is.na(only_na)),
  !anyNA(only_na)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 all(!is.na(only_na))   31.2ms     32ms      29.8    76.3MB     29.8
#> 2 !anyNA(only_na)             0    100ns 7926856.         0B      0


##### Strings

no_na <- sample(letters, n, replace = TRUE)
some_na <- sample(c(letters, NA), n, replace = TRUE)
only_na <- rep(NA, n) # note: this is a logical vector; rep(NA_character_, n) would be character

bench::mark(
  all(!is.na(no_na)),
  !anyNA(no_na)
)
#> # A tibble: 2 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 all(!is.na(no_na))  33.18ms   34.1ms      29.0    76.3MB     20.7
#> 2 !anyNA(no_na)        4.17ms   5.16ms     182.         0B      0

bench::mark(
  all(!is.na(some_na)),
  !anyNA(some_na)
)
#> # A tibble: 2 × 6
#>   expression                min   median  `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>      <dbl> <bch:byt>    <dbl>
#> 1 all(!is.na(some_na))   25.4ms   26.9ms       37.0    76.3MB     22.2
#> 2 !anyNA(some_na)             0      1ns 17097081.         0B      0

bench::mark(
  all(!is.na(only_na)),
  !anyNA(only_na)
)
#> # A tibble: 2 × 6
#>   expression                min   median  `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>      <dbl> <bch:byt>    <dbl>
#> 1 all(!is.na(only_na))   24.4ms   30.2ms       33.3    76.3MB     18.5
#> 2 !anyNA(only_na)             0    100ns 10787728.         0B      0

The two expressions are also equivalent on length-0 input:

all(!is.na(character()))
#> [1] TRUE
!anyNA(character())
#> [1] TRUE
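
To back up the equivalence claim a bit more broadly, here is a small sanity check over a few atomic vector types. It is only a sketch: the `inputs` list and its element names are made up for illustration, and it does not cover lists, data frames, or classes with their own is.na() / anyNA() methods.

```r
# Hypothetical sample inputs covering several atomic types and edge cases.
inputs <- list(
  integer_no_na     = 1:3,
  integer_with_na   = c(1L, NA, 3L),
  double_with_nan   = c(1, NaN),      # is.na() and anyNA() both treat NaN as missing
  character_with_na = c("a", NA),
  logical_all_na    = c(NA, NA),
  empty             = character()
)

# For each input, compare the two spellings.
same <- vapply(
  inputs,
  function(x) identical(all(!is.na(x)), !anyNA(x)),
  logical(1)
)

# Errors if any input gives different results for the two expressions.
stopifnot(all(same))
```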

Current behaviour of lintr:

lintr::lint("all(!is.na(x))\n")
#> ℹ No lints found.
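
For illustration, the pattern the proposed linter would have to flag can be described on the parsed expression. The sketch below uses only base R and a hypothetical helper name, `is_all_not_is_na()`. It is not how lintr implements its linters (those work with XPath queries on the XML parse tree), and it only matches the exact single-argument form all(!is.na(x)).

```r
# Hypothetical helper: TRUE if `e` is a call of the exact form all(!is.na(<anything>)).
is_all_not_is_na <- function(e) {
  is.call(e) && length(e) == 2L &&
    identical(e[[1]], as.name("all")) &&          # outer call is all(...)
    is.call(e[[2]]) &&
    identical(e[[2]][[1]], as.name("!")) &&       # its argument is a negation
    is.call(e[[2]][[2]]) &&
    identical(e[[2]][[2]][[1]], as.name("is.na")) # of an is.na(...) call
}

# Parse source text the same way a file would be read, then test the expression.
expr <- parse(text = "all(!is.na(x))")[[1]]
is_all_not_is_na(expr)
is_all_not_is_na(quote(!anyNA(x)))
```

A real linter would also need to decide what to do with variants such as extra arguments to all(), which this sketch deliberately ignores.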

This pattern has more than 10k matches on GitHub (although I rarely use the code search feature, so I don't know how significant that number is): https://github.com/search?q=language%3AR+%22all%28%21is.na%28%22&type=code
