Suggestion of new function: `describe_missing()` #454

When writing scientific papers in psychology, great care must be taken in reporting item-level missing data for each psychological questionnaire. For example, Parent (2013) writes:

> I recommend that authors (a) state their tolerance level for missing data by scale or subscale (e.g., “We calculated means for all subscales on which participants gave at least 75% complete data”) and then (b) report the individual missingness rates by scale per data point (i.e., the number of missing values out of all data points on that scale for all participants) and the maximum by participant (e.g., “For Attachment Anxiety, a total of 4 missing data points out of 100 were observed, with no participant missing more than a single data point”).
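Point (a) amounts to a simple rule when scoring a scale. As a minimal sketch, assuming a 75% threshold and base R only, the hypothetical helper `mean_if_complete()` below (not part of `rempsyc` or `datawizard`) computes a participant's scale mean only when at least that proportion of the scale's items were answered, and returns `NA` otherwise:

```r
# Hypothetical helper (not part of rempsyc or datawizard): row means for one
# scale, kept only when a participant answered at least `min_complete` of
# that scale's items; otherwise the score is set to NA.
mean_if_complete <- function(data, items, min_complete = 0.75) {
  x <- data[, items, drop = FALSE]
  prop_complete <- rowMeans(!is.na(x))    # share of answered items per participant
  score <- rowMeans(x, na.rm = TRUE)      # mean of the answered items
  score[prop_complete < min_complete] <- NA
  score
}

# Toy data: four participants, one four-item scale
toy <- data.frame(
  attach_1 = c(1, 2, NA, NA),
  attach_2 = c(3, NA, NA, 4),
  attach_3 = c(5, 6, 1, 2),
  attach_4 = c(7, 8, 2, NA)
)
mean_if_complete(toy, paste0("attach_", 1:4))
# expected values: 4, 5.333333, NA, NA
```

Participants 3 and 4 answered only half of the items, so they get no score under the 75% rule.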

To comply with this recommendation, I have developed the function `nice_na()`, which summarizes `NA` values according to these guidelines. It reports both the absolute number and the percentage of missing values for the specified columns, and scales can be specified through regex. Reprex:

``` r
library(rempsyc)

# If the questionnaire items start with the same name, e.g.,
set.seed(15)
fun <- function() {
  c(sample(c(NA, 1:10), replace = TRUE), NA, NA, NA)
}
df <- data.frame(
  ID = c("idz", NA),
  open_1 = fun(), open_2 = fun(), open_3 = fun(),
  extrovert_1 = fun(), extrovert_2 = fun(), extrovert_3 = fun(),
  agreeable_1 = fun(), agreeable_2 = fun(), agreeable_3 = fun()
)

head(df, 3)
#>     ID open_1 open_2 open_3 extrovert_1 extrovert_2 extrovert_3 agreeable_1
#> 1  idz      4     NA      1           5           6           1           7
#> 2 <NA>      9      4      3           1          10          NA           7
#> 3  idz      1      4      1           9           2          NA           8
#>   agreeable_2 agreeable_3
#> 1           7           9
#> 2           7           2
#> 3           7           8

# One can list the scale names directly:
nice_na(df, scales = c("ID", "open", "extrovert", "agreeable"))
#>                       var items na cells na_percent na_max na_max_percent
#> 1                   ID:ID     1  7    14      50.00      1            100
#> 2           open_1:open_3     3 11    42      26.19      3            100
#> 3 extrovert_1:extrovert_3     3 17    42      40.48      3            100
#> 4 agreeable_1:agreeable_3     3 10    42      23.81      3            100
#> 5                   Total    10 45   140      32.14     10            100
#>   all_na
#> 1      7
#> 2      3
#> 3      3
#> 4      3
#> 5      2
```

<sup>Created on 2023-09-02 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
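To make the output columns concrete, here is a minimal base-R sketch of how one row of such a summary could be computed. The helper `na_summary()` is hypothetical, and its column definitions (`na_max` as the largest number of missing items for a single participant, `all_na` as the number of participants missing every item of the scale) are my reading of the table above, not taken from the `rempsyc` source:

```r
# Hypothetical helper, not rempsyc's actual implementation: summarizes
# missing values for one scale (a set of columns), using column
# definitions inferred from the nice_na() output above.
na_summary <- function(data, cols) {
  na_mat <- is.na(data[, cols, drop = FALSE])  # logical matrix, one row per participant
  na_by_row <- rowSums(na_mat)                  # missing items per participant
  items <- length(cols)
  data.frame(
    items = items,
    na = sum(na_mat),
    cells = length(na_mat),                     # participants x items
    na_percent = round(100 * sum(na_mat) / length(na_mat), 2),
    na_max = max(na_by_row),
    na_max_percent = round(100 * max(na_by_row) / items, 2),
    all_na = sum(na_by_row == items)            # participants missing every item
  )
}

items_df <- data.frame(
  a_1 = c(1, NA, NA, 4),
  a_2 = c(NA, 2, NA, 5),
  a_3 = c(3, 3, NA, 6)
)
na_summary(items_df, c("a_1", "a_2", "a_3"))
```

With the `df` from the reprex, `na_summary(df, grep("^open_", names(df), value = TRUE))` should reproduce the counts of the `open_1:open_3` row, provided these definitions match those used by `nice_na()`.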

---

## Would you like this function to migrate from `rempsyc` to `datawizard`?

For the name, I was thinking of `data_missing_items`, or simply `data_missing`, since the function also works without scale items and the name would match our other `data_` functions such as `data_duplicated`. It could also be `describe_missing`, in line with `describe_distribution`; I think that name actually makes the most sense.
