Description
When writing (psychology) scientific papers, great care must be taken in reporting the state of item-level missing data for each psychological questionnaire. For example, Parent (2013) writes:
I recommend that authors (a) state their tolerance level for missing data by scale or subscale (e.g., “We calculated means for all subscales on which participants gave at least 75% complete data”) and then (b) report the individual missingness rates by scale per data point (i.e., the number of missing values out of all data points on that scale for all participants) and the maximum by participant (e.g., “For Attachment Anxiety, a total of 4 missing data points out of 100 were observed, with no participant missing more than a single data point”).
To comply with this recommendation, I have developed the function nice_na(), which summarizes NA values according to those guidelines. The function reports both absolute counts and percentages of missing values for specified lists of columns, and supports specifying scales through regex. Reprex:
library(rempsyc)
# If the questionnaire items start with the same name, e.g.,
set.seed(15)
fun <- function() {
c(sample(c(NA, 1:10), replace = TRUE), NA, NA, NA)
}
df <- data.frame(
ID = c("idz", NA),
open_1 = fun(), open_2 = fun(), open_3 = fun(),
extrovert_1 = fun(), extrovert_2 = fun(), extrovert_3 = fun(),
agreeable_1 = fun(), agreeable_2 = fun(), agreeable_3 = fun()
)
head(df, 3)
#>     ID open_1 open_2 open_3 extrovert_1 extrovert_2 extrovert_3 agreeable_1
#> 1  idz      4     NA      1           5           6           1           7
#> 2 <NA>      9      4      3           1          10          NA           7
#> 3  idz      1      4      1           9           2          NA           8
#>   agreeable_2 agreeable_3
#> 1           7           9
#> 2           7           2
#> 3           7           8
# One can list the scale names directly:
nice_na(df, scales = c("ID", "open", "extrovert", "agreeable"))
#>                       var items na cells na_percent na_max na_max_percent
#> 1                   ID:ID     1  7    14      50.00      1            100
#> 2           open_1:open_3     3 11    42      26.19      3            100
#> 3 extrovert_1:extrovert_3     3 17    42      40.48      3            100
#> 4 agreeable_1:agreeable_3     3 10    42      23.81      3            100
#> 5                   Total    10 45   140      32.14     10            100
#>   all_na
#> 1      7
#> 2      3
#> 3      3
#> 4      3
#> 5      2
Created on 2023-09-02 with reprex v2.0.2
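As a rough illustration of what the columns in that summary mean, here is a minimal base R sketch that computes the same kinds of statistics by hand for a single scale. It is not the nice_na() source code: the toy data frame and the name summary_by_hand are hypothetical, and the column definitions (in particular na_max as the largest number of missing items for any one participant, and all_na as the number of participants missing every item on the scale) are my reading of the output above, which they do match.

```r
# Hypothetical toy data: 4 participants, one 3-item scale ("open").
toy <- data.frame(
  open_1 = c(1, NA, 3, NA),
  open_2 = c(2, NA, NA, 4),
  open_3 = c(5, NA, 1, 2)
)

# Select the scale's items with a regular expression on the column names.
items <- toy[, grep("^open", names(toy)), drop = FALSE]

# Number of missing values per participant (row) on this scale.
na_per_row <- rowSums(is.na(items))

# Assumed column definitions, inferred from the nice_na() output.
summary_by_hand <- data.frame(
  items          = ncol(items),                         # items in the scale
  na             = sum(is.na(items)),                   # missing cells
  cells          = nrow(items) * ncol(items),           # total cells
  na_percent     = round(100 * mean(is.na(items)), 2),  # % of cells missing
  na_max         = max(na_per_row),                     # worst participant
  na_max_percent = round(100 * max(na_per_row) / ncol(items), 2),
  all_na         = sum(na_per_row == ncol(items))       # fully missing rows
)

print(summary_by_hand)
```

On this toy data, 5 of the 12 cells are missing (41.67%), and one participant is missing all three items, so na_max is 3 and all_na is 1.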
Would you like this function to migrate from rempsyc to datawizard?
For the name, I was thinking data_missing_items or just data_missing, since it also works without scale items and is similar to our other data_ functions like data_duplicated. It could also be describe_missing, in line with describe_distribution (actually, that one makes more sense, I think).