Dear authors,
a student pointed out that the following uses of correlation() and cor_test() return different results:
set.seed(1337)
df <- data.frame(
  a = sample(1:5, 10, replace = TRUE),
  b = factor(sample(c("s", "m", "l"), 10, replace = TRUE),
             levels = c("s", "m", "l"),
             ordered = TRUE)
)
correlation::correlation(df, method = "spearman")
# # Correlation Matrix (spearman-method)
#
# Parameter1 | Parameter2 | rho | 95% CI | S | p
# -----------------------------------------------------------------
# a | b.s | -0.49 | [-0.86, 0.22] | 245.72 | 0.668
# a | b.m | -0.11 | [-0.70, 0.57] | 182.97 | 0.765
# a | b.l | 0.51 | [-0.20, 0.87] | 81.12 | 0.668
# b.s | b.m | -0.41 | [-0.83, 0.32] | 232.36 | 0.725
# b.s | b.l | -0.41 | [-0.83, 0.32] | 232.36 | 0.725
# b.m | b.l | -0.67 | [-0.92, -0.04] | 275.00 | 0.212
#
# p-value adjustment method: Holm (1979)
# Observations: 10
# Warning message:
# It seems like there is not enough continuous variables in your data. Maybe you want to include the factors?
# We're setting `include_factors=TRUE` for you.
correlation::cor_test(df, "a", "b", method = "spearman")
# Parameter1 | Parameter2 | rho | 95% CI | S | p
# --------------------------------------------------------------
# a | b | 0.59 | [-0.08, 0.89] | 67.64 | 0.073
#
# Observations: 10
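Is this difference intentional? Judging from the output, correlation() splits the ordered factor b into one column per level (b.s, b.m, b.l), while cor_test() treats b as a single variable.
If it is intentional, which of the two would you consider the correct way to compute rho between an ordered factor and a numeric variable?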
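For comparison, here is a minimal base R sketch of what I would have expected. It assumes that the level order s < m < l is the intended ranking, so that the integer codes of the factor can stand in for it. The name b_codes is only illustrative, and I am not claiming this is what either function does internally.

```r
# Recreate the example data from above.
set.seed(1337)
df <- data.frame(
  a = sample(1:5, 10, replace = TRUE),
  b = factor(sample(c("s", "m", "l"), 10, replace = TRUE),
             levels = c("s", "m", "l"),
             ordered = TRUE)
)

# Assumption: the level order s < m < l is the intended ranking,
# so the integer codes (1, 2, 3) can stand in for the factor.
b_codes <- as.integer(df$b)

# exact = FALSE skips the exact p-value, which cannot be computed with ties.
res <- cor.test(df$a, b_codes, method = "spearman", exact = FALSE)
rho <- unname(res$estimate)
print(rho)
```

If this matches the cor_test() result, that would suggest cor_test() uses the level order, whereas correlation() correlates a with each dummy-coded level separately.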
Best,
Alexander