Hi,
Ceteris paribus, adding the ngroups parameter to the stm function appears to modify the gamma matrix. From my understanding of the documentation, this may be expected, and we should not expect the model to converge to the same solution as a model fitted without this parameter.
However, while the beta matrix and the global topic prevalence remain broadly similar between models, the gamma matrix is not only different but significantly so. It appears incorrect: the top documents associated with each topic do not seem to be related to the topic itself.
I tested this with my own data, and a quick check using the sample data from the stm package suggests a similar issue.
Below is a simple example:
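library(stm)
library(dplyr)  # provides %>%, filter(), arrange() and desc() used below
# Sample data shipped with the stm package
docs <- stm::poliblog5k.docs
vocab <- stm::poliblog5k.voc
data <- stm::poliblog5k.meta
# Two otherwise identical models; stm_2 additionally sets ngroups = 2
stm_1 <- stm(documents = docs, vocab = vocab, K = 10, init.type = "Spectral", seed = 123)
stm_2 <- stm(documents = docs, vocab = vocab, K = 10, init.type = "Spectral", seed = 123, ngroups = 2)
# Topic summaries: expected topic proportions and top words
plot(stm_1, type = "summary", n = 5)
plot(stm_2, type = "summary", n = 5)
# Most representative documents for topic 6 in each model
findThoughts(stm_1, texts = data$text, topics = 6, n = 5)
findThoughts(stm_2, texts = data$text, topics = 6, n = 5)
# Per-document proportions for topic 6, sorted from highest to lowest
gamma_1 <- tidytext::tidy(stm_1, matrix = "gamma") %>% filter(topic == 6) %>% arrange(desc(gamma))
gamma_2 <- tidytext::tidy(stm_2, matrix = "gamma") %>% filter(topic == 6) %>% arrange(desc(gamma))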
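To put a number on the difference rather than inspecting the sorted tables by eye, a small base-R helper along the following lines might be useful. This is only a sketch: `compare_theta` is a hypothetical name of mine, not part of stm or tidytext, and it assumes the two matrices have documents in rows and topics in columns with the same topic order (which the similar beta matrices suggest, but I have not verified). The toy data at the end is synthetic and only shows one pattern that would look like the symptom described above: shuffling the rows of a matrix leaves the global prevalence unchanged, while the per-document values no longer line up.

```r
# Hypothetical helper (not part of stm): compare two document-topic matrices
# with documents in rows and topics in columns.
compare_theta <- function(theta_a, theta_b) {
  stopifnot(all(dim(theta_a) == dim(theta_b)))
  # Correlation across documents, computed separately for each topic
  per_topic_cor <- vapply(
    seq_len(ncol(theta_a)),
    function(k) cor(theta_a[, k], theta_b[, k]),
    numeric(1)
  )
  list(
    per_topic_cor = per_topic_cor,
    prevalence_a  = colMeans(theta_a),  # global topic prevalence, model A
    prevalence_b  = colMeans(theta_b)   # global topic prevalence, model B
  )
}

# Synthetic illustration (assumption: NOT stm output).
set.seed(1)
theta_toy <- matrix(runif(40), nrow = 10)
theta_toy <- theta_toy / rowSums(theta_toy)             # rows sum to 1
theta_shuffled <- theta_toy[sample(nrow(theta_toy)), ]  # same rows, new order

res <- compare_theta(theta_toy, theta_shuffled)
print(res$per_topic_cor)                              # per-document agreement
print(all.equal(res$prevalence_a, res$prevalence_b))  # prevalence is unchanged
```

On the fitted models above, the call would be `compare_theta(stm_1$theta, stm_2$theta)`, since the theta element of the fitted object is what tidytext reports as gamma. Low per-topic correlations together with near-identical prevalence would be consistent with what I observe, although this alone does not show what causes it.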