
The ngroups parameter of stm() significantly modifies the gamma matrix  #292

@tdelcey


Hi,

Ceteris paribus, adding the ngroups parameter to the stm() function appears to modify the gamma matrix. From my understanding of the documentation, this may be normal: we should not expect the model to converge to the same solution as a model fitted without this parameter.

However, while the beta matrix and the global topic prevalence remain broadly similar between the two models, the gamma matrix differs substantially. It appears incorrect: the top documents associated with each topic do not seem to be related to the topic itself.

I tested this with my own data, and a quick check using the sample data from the stm package suggests a similar issue.

Below is a simple example:

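library(stm)
library(dplyr)  # provides %>%, filter(), arrange() and desc() used below

# Sample data shipped with the stm package
docs <- stm::poliblog5k.docs
vocab <- stm::poliblog5k.voc
data <- stm::poliblog5k.meta

# Identical models, except that stm_2 sets ngroups = 2
stm_1 <- stm(documents = docs, vocab = vocab, K = 10, init.type = "Spectral", seed = 123)
stm_2 <- stm(documents = docs, vocab = vocab, K = 10, init.type = "Spectral", seed = 123, ngroups = 2)

# Topic summaries (top words and expected topic proportions)
plot(stm_1, type = "summary", n = 5)
plot(stm_2, type = "summary", n = 5)

# Top documents for topic 6 in each model
findThoughts(stm_1, texts = data$text, topics = 6, n = 5)
findThoughts(stm_2, texts = data$text, topics = 6, n = 5)

# Per-document proportions for topic 6, sorted from highest to lowest
gamma_1 <- tidytext::tidy(stm_1, matrix = "gamma") %>% filter(topic == 6) %>% arrange(desc(gamma))

gamma_2 <- tidytext::tidy(stm_2, matrix = "gamma") %>% filter(topic == 6) %>% arrange(desc(gamma))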
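To put a number on the difference, one option is to compare the topic's document proportions directly between the two fits. The following is only a rough sketch: `compare_topic_docs()` is a hypothetical helper, not part of stm, and it assumes that both matrices keep documents in the same row order and that the topic index refers to the same topic in both models.

```r
# Hypothetical helper (not part of stm): compares one topic's document
# proportions across two document-by-topic matrices of equal shape.
compare_topic_docs <- function(theta_a, theta_b, topic, n = 5) {
  stopifnot(identical(dim(theta_a), dim(theta_b)))

  # Row indices of the n documents with the highest proportion for the topic
  top_a <- order(theta_a[, topic], decreasing = TRUE)[seq_len(n)]
  top_b <- order(theta_b[, topic], decreasing = TRUE)[seq_len(n)]

  list(
    # Correlation of the topic's proportions across all documents
    correlation = cor(theta_a[, topic], theta_b[, topic]),
    # Documents that appear in both top-n lists
    shared_top_docs = intersect(top_a, top_b)
  )
}

# Usage with the models above (assumption: topic 6 is the same topic in both):
# compare_topic_docs(stm_1$theta, stm_2$theta, topic = 6, n = 5)
```

If the two models really do agree on beta, a correlation near zero together with an empty `shared_top_docs` would suggest that the rows of the document-topic matrix no longer line up with the input documents, rather than that the two fits simply converged to somewhat different solutions.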
