Description
Hello! I wanted to ask about something I noticed in the core documentation, specifically the LDA implementation that OCTIS uses. Right now, you have this listing in the documentation:
LDA (Blei et al. 2003) | https://radimrehurek.com/gensim/
As it turns out, gensim's ldamodel does not implement a classic LDA inference algorithm; it uses Online LDA (Hoffman et al., 2010) instead. This approach both uses variational inference (which, anecdotally, many topic modeling experts find gets worse results than Gibbs sampling) and approximates it in a streaming setting (which makes it fast on million-document corpora but gives poor topic quality on smaller ones). In short, this implementation isn't
There aren't a lot of popular true implementations of LDA inference using VI (Blei et al., 2003) or Gibbs sampling (Griffiths and Steyvers, 2004). But given that this library is being used on small corpora in evaluations, it might be good to update the documentation here to cite the right paper. Another option is tomotopy, which you already have. To my knowledge it is also an approximation, using a distributed algorithm from 2009, but my impression is that, because of how it does distributed Gibbs sampling, its topics on smaller corpora are closer to those of the classic algorithms.
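To make "classic" concrete, here is a minimal sketch of collapsed Gibbs sampling for LDA in the style of Griffiths and Steyvers (2004). It is illustrative only: it is not the code of gensim, tomotopy, or OCTIS, and the function name `gibbs_lda`, its parameters, and the default hyperparameters are my own assumptions. The point is that every sweep resamples every token in the corpus, unlike the streaming minibatch updates of Online LDA.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, hypothetical API).

    docs is a list of documents, each a list of integer word ids in
    range(vocab_size). Returns the document-topic and topic-word count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                              # total tokens assigned to topic k
    assignments = []

    # Start from a uniformly random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        # One sweep: resample the topic of every token in the corpus.
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word
```

Calling `gibbs_lda([[0, 1, 0, 1], [2, 3, 2, 3]], num_topics=2, vocab_size=4)` returns the two count tables, from which topic-word and document-topic distributions can be estimated by adding `beta` or `alpha` and normalizing each row.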
Thanks, and sorry if this is getting in the weeds - happy to provide more information if useful!
Xanda Schofield