Checking on LDA implementation reference

Hello! I wanted to check on something I noticed in the core documentation, specifically the LDA model that OCTIS uses. Right now, you have this listing in the documentation:

LDA (Blei et al. 2003) | https://radimrehurek.com/gensim/

As it turns out, gensim's [ldamodel](https://radimrehurek.com/gensim/models/ldamodel.html) does not implement a classic LDA inference algorithm, but instead uses [Online LDA](https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf). This approach both uses variational inference (which, [anecdotally according to many topic modeling experts](https://x.com/yoavgo/status/1338120353260986369), gets worse results than Gibbs sampling) and approximates it in a streaming context (which makes it fast for million-document corpora but gives poor topic quality on smaller corpora). In short, this implementation isn't

There aren't a lot of popular true implementations of LDA inference using VI ([Blei et al., 2003](https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf)) or Gibbs sampling ([Griffiths and Steyvers, 2004](https://www.pnas.org/doi/10.1073/pnas.0307752101)). But given that this library is being used on small corpora in evaluations, it might be good to update the documentation here to reflect the right paper citation? Another option is tomotopy, which you already have. To my knowledge it is also an approximation, using a [distributed algorithm from 2009](https://jmlr.csail.mit.edu/papers/v10/newman09a.html), but my impression is that, because of how it does distributed Gibbs sampling, its topic outcomes on smaller corpora are closer to those of the classic algorithms.
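To make the distinction concrete, here is a minimal sketch of the collapsed Gibbs sampling update in the style of Griffiths and Steyvers (2004), written in plain Python. It is only an illustration of the "classic" batch sampler referred to above, not the code used by OCTIS, gensim, or tomotopy. The function name `collapsed_gibbs_lda`, the hyperparameter defaults, and the toy corpus are all assumptions made up for this example.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA.

    docs is a list of documents, each a list of word ids in [0, vocab_size).
    Returns (doc_topic, topic_word) count matrices after the final sweep.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # tokens of word w in topic k
    topic_total = [0] * num_topics                               # all tokens in topic k
    assignments = []

    # Start from a uniformly random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Each sweep visits every token in the whole corpus (batch, not streaming).
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Toy corpus (made up): words 0-2 and words 3-4 never co-occur in a document.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4], [0, 1, 2, 2], [3, 4, 4, 3, 3]]
doc_topic, topic_word = collapsed_gibbs_lda(docs, num_topics=2, vocab_size=5)
```

Every sweep here touches every token of the full corpus, which is the contrast with the streaming mini-batch updates of Online LDA described above.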

Thanks, and sorry if this is getting in the weeds - happy to provide more information if useful!

Xanda Schofield
