A library for the LDA topic modelling algorithm in Python and C.
__ ____ _____ | | | \| _ | | |__| | | | |_____|____/|__|__|
The best way to use liblda
is through the command line:
./run.py --docs docs.txt --numT 40 --vocab vocab.txt --seed 3 --iter 400 --alpha 0.1 --beta 0.01 --save_probs --print_topics 10
where:
docs.txt
contains one document per line,vocab.txt
contains the vocabulary (one word per line)--save_probs
indicates that you want to output the probs phi and theta
Place the directory liblda
somewhere in your Python path.
We have implemented the Gibbs sampling approach which is fairly efficient when done in C. All the rest of the functionality is done in Python so it is very hackable.
numpy
(for arrays)scipy
(for weave)
The code base works, but is a bit of a mess right now. A rewrite has begun -- in cython.
Ivan Savov, first dot last at gmail