@@ -16,10 +16,70 @@ lda2vec: Tools for interpreting natural language
.. image :: https://img.shields.io/twitter/follow/chrisemoody.svg?style=social
    :target: https://twitter.com/intent/follow?screen_name=chrisemoody

+ The lda2vec model tries to combine the best parts of word2vec and LDA
+ in a single framework. word2vec captures powerful relationships
+ between words, but the resulting vectors are largely uninterpretable
+ and don't represent documents. LDA, on the other hand, is quite
+ interpretable by humans, but doesn't model local word relationships
+ the way word2vec does. We build a model that learns both word and document
+ topics, makes them interpretable, learns topics over clients, times,
+ and documents, and allows those topics to be supervised.
+
+
+ Resources
+ ---------
+ See this `Jupyter Notebook <http://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda.ipynb>`_
+ for an end-to-end example.
+
+ See this `presentation <http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec-57135994>`_
+ for an overview of the benefits of word2vec, LDA, and lda2vec.
+
+ See the `API reference docs <https://lda2vec.readthedocs.org/en/latest/>`_.
+
+
+ About
+ -----
+
+ .. image :: images/img00_word2vec.png
+
+ Word2vec tries to model word-to-word relationships.
+
+ .. image :: images/img01_lda.png
+
+ LDA models document-to-word relationships.
+
+ .. image :: images/img02_lda_topics.png
+
+ LDA yields topics over each document.
+
+ .. image :: images/img03_lda2vec_topics01.png
+
+ lda2vec yields topics not just over documents, but also over regions.
+
+ .. image :: images/img04_lda2vec_topics02.png
+
+ lda2vec also yields topics over clients.
+
+ .. image :: images/img05_lda2vec_topics03_supervised.png
+
+ In lda2vec, the topics can also be 'supervised' and forced to predict another target.
+
+ lda2vec also includes more contexts and features than LDA. LDA dictates that
+ words are generated by a document vector, but we might have all kinds of
+ 'side-information' that should influence our topics. For example, a single
+ client comment is about a particular item ID, written at a particular time
+ and in a particular region. In this case, lda2vec gives you topics over all
+ items (separating jeans from shirts, for example), times (winter versus summer),
+ regions (desert versus coastal), and clients (sporty versus professional attire).
+
+ Ultimately, the topics are interpreted using the excellent pyLDAvis library:
+
+ .. image :: images/img06_pyldavis.gif
+
+
Requirements
------------

-
Minimum requirements:

- Python 2.7+