You can add a custom vectorizer to a topic model upon initializing it,
thereby getting different behaviours. You can, for instance, use noun phrases in your model instead of words by using `NounPhraseCountVectorizer`, or estimate parameters for lemmas by using `LemmaCountVectorizer`.

=== "Noun Phrase Extraction"

    ```bash
    pip install turftopic[spacy]
    python -m spacy download "en_core_web_sm"
    ```

    ```python
    from turftopic import KeyNMF
    from turftopic.vectorizers.spacy import NounPhraseCountVectorizer

    model = KeyNMF(10, vectorizer=NounPhraseCountVectorizer("en_core_web_sm"))
    model.fit(corpus)
    model.print_topics()
    ```

    | Topic ID | Highest Ranking |
    | - | - |
    | | ... |
    | 3 | fanaticism, theism, fanatism, all fanatism, theists, strong theism, strong atheism, fanatics, precisely some theists, all theism |
    | 4 | religion foundation darwin fish bumper stickers, darwin fish, atheism, 3d plastic fish, fish symbol, atheist books, atheist organizations, negative atheism, positive atheism, atheism index |
    | | ... |

=== "Lemma Extraction"

    ```bash
    pip install turftopic[spacy]
    python -m spacy download "en_core_web_sm"
    ```

    ```python
    from turftopic import KeyNMF
    from turftopic.vectorizers.spacy import LemmaCountVectorizer

    model = KeyNMF(10, vectorizer=LemmaCountVectorizer("en_core_web_sm"))
    model.fit(corpus)
    model.print_topics()
    ```