Commit 0bc8a0e

Updated readme
1 parent 1d02b56 commit 0bc8a0e

1 file changed, +47 -17 lines changed

README.md

Lines changed: 47 additions & 17 deletions
@@ -20,27 +20,57 @@
 
 > This package is still work in progress and scientific papers on some of the novel methods are currently undergoing peer-review. If you use this package and you encounter any problem, let us know by opening relevant issues.
 
-### New in version 0.6.0
+### New in version 0.7.0
 
-#### Prompting Embedding Models
+#### Component re-estimation, refitting and topic merging
 
-KeyNMF and clustering topic models can now efficiently utilise asymmetric and instruction-finetuned embedding models.
-This, in combination with the right embedding model, can enhance performance significantly.
+Some models can now easily be modified after being trained in an efficient manner,
+without having to recompute all attributes from scratch.
+This is especially significant for clustering models and $S^3$.
 
 ```python
-from turftopic import KeyNMF
-from sentence_transformers import SentenceTransformer
-
-encoder = SentenceTransformer(
-    "intfloat/multilingual-e5-large-instruct",
-    prompts={
-        "query": "Instruct: Retrieve relevant keywords from the given document. Query: "
-        "passage": "Passage: "
-    },
-    # Make sure to set default prompt to query!
-    default_prompt_name="query",
-)
-model = KeyNMF(10, encoder=encoder)
+from turftopic import SemanticSignalSeparation, ClusteringTopicModel
+
+s3_model = SemanticSignalSeparation(5, feature_importance="combined").fit(corpus)
+# Re-estimating term importances
+s3_model.estimate_components(feature_importance="angular")
+# Refitting S^3 with a different number of topics (very fast)
+s3_model.refit(n_components=10, random_seed=42)
+
+clustering_model = ClusteringTopicModel().fit(corpus)
+# Reduces number of topics automatically with a given method
+clustering_model.reduce_topics(n_reduce_to=20, reduction_method="smallest")
+# Merge topics manually
+clustering_model.join_topics([0,3,4,5])
+# Resets original topics
+clustering_model.reset_topics()
+# Re-estimates term importances based on a different method
+clustering_model.estimate_components(feature_importance="centroid")
+```
+
+#### Manual topic naming
+
+You can now manually label topics in all models in Turftopic.
+
+```python
+# you can specify a dict mapping IDs to names
+model.rename_topics({0: "New name for topic 0", 5: "New name for topic 5"})
+# or a list of topic names
+model.rename_topics([f"Topic {i}" for i in range(10)])
+```
+
+#### Saving, loading and publishing to HF Hub
+
+You can now load, save and publish models with dedicated functionality.
+
+```python
+from turftopic import load_model
+
+model.to_disk("out_folder/")
+model = load_model("out_folder/")
+
+model.push_to_hub("your_user/model_name")
+model = load_model("your_user/model_name")
 ```
 
 
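The two input forms that `rename_topics` accepts in the added README text (a dict mapping IDs to names, or a full list of names) can be illustrated without Turftopic installed. The following is a minimal plain-Python sketch, not the library's implementation: `apply_topic_names` is a hypothetical helper, and it assumes topic IDs are 0-based positions in the list of names.

```python
from collections.abc import Mapping
from typing import Sequence, Union

TopicNames = Union[Mapping, Sequence[str]]


def apply_topic_names(current: Sequence[str], new_names: TopicNames) -> list:
    """Return a copy of `current` with `new_names` applied.

    Hypothetical helper for illustration only; assumes topic IDs are
    0-based positions in `current`.
    """
    if isinstance(new_names, Mapping):
        # Dict form: rename only the listed IDs, keep every other name.
        updated = list(current)
        for topic_id, name in new_names.items():
            if not 0 <= topic_id < len(updated):
                raise KeyError(f"No topic with ID {topic_id}")
            updated[topic_id] = name
        return updated
    # List form: replace every name at once, so the lengths must match.
    names = list(new_names)
    if len(names) != len(current):
        raise ValueError(f"Expected {len(current)} names, got {len(names)}")
    return names


current = [f"topic_{i}" for i in range(6)]
by_id = apply_topic_names(current, {0: "New name for topic 0", 5: "New name for topic 5"})
by_list = apply_topic_names(current, [f"Topic {i}" for i in range(6)])
```

In this sketch the dict form is a partial update while the list form is a full replacement, which is why only the list form checks its length against the number of topics.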
