|
20 | 20 |
|
21 | 21 | > This package is still work in progress and scientific papers on some of the novel methods are currently undergoing peer-review. If you use this package and you encounter any problem, let us know by opening relevant issues. |
22 | 22 |
|
23 | | -### New in version 0.6.0 |
| 23 | +### New in version 0.7.0 |
24 | 24 |
|
25 | | -#### Prompting Embedding Models |
| 25 | +#### Component re-estimation, refitting and topic merging |
26 | 26 |
|
27 | | -KeyNMF and clustering topic models can now efficiently utilise asymmetric and instruction-finetuned embedding models. |
28 | | -This, in combination with the right embedding model, can enhance performance significantly. |
| 27 | +Some models can now be modified efficiently after training, |
| 28 | +without having to recompute all attributes from scratch. |
| 29 | +This is especially useful for clustering models and $S^3$. |
29 | 30 |
|
30 | 31 | ```python |
31 | | -from turftopic import KeyNMF |
32 | | -from sentence_transformers import SentenceTransformer |
33 | | - |
34 | | -encoder = SentenceTransformer( |
35 | | - "intfloat/multilingual-e5-large-instruct", |
36 | | - prompts={ |
37 | | - "query": "Instruct: Retrieve relevant keywords from the given document. Query: " |
38 | | - "passage": "Passage: " |
39 | | - }, |
40 | | - # Make sure to set default prompt to query! |
41 | | - default_prompt_name="query", |
42 | | -) |
43 | | -model = KeyNMF(10, encoder=encoder) |
| 32 | +from turftopic import SemanticSignalSeparation, ClusteringTopicModel |
| 33 | + |
| 34 | +s3_model = SemanticSignalSeparation(5, feature_importance="combined").fit(corpus) |
| 35 | +# Re-estimating term importances |
| 36 | +s3_model.estimate_components(feature_importance="angular") |
| 37 | +# Refitting S^3 with a different number of topics (very fast) |
| 38 | +s3_model.refit(n_components=10, random_seed=42) |
| 39 | + |
| 40 | +clustering_model = ClusteringTopicModel().fit(corpus) |
| 41 | +# Reduces number of topics automatically with a given method |
| 42 | +clustering_model.reduce_topics(n_reduce_to=20, reduction_method="smallest") |
| 43 | +# Merge topics manually |
| 44 | +clustering_model.join_topics([0, 3, 4, 5]) |
| 45 | +# Resets original topics |
| 46 | +clustering_model.reset_topics() |
| 47 | +# Re-estimates term importances based on a different method |
| 48 | +clustering_model.estimate_components(feature_importance="centroid") |
| 49 | +``` |
| 50 | + |
| 51 | +#### Manual topic naming |
| 52 | + |
| 53 | +You can now manually label topics in all models in Turftopic. |
| 54 | + |
| 55 | +```python |
| 56 | +# you can specify a dict mapping IDs to names |
| 57 | +model.rename_topics({0: "New name for topic 0", 5: "New name for topic 5"}) |
| 58 | +# or a list of topic names |
| 59 | +model.rename_topics([f"Topic {i}" for i in range(10)]) |
| 60 | +``` |
| 61 | + |
| 62 | +#### Saving, loading and publishing to HF Hub |
| 63 | + |
| 64 | +Models can now be saved to disk, loaded back, and published to the Hugging Face Hub. |
| 65 | + |
| 66 | +```python |
| 67 | +from turftopic import load_model |
| 68 | + |
| 69 | +model.to_disk("out_folder/") |
| 70 | +model = load_model("out_folder/") |
| 71 | + |
| 72 | +model.push_to_hub("your_user/model_name") |
| 73 | +model = load_model("your_user/model_name") |
44 | 74 | ``` |
45 | 75 |
|
46 | 76 |
|
|
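The manual topic naming section in the diff accepts either a dict mapping topic IDs to names or a full list of names. A minimal sketch of how those two argument shapes could be handled is given below. The helper `apply_topic_names` is hypothetical and is not Turftopic's implementation. It also assumes topic IDs are positions `0..n-1`, which may not hold for models that use an outlier topic such as `-1`.

```python
from typing import List, Mapping, Sequence, Union


def apply_topic_names(
    current: Sequence[str],
    names: Union[Mapping[int, str], Sequence[str]],
) -> List[str]:
    """Return a new list of topic names (hypothetical helper, not Turftopic API).

    A mapping renames only the listed topic IDs; a sequence replaces all names.
    """
    updated = list(current)  # copy, so the caller's list is left untouched
    if isinstance(names, Mapping):
        for topic_id, name in names.items():
            # Assumption: IDs are valid positions in the current name list.
            if not 0 <= topic_id < len(updated):
                raise KeyError(f"Unknown topic ID: {topic_id}")
            updated[topic_id] = name
        return updated
    if isinstance(names, str):
        # A bare string is a sequence of characters, which is never intended here.
        raise TypeError("Expected a mapping or a list of names, got a string.")
    new_names = list(names)
    if len(new_names) != len(updated):
        raise ValueError(
            f"Expected {len(updated)} names, got {len(new_names)}."
        )
    return new_names


# Usage: partial rename with a dict, full rename with a list.
names = [f"Topic {i}" for i in range(6)]
partially_renamed = apply_topic_names(names, {0: "Sports", 5: "Politics"})
fully_renamed = apply_topic_names(names, [f"T{i}" for i in range(6)])
```

Returning a new list instead of mutating in place is a choice made only for this sketch; the `rename_topics` call in the diff is used as if it updates the model directly.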