|
5 | 5 |
|
6 | 6 |
|
7 | 7 | ## Features |
8 | | - - Novel transformer-based topic models: |
| 8 | + - Implementations of transformer-based topic models: |
9 | 9 | - Semantic Signal Separation - S³ 🧭 |
10 | 10 | - KeyNMF 🔑 |
11 | | - - GMM :gem: (paper soon) |
12 | | - - Implementations of other transformer-based topic models |
| 11 | + - GMM :gem: |
13 | 12 | - Clustering Topic Models: BERTopic and Top2Vec |
14 | 13 | - Autoencoding Topic Models: CombinedTM and ZeroShotTM |
15 | 14 | - FASTopic |
| 15 | + - Dynamic, Online and Hierarchical Topic Modeling |
16 | 16 | - Streamlined scikit-learn compatible API 🛠️ |
17 | 17 | - Easy topic interpretation 🔍 |
18 | | - - Dynamic Topic Modeling 📈 (GMM, ClusteringTopicModel and KeyNMF) |
| 18 | + - Automated topic naming with LLMs |
19 | 19 | - Visualization with [topicwizard](https://github.com/x-tabdeveloping/topicwizard) 🖌️ |
20 | 20 |
|
21 | 21 | > This package is still a work in progress, and scientific papers on some of the novel methods are currently undergoing peer review. If you use this package and encounter any problems, let us know by opening an issue. |
22 | 22 |
|
23 | | -### New in version 0.7.0 |
| 23 | +### New in version 0.8.0 |
24 | 24 |
|
25 | | -#### Component re-estimation, refitting and topic merging |
| 25 | +#### Automated Topic Naming |
26 | 26 |
|
27 | | -Some models can now easily be modified after being trained in an efficient manner, |
28 | | -without having to recompute all attributes from scratch. |
29 | | -This is especially significant for clustering models and $S^3$. |
| 27 | +Turftopic now allows you to automatically assign human-readable names to topics using LLMs or n-gram retrieval! |
30 | 28 |
|
31 | 29 | ```python |
32 | | -from turftopic import SemanticSignalSeparation, ClusteringTopicModel |
33 | | - |
34 | | -s3_model = SemanticSignalSeparation(5, feature_importance="combined").fit(corpus) |
35 | | -# Re-estimating term importances |
36 | | -s3_model.estimate_components(feature_importance="angular") |
37 | | -# Refitting S^3 with a different number of topics (very fast) |
38 | | -s3_model.refit(n_components=10, random_seed=42) |
39 | | - |
40 | | -clustering_model = ClusteringTopicModel().fit(corpus) |
41 | | -# Reduces number of topics automatically with a given method |
42 | | -clustering_model.reduce_topics(n_reduce_to=20, reduction_method="smallest") |
43 | | -# Merge topics manually |
44 | | -clustering_model.join_topics([0,3,4,5]) |
45 | | -# Resets original topics |
46 | | -clustering_model.reset_topics() |
47 | | -# Re-estimates term importances based on a different method |
48 | | -clustering_model.estimate_components(feature_importance="centroid") |
49 | | -``` |
50 | | - |
51 | | -#### Manual topic naming |
52 | | - |
53 | | -You can now manually label topics in all models in Turftopic. |
54 | | - |
55 | | -```python |
56 | | -# you can specify a dict mapping IDs to names |
57 | | -model.rename_topics({0: "New name for topic 0", 5: "New name for topic 5"}) |
58 | | -# or a list of topic names |
59 | | -model.rename_topics([f"Topic {i}" for i in range(10)]) |
60 | | -``` |
61 | | - |
62 | | -#### Saving, loading and publishing to HF Hub |
63 | | - |
64 | | -You can now load, save and publish models with dedicated functionality. |
65 | | - |
66 | | -```python |
67 | | -from turftopic import load_model |
| 30 | +from turftopic import KeyNMF |
| 31 | +from turftopic.namers import OpenAITopicNamer |
68 | 32 |
|
69 | | -model.to_disk("out_folder/") |
70 | | -model = load_model("out_folder/") |
| 33 | +model = KeyNMF(10).fit(corpus) |
71 | 34 |
|
72 | | -model.push_to_hub("your_user/model_name") |
73 | | -model = load_model("your_user/model_name") |
| 35 | +namer = OpenAITopicNamer("gpt-4o-mini") |
| 36 | +model.rename_topics(namer) |
| 37 | +model.print_topics() |
74 | 38 | ``` |
75 | 39 |
|
| 40 | +| Topic ID | Topic Name | Highest Ranking | |
| 41 | +| - | - | - | |
| 42 | +| 0 | Operating Systems and Software | windows, dos, os, ms, microsoft, unix, nt, memory, program, apps | |
| 43 | +| 1 | Atheism and Belief Systems | atheism, atheist, atheists, belief, religion, religious, theists, beliefs, believe, faith | |
| 44 | +| 2 | Computer Architecture and Performance | motherboard, ram, memory, cpu, bios, isa, speed, 486, bus, performance | |
| 45 | +| 3 | Storage Technologies | disk, drive, scsi, drives, disks, floppy, ide, dos, controller, boot | |
| 46 | +| | ... | |
76 | 47 |
|
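The namer's job in the snippet above is to turn each topic's highest-ranking words into a readable name. As a rough illustration only, the sketch below builds names by joining the top keywords and produces a dict mapping topic IDs to names, the form `rename_topics` accepts according to the 0.7.0 notes removed in this diff. The function `name_topics_from_keywords` and its `n_words` parameter are hypothetical and not part of Turftopic; this is not how `OpenAITopicNamer` works internally.

```python
# Hypothetical helper, not part of Turftopic: names a topic after its top words.
def name_topics_from_keywords(top_words, n_words=3):
    """Map each topic ID to a name built from its n highest-ranking words."""
    return {
        topic_id: ", ".join(words[:n_words])
        for topic_id, words in top_words.items()
    }


# Top words copied from the example table above (topics 0 and 3).
top_words = {
    0: ["windows", "dos", "os", "ms", "microsoft"],
    3: ["disk", "drive", "scsi", "drives", "disks"],
}

names = name_topics_from_keywords(top_words)
print(names)

# With a fitted model, the dict could then be passed on (assumption: the
# dict form of rename_topics is still supported in 0.8.0):
# model.rename_topics(names)
```

A keyword-joining fallback like this needs no API key, but it only repeats the descriptive words rather than summarising them the way an LLM-based namer would.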
77 | 48 | ## Basics [(Documentation)](https://x-tabdeveloping.github.io/turftopic/) |
78 | 49 | [Open in Colab](https://colab.research.google.com/github/x-tabdeveloping/turftopic/blob/main/examples/basic_example_20newsgroups.ipynb) |
|