Topic Distribution in Documents using BTM. #5
Comments
For getting the topic distribution within each short text, did you use the
Thanks, the predict.BTM function did give the topic distribution across individual texts.
Hi @jwijffels, I am using
Hi, one option for measuring topic quality is coherence metrics. Simply put, these metrics take the top x terms of a topic and check their statistical relation in the corpus (the relation differs by metric) to assess the quality of that set of terms. I have implemented some metrics in the text2vec package on the basis of a paper by Röder et al., but I have no experience with whether they make sense for, or work with, biterm models. They probably perform worse for biterm models because of the sparseness.
I would really like to implement the metrics for udpipe to support the nice work by jwijffels, but I simply lack the time to do so at the moment.
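To make the idea of "top terms checked against the corpus" concrete, here is a minimal Python sketch of a UMass-style coherence score based on document co-occurrence counts. It is an illustration only: it is not the text2vec implementation, the function name `umass_coherence` and the toy documents are hypothetical, and whether this score is meaningful for biterm models on sparse short texts is exactly the open question raised above.

```python
import math

def umass_coherence(top_terms, documents, eps=1.0):
    """UMass-style coherence for one topic.

    top_terms: a topic's top terms, ordered from most to least probable.
    documents: list of tokenised documents (lists of strings).
    eps: smoothing count that avoids log(0) for pairs that never co-occur.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*terms):
        # Number of documents that contain all of the given terms.
        return sum(1 for d in doc_sets if all(t in d for t in terms))

    score = 0.0
    for i in range(1, len(top_terms)):
        for j in range(i):
            w_i, w_j = top_terms[i], top_terms[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                continue  # term absent from this corpus, skip the pair
            score += math.log((doc_freq(w_i, w_j) + eps) / d_j)
    return score

# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["cat", "dog"],
    ["cat", "dog", "fish"],
    ["stock", "market"],
    ["stock", "market", "bank"],
]
related = umass_coherence(["cat", "dog"], docs)
unrelated = umass_coherence(["cat", "stock"], docs)
print(related, unrelated)
```

Terms that tend to appear in the same documents score higher than terms that never co-occur, so a higher value suggests a more coherent topic under this particular metric.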
On 8 March 2023 at 20:46:34 CET, mevalerio ***@***.***> wrote:
…Hi @jwijffels, I am using ``BTM`` for a paper; thank you for your hard work on it. I am thinking of using an entropy-based measure to evaluate models as K changes. I would also like to assess it against “something” that picks up a word-based likelihood of belonging. I do not understand how ``logLik.BTM`` can help. Is it the case that the closer ``ll`` (the sum of sum(phi[term1, ] * phi[term2, ] * theta)) is to zero, the better the model? I know I am abusing terminology; apologies in advance.
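As a rough illustration of the quantity described in this question, the following Python sketch sums, over all biterms, the log of sum(phi[term1, ] * phi[term2, ] * theta). It assumes that this is the log-likelihood in question; it does not call the R package, and the function name `biterm_log_likelihood` and all numbers are hypothetical toy values.

```python
import math

def biterm_log_likelihood(biterms, phi, theta):
    """Sum over biterms of log( sum_k theta[k] * phi[w1][k] * phi[w2][k] ).

    biterms: list of (term1, term2) pairs.
    phi: dict mapping each term to its per-topic probabilities P(term | topic).
    theta: list of topic probabilities P(topic).
    """
    ll = 0.0
    for w1, w2 in biterms:
        p = sum(theta[k] * phi[w1][k] * phi[w2][k] for k in range(len(theta)))
        ll += math.log(p)  # p is a probability, so each term is <= 0
    return ll

# Toy model with two topics (hypothetical values; each topic's column sums to 1).
theta = [0.5, 0.5]
phi = {
    "cat":    [0.45, 0.05],
    "dog":    [0.45, 0.05],
    "stock":  [0.05, 0.45],
    "market": [0.05, 0.45],
}
ll_same_topic = biterm_log_likelihood([("cat", "dog")], phi, theta)
ll_mixed = biterm_log_likelihood([("cat", "stock")], phi, theta)
print(ll_same_topic, ll_mixed)
```

Because every biterm probability is at most 1, the sum is never positive, and a value closer to zero means the model assigns higher probability to the observed biterms. Comparing values across different K on the same training data should be done with care, since a more flexible model may fit the training biterms better without giving better topics.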
Hello jwijffels,
Thank you very much for creating the R implementation of BTM. I am using it to discover topics in short texts (mainly tweets). I would like to know whether we can identify the topic distribution within each short text. Is this functionality available in the existing version of BTM?
The original research paper by Yan et al., "A Biterm Topic Model for Short Texts", states in its Introduction:
"However, we show that the topic distribution of each document can be naturally derived based on the learned model".
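For reference, the derivation in the paper treats a document as a bag of biterms: P(z|d) = Σ_b P(z|b) · P(b|d), where P(z|b) ∝ theta[z] · phi[w1, z] · phi[w2, z] and P(b|d) is the relative frequency of biterm b in the document. The Python sketch below illustrates that formula only. It is not the package's code, the function name `doc_topic_distribution` and the numbers are hypothetical, and it assumes every unordered word pair in the short text counts as a biterm (no window) and that all phi values are strictly positive.

```python
from collections import Counter
from itertools import combinations

def doc_topic_distribution(tokens, phi, theta):
    """P(topic | document) as a mixture over the document's biterms.

    tokens: tokenised short text.
    phi: dict mapping each term to its per-topic probabilities P(term | topic).
    theta: list of topic probabilities P(topic).
    """
    k = len(theta)
    known = [t for t in tokens if t in phi]  # drop out-of-vocabulary terms
    biterms = Counter(tuple(sorted(pair)) for pair in combinations(known, 2))
    total = sum(biterms.values())
    if total == 0:
        raise ValueError("need at least two in-vocabulary tokens to form a biterm")

    dist = [0.0] * k
    for (w1, w2), count in biterms.items():
        joint = [theta[z] * phi[w1][z] * phi[w2][z] for z in range(k)]
        norm = sum(joint)  # > 0 under the strictly-positive phi assumption
        for z in range(k):
            # P(z | b) weighted by P(b | d) = count / total
            dist[z] += (count / total) * (joint[z] / norm)
    return dist

# Toy model with two topics (hypothetical values; each topic's column sums to 1).
theta = [0.5, 0.5]
phi = {
    "cat":    [0.45, 0.05],
    "dog":    [0.45, 0.05],
    "stock":  [0.05, 0.45],
    "market": [0.05, 0.45],
}
dist = doc_topic_distribution(["cat", "dog", "cat"], phi, theta)
print(dist)
```

The result sums to 1 by construction, because each biterm contributes a normalised topic distribution weighted by its share of the document's biterms.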
Also, is there a way to identify the number of topics through this package? This is not an issue but a possible feature request.
Thanks again for your inputs.