Description
I need to calculate the similarity between article titles, and I intend to use Relaxed Word Mover's Distance (RWMD).
I am using the RelaxedWordMoversDistance() function from the text2vec R package.
After some trials, I noticed that the RWMD values in my output matrix of title similarities are not symmetric.
Because I was skeptical of the results from my own data, I also tested the example in the vignette, which is available at the following address:
https://search.r-project.org/CRAN/refmans/text2vec/html/00Index.html
I used the example for the RelaxedWordMoversDistance function from the text2vec documentation.
Then I modified the example code to create a larger rwms matrix, as follows:
rwms = rwmd_model$sim2(dtm)
The diagonal elements of the matrix are 1.
However, elements that mirror each other across the diagonal are not equal.
That is, for two titles i and j, rwms[i, j] is not equal to rwms[j, i].
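To make the observation concrete, here is a minimal base-R sketch of a one-directional relaxed WMD. It assumes the usual relaxation in which every word of the first document sends all of its normalized weight to its nearest word in the second document, with Euclidean distance between word vectors as the cost. The helper name and the toy embeddings are hypothetical, this is not text2vec's implementation, and text2vec's default cost (for example cosine rather than Euclidean) may differ.

```r
# Hypothetical helper, NOT text2vec's implementation: a one-directional
# relaxed WMD. Every word in `doc_from` moves all of its normalized weight
# to the nearest word in `doc_to` (Euclidean distance between word vectors).
relaxed_wmd_one_way <- function(doc_from, doc_to, emb) {
  w <- doc_from / sum(doc_from)              # normalized bag-of-words weights
  from_words <- names(doc_from)
  to_words <- names(doc_to)
  # cost[i, j] = distance between word i of doc_from and word j of doc_to
  cost <- matrix(0, nrow = length(from_words), ncol = length(to_words))
  for (i in seq_along(from_words)) {
    for (j in seq_along(to_words)) {
      cost[i, j] <- sqrt(sum((emb[from_words[i], ] - emb[to_words[j], ])^2))
    }
  }
  # each word only pays the distance to its closest counterpart
  sum(w * apply(cost, 1, min))
}

# Toy 2-d "word vectors" (hypothetical values chosen for illustration)
emb <- rbind(cat = c(0, 0), car = c(10, 0))

doc_a <- c(cat = 1)           # document containing only "cat"
doc_b <- c(cat = 1, car = 1)  # document containing "cat" and "car"

a_to_b <- relaxed_wmd_one_way(doc_a, doc_b, emb)  # 0: "cat" finds "cat"
b_to_a <- relaxed_wmd_one_way(doc_b, doc_a, emb)  # 5: half the mass ("car") travels 10
cat(sprintf("a -> b: %g, b -> a: %g\n", a_to_b, b_to_a))
```

In this toy case the two directions give 0 and 5, because only the first document's words are forced to find a match. Under this formulation, swapping the arguments can change the value whenever one document contains words that are far from everything in the other.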
Is this difference normal, or am I doing something wrong?
I would be grateful for any help.
Below is the example, copied from the vignette "Package ‘text2vec’" (November 30, 2022):
Example
Not run:
library(text2vec)
library(rsparse)
data("movie_review")
tokens = word_tokenizer(tolower(movie_review$review))
v = create_vocabulary(itoken(tokens))
v = prune_vocabulary(v, term_count_min = 5, doc_proportion_max = 0.5)
it = itoken(tokens)
vectorizer = vocab_vectorizer(v)
dtm = create_dtm(it, vectorizer)
tcm = create_tcm(it, vectorizer, skip_grams_window = 5)
glove_model = GloVe$new(rank = 50, x_max = 10)
wv = glove_model$fit_transform(tcm, n_iter = 5)
wv = wv + t(glove_model$components)
rwmd_model = RelaxedWordMoversDistance$new(dtm, wv)
rwms = rwmd_model$sim2(dtm[1:10, ])
head(sort(rwms[1, ], decreasing = T))
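A small base-R sketch for quantifying the asymmetry in a square matrix, such as one returned by rwmd_model$sim2(dtm), might look like the following. The helper names are hypothetical. Taking the element-wise minimum of the two directions is only one possible way to obtain a symmetric score (averaging is another); it is an assumption, not something the text2vec documentation prescribes.

```r
# Hypothetical helper: summarize how far a square similarity matrix is from
# symmetric. as.matrix() also converts Matrix-package objects to base matrices.
asymmetry_summary <- function(m) {
  m <- as.matrix(m)
  stopifnot(nrow(m) == ncol(m))
  d <- abs(m - t(m))                         # |m[i, j] - m[j, i]| for every pair
  list(
    max_abs_diff = max(d),
    is_symmetric = isTRUE(all.equal(m, t(m), check.attributes = FALSE))
  )
}

# Hypothetical helper: keep the smaller similarity of the two directions
symmetrize_min <- function(m) {
  m <- as.matrix(m)
  pmin(m, t(m))
}

# Toy 2 x 2 matrix (column-major fill: m[2, 1] = 0.8, m[1, 2] = 0.6)
m <- matrix(c(1, 0.8, 0.6, 1), nrow = 2)
s <- asymmetry_summary(m)
sym <- symmetrize_min(m)
print(s)
print(sym)
```

With the real data, the same calls would be applied to the square matrix from rwmd_model$sim2(dtm) rather than to the toy matrix.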