RelaxedWordMoversDistance results are not symmetrical #343

@oguzozbay

I need to calculate the similarity of article titles, and I intend to use Relaxed Word Mover's Distance (RWMD)
via the RelaxedWordMoversDistance class of the text2vec R package.
After some trials, I noticed that my output matrix, which holds the pairwise similarities of the titles,
is not symmetric.

Because I was skeptical of the result I got with my own data, I also tested the example from the documentation.
I used the RelaxedWordMoversDistance example from the text2vec documentation at the address below.
https://search.r-project.org/CRAN/refmans/text2vec/html/00Index.html

I then modified the example code to create a larger rwms matrix, as follows.
rwms = rwmd_model$sim2(dtm)

The diagonal elements of the matrix are 1.
But elements that mirror each other across the diagonal are not equal.

Say that i and j are titles.
RelaxedWordMoversDistance[i,j] is not equal to RelaxedWordMoversDistance[j,i].
Is this difference normal, or am I doing something wrong?
I would be grateful for any help.
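
To put a number on the asymmetry described above, here is a minimal base R sketch. The helper asymmetry_report and the toy matrix are hypothetical and not part of text2vec; the toy values are made up and only stand in for a square similarity matrix such as rwms.

```r
# Hypothetical helper (not part of text2vec): summarise how far a square
# matrix is from being symmetric.
asymmetry_report = function(m, tol = 1e-8) {
  m = as.matrix(m)
  stopifnot(nrow(m) == ncol(m))
  # Element-wise absolute difference between the matrix and its transpose
  diff = abs(m - t(m))
  # Row/column index of the first entry with the largest difference
  worst = which(diff == max(diff), arr.ind = TRUE)[1, ]
  list(
    symmetric = all(diff <= tol),
    max_abs_diff = max(diff),
    worst_pair = unname(worst)
  )
}

# Toy 3 x 3 similarity matrix standing in for rwms; the values are made up.
toy = matrix(c(1.0, 0.8, 0.3,
               0.6, 1.0, 0.5,
               0.3, 0.5, 1.0), nrow = 3, byrow = TRUE)
report = asymmetry_report(toy)
print(report)
```

For the matrix in this post, the same call would be asymmetry_report(rwms), assuming rwms is the square result of rwmd_model$sim2(dtm).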

The example below is copied from the vignette: "Package ‘text2vec’ November 30, 2022"
Example

## Not run:

library(text2vec)
library(rsparse)
data("movie_review")
tokens = word_tokenizer(tolower(movie_review$review))
v = create_vocabulary(itoken(tokens))
v = prune_vocabulary(v, term_count_min = 5, doc_proportion_max = 0.5)
it = itoken(tokens)
vectorizer = vocab_vectorizer(v)
dtm = create_dtm(it, vectorizer)
tcm = create_tcm(it, vectorizer, skip_grams_window = 5)
glove_model = GloVe$new(rank = 50, x_max = 10)
wv = glove_model$fit_transform(tcm, n_iter = 5)
wv = wv + t(glove_model$components)

rwmd_model = RelaxedWordMoversDistance$new(dtm, wv)
rwms = rwmd_model$sim2(dtm[1:10, ])
head(sort(rwms[1, ], decreasing = T))

## End(Not run)
