s2terminal

Fixing one of the issues in #3224

If I call most_similar(), then add_vectors(), and then most_similar() again, the second most_similar() call raises ValueError: operands could not be broadcast together with shapes.
The error occurs because add_vectors() grows vectors but leaves the norms computed by the first most_similar() call untouched, so len(vectors) and len(vectors.norms) no longer match.

from gensim.models import Word2Vec
import numpy

model = Word2Vec(
    sentences=[
        ["this", "is", "test1"],
        ["that", "is", "test2"],
    ],
    vector_size=2,
    min_count=1,
)

print(model.wv.most_similar("test1", topn=1)) #=> [('test2', 0.9941185712814331)]

model.wv.add_vectors(["test3"], [numpy.array([0.5, 0.5])])

print(model.wv.most_similar("test1", topn=1)) #=> ValueError: operands could not be broadcast together with shapes (6,) (5,) 
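
The shape mismatch can be illustrated without gensim. The following is a minimal NumPy-only sketch, not gensim's actual code: the names `vectors`, `norms`, and `query` are stand-ins chosen for illustration, and the random values are arbitrary. It assumes the similarity computation divides a dot product over all vectors by a norms array that was cached earlier.

```python
import numpy as np

# Five stand-in word vectors of size 2, mirroring the five-word vocabulary above.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(5, 2))

# Norms cached once, as a first similarity query might do: shape (5,).
norms = np.linalg.norm(vectors, axis=1)

# Append a sixth vector without touching the cached norms.
vectors = np.vstack([vectors, [[0.5, 0.5]]])

query = vectors[0]
try:
    # The dot product has shape (6,), the stale norms have shape (5,).
    similarities = (vectors @ query) / norms
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

The division fails because NumPy cannot broadcast a length-6 array against a length-5 array, which matches the shapes reported in the traceback above.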

To resolve this error, I use fill_norms to recompute the norms so that len(vectors) and len(vectors.norms) match again.

@mpenkov
Collaborator

mpenkov commented Aug 23, 2023

Needs a test. The example from the issue description is probably good enough.

@mpenkov mpenkov added this to the Spring 2024 release milestone Apr 8, 2024
@mpenkov mpenkov removed this from the Summer 2024 release milestone Jul 18, 2024