Question: is it possible to remove words from a `fastText` model (`.bin` file)?

Hi there,

I'm testing the [pre-trained models from Meta](https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md#models), and it's nice to have the option of subword lookup for OOV words. However, [for French at least, the data cleaning leaves something to be desired](https://github.com/facebookresearch/fastText/issues/403), which results in many duplicates and badly tokenised words.

I was thinking of testing this nevertheless, but filtering out all the words that are not in a separate dictionary file (I have pretty comprehensive word lists). I could do that after computing the similarity, but it would be neater to remove the words and their vectors from the model instead: there would be less wasted computation, and I wouldn't need to implement checks around `topn` (to make sure I actually get `n` neighbours back for each request).

Is this something that can be done using `gensim` (I would have asked on `fastText` first, but the repo is read-only) by any chance?
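To make the idea concrete, here is a rough sketch of what I have in mind. It does not edit the `.bin` file itself; it copies the full-word vectors I want to keep into a plain `KeyedVectors`. The helper names (`filter_rows`, `build_filtered_kv`) and the file path are my own placeholders, and I'm assuming gensim 4.x, where `load_facebook_vectors`, `index_to_key`, `vectors` and `add_vectors` are available. I haven't checked how this behaves on the full French model.

```python
import numpy as np


def filter_rows(words, vectors, allowed):
    """Keep only the (word, vector) rows whose word is in `allowed`.

    `words` and `vectors` must be aligned: row i of `vectors` belongs to words[i].
    """
    keep = [i for i, w in enumerate(words) if w in allowed]
    return [words[i] for i in keep], np.asarray(vectors)[keep]


def build_filtered_kv(bin_path, allowed):
    """Hypothetical helper: load a fastText .bin and return (original, filtered)."""
    # Imported lazily so that filter_rows can be used without gensim installed.
    from gensim.models import KeyedVectors
    from gensim.models.fasttext import load_facebook_vectors

    ft = load_facebook_vectors(bin_path)  # keeps subword info for OOV lookups
    words, vecs = filter_rows(ft.index_to_key, ft.vectors, allowed)

    kv = KeyedVectors(vector_size=ft.vector_size)  # plain vectors, no n-grams
    kv.add_vectors(words, vecs)
    return ft, kv


# Usage sketch (path and word list are placeholders):
# allowed = set(open("dictionary.txt", encoding="utf-8").read().split())
# ft, kv = build_filtered_kv("cc.fr.300.bin", allowed)
# kv.similar_by_vector(ft["motinconnu"], topn=10)
```

The filtered `KeyedVectors` no longer has the n-gram buckets, so an OOV query vector would still have to come from the original model (`ft[...]`), with the neighbour search done against the filtered set via `similar_by_vector`. That way `topn` should always return `n` dictionary words, as long as the filtered vocabulary has at least `n` entries.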

Thanks for this!
Best,
Jeremie
