feature: Update WordEmbeddingModel class #62
Merged
This pull request introduces significant updates to the `WordEmbeddingModel` class and its associated tests, improving type safety, error handling, and documentation. The changes also enhance compatibility with different versions of `gensim` and streamline the codebase for better readability and maintainability.

### Enhancements to `WordEmbeddingModel` class:

* **Type Safety and Error Handling:**
  * Validation of the `wv`, `name`, and `vocab_prefix` parameters in the constructor, with more descriptive error messages.
  * `__getitem__` now raises `KeyError` for words not in the vocabulary, with added type validation for the `key` parameter.
  * `__contains__` and `__len__` methods for checking word existence and vocabulary size.
* **Compatibility Updates:**
  * `GENSIM_V4_OR_GREATER` is used to handle differences in `gensim` versions, ensuring compatibility with both pre-4.0 and 4.0+ versions.
* **Type Aliases:**
  * Replaced `Union[np.ndarray, None]` with `NDArray[np.float64]` for better type hinting and consistency.

### Updates to Unit Tests:

* **Improved Test Coverage:**
  * Tests covering the special methods (`__eq__`, `__contains__`, `__getitem__`, `__repr__`).
* **Batch Update Tests:**
  * `test_update_embeddings` validates batch updates with detailed checks for input types, sizes, and errors.

### Code Simplification:

* `get_embeddings_from_set` in `wefe/preprocessing.py` handles missing embeddings more concisely.
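
For reviewers who want a quick picture of the container behaviour described above (`__getitem__` raising `KeyError`, `__contains__`, `__len__`, constructor type checks, and a version flag), here is a minimal sketch. It is not the code from this PR: `TinyEmbeddingModel` and `is_v4_or_greater` are hypothetical names, the vectors are stored in a plain `dict` rather than a `gensim` object, and the version check works on a version string so the example runs without `gensim` installed.

```python
from typing import Dict, Sequence


def is_v4_or_greater(version: str) -> bool:
    """Return True when a version string such as "4.3.2" has major version >= 4.

    Hypothetical helper: a flag like GENSIM_V4_OR_GREATER could be derived
    this way, but the PR's actual check may differ.
    """
    return int(version.split(".")[0]) >= 4


class TinyEmbeddingModel:
    """Hypothetical stand-in for an embedding wrapper (not WEFE's class)."""

    def __init__(self, vectors: Dict[str, Sequence[float]], name: str = "unnamed") -> None:
        # Validate constructor arguments early, with descriptive messages.
        if not isinstance(vectors, dict):
            raise TypeError(f"vectors should be a dict, got: {type(vectors).__name__}")
        if not isinstance(name, str):
            raise TypeError(f"name should be a str, got: {type(name).__name__}")
        self._vectors = vectors
        self.name = name

    def __getitem__(self, key: str) -> Sequence[float]:
        # Reject non-string keys before looking them up.
        if not isinstance(key, str):
            raise TypeError(f"key should be a str, got: {type(key).__name__}")
        # Raise KeyError for words that are not in the vocabulary.
        if key not in self._vectors:
            raise KeyError(f"word '{key}' not in model vocabulary")
        return self._vectors[key]

    def __contains__(self, key: object) -> bool:
        # Enables `word in model`; non-string keys are simply not present.
        return isinstance(key, str) and key in self._vectors

    def __len__(self) -> int:
        # Vocabulary size.
        return len(self._vectors)
```

With this shape, callers can guard lookups with `word in model` instead of catching exceptions, and a failed lookup raises `KeyError` rather than returning `None`.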