feature: Update WordEmbeddingModel class #62

pbadillatorrealba · 2025-07-22T15:12:45Z

This pull request introduces significant updates to the WordEmbeddingModel class and its associated tests, improving type safety, error handling, and documentation. The changes also enhance compatibility with different versions of gensim and streamline the codebase for better readability and maintainability.

Enhancements to `WordEmbeddingModel` class:

Type Safety and Error Handling:
- Added stricter type checks for the `wv`, `name`, and `vocab_prefix` parameters in the constructor, with more descriptive error messages.
- Modified `__getitem__` to raise `KeyError` for words not in the vocabulary and added type validation for the `key` parameter.
- Improved the `__contains__` and `__len__` methods for checking word existence and vocabulary size.

Compatibility Updates:
- Introduced `GENSIM_V4_OR_GREATER` to handle differences in gensim versions, ensuring compatibility with both pre-4.0 and 4.0+ versions.

Type Aliases:
- Replaced `Union[np.ndarray, None]` with `NDArray[np.float64]` for better type hinting and consistency.
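
The behaviour described in the bullets above can be illustrated with a minimal, dependency-free sketch. It is not the code from this PR: `ToyEmbeddingModel` and `is_v4_or_greater` are hypothetical names, the vectors are stored in a plain dict rather than a gensim object, and the error messages are invented for illustration.

```python
from typing import Dict, List


def is_v4_or_greater(version: str) -> bool:
    """Return True when a version string such as "4.3.2" has major version >= 4.

    Hypothetical helper: one possible way to derive a flag like
    GENSIM_V4_OR_GREATER from a library's version string.
    """
    return int(version.split(".")[0]) >= 4


class ToyEmbeddingModel:
    """Hypothetical stand-in for WordEmbeddingModel, backed by a plain dict."""

    def __init__(self, vectors: Dict[str, List[float]], name: str = "toy") -> None:
        # Validate constructor arguments up front with descriptive messages.
        if not isinstance(vectors, dict):
            raise TypeError(f"vectors should be a dict, got: {type(vectors).__name__}")
        if not isinstance(name, str):
            raise TypeError(f"name should be a string, got: {type(name).__name__}")
        self.vectors = vectors
        self.name = name

    def __getitem__(self, key: str) -> List[float]:
        # Reject non-string keys, then raise KeyError for unknown words.
        if not isinstance(key, str):
            raise TypeError(f"key should be a string, got: {type(key).__name__}")
        if key not in self.vectors:
            raise KeyError(f"word '{key}' is not in the vocabulary of {self.name}")
        return self.vectors[key]

    def __contains__(self, key: object) -> bool:
        # Non-string keys are simply reported as absent.
        return isinstance(key, str) and key in self.vectors

    def __len__(self) -> int:
        # Vocabulary size.
        return len(self.vectors)
```

With this shape, callers can test membership with `in` before indexing, or index directly and catch `KeyError`. In the real class the version flag would presumably be computed from `gensim.__version__`; that detail is an assumption here.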

Updates to Unit Tests:

Improved Test Coverage:
- Added detailed docstrings to all test functions for clarity and consistency.
- Enhanced tests for initialization, equality, and operators (`__eq__`, `__contains__`, `__getitem__`, `__repr__`).

Batch Update Tests:
- Refactored `test_update_embeddings` to validate batch updates with detailed checks for input types, sizes, and errors.

Code Simplification:

Removed Redundant Code:
- Eliminated unnecessary attributes and methods, simplifying the initialization logic.
- Updated `get_embeddings_from_set` in `wefe/preprocessing.py` to handle missing embeddings more concisely.
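
As a rough illustration of the last point, a lookup loop can rely on `KeyError` to separate known and unknown words. The sketch below is an assumption about the general pattern, not the actual `get_embeddings_from_set` implementation; `collect_embeddings` and its return shape are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def collect_embeddings(
    model: Mapping[str, List[float]], words: Sequence[str]
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Split words into those missing from the model and those with a vector.

    Hypothetical helper: `model` is any mapping whose __getitem__ raises
    KeyError for out-of-vocabulary words.
    """
    not_found: List[str] = []
    found: Dict[str, List[float]] = {}
    for word in words:
        try:
            found[word] = model[word]
        except KeyError:
            # The word has no embedding; record it instead of failing.
            not_found.append(word)
    return not_found, found
```

Because the lookup raises rather than returning `None`, the caller needs no separate `is None` check on each result.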

Copilot

Pull Request Overview

This PR introduces significant improvements to the WordEmbeddingModel class, focusing on enhanced type safety, better error handling, improved gensim compatibility, and comprehensive test coverage. The changes modernize the codebase with better type hints and more robust validation while maintaining backward compatibility.

Key changes include:

- Enhanced type safety with stricter parameter validation and modern type annotations using `NDArray[np.float64]`
- Improved gensim version compatibility through the `GENSIM_V4_OR_GREATER` constant
- Comprehensive batch update functionality with atomic operations and detailed validation
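
One way to read "atomic" in the last item is all-or-nothing: every input is validated before any vector is modified. The following sketch shows that pattern under this assumption. `update_embeddings_atomically` is a hypothetical function operating on a plain dict, and its checks are illustrative rather than taken from the PR.

```python
from typing import Dict, List, Sequence


def update_embeddings_atomically(
    vectors: Dict[str, List[float]],
    words: Sequence[str],
    new_vectors: Sequence[Sequence[float]],
) -> None:
    """Replace the vectors of `words` in place, or change nothing on error."""
    if not isinstance(words, (list, tuple)):
        raise TypeError(f"words should be a list or tuple, got: {type(words).__name__}")
    if not isinstance(new_vectors, (list, tuple)):
        raise TypeError(
            f"new_vectors should be a list or tuple, got: {type(new_vectors).__name__}"
        )
    if len(words) != len(new_vectors):
        raise ValueError(
            f"got {len(words)} words but {len(new_vectors)} vectors; sizes must match"
        )

    # Phase 1: validate every pair before touching the store.
    for word, vector in zip(words, new_vectors):
        if not isinstance(word, str):
            raise TypeError(f"each word should be a string, got: {type(word).__name__}")
        if word not in vectors:
            raise KeyError(f"word '{word}' is not in the vocabulary")
        if len(vector) != len(vectors[word]):
            raise ValueError(
                f"vector for '{word}' has size {len(vector)}, "
                f"expected {len(vectors[word])}"
            )

    # Phase 2: all checks passed, so applying the updates cannot fail midway.
    for word, vector in zip(words, new_vectors):
        vectors[word] = [float(x) for x in vector]
```

If any word in the batch is invalid, the exception is raised during the first phase and the stored vectors stay exactly as they were.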

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `wefe/word_embedding_model.py` | Major refactoring with enhanced type safety, improved error messages, new batch update method, and better gensim compatibility |
| `wefe/preprocessing.py` | Updated to handle the new `KeyError`-raising behavior of `__getitem__` |
| `tests/test_word_embedding_model.py` | Comprehensive test updates with detailed docstrings and extensive validation for new functionality |

Comments suppressed due to low confidence (1)

wefe/word_embedding_model.py:205

This line appears to be misplaced: it is reported under word_embedding_model.py, but the diff context suggests it belongs to preprocessing.py. This could indicate a merge error or incorrect file placement.

            if self.vocab_prefix is not None:

wefe/word_embedding_model.py

feature: update word embedding model

0cd46ac

pbadillatorrealba requested a review from Copilot July 22, 2025 15:12

pbadillatorrealba self-assigned this Jul 22, 2025

Copilot AI reviewed Jul 22, 2025

3 review comments on wefe/word_embedding_model.py (outdated, resolved)

feature: add copilot suggestions

edd5206

pbadillatorrealba changed the title ~~feature: Update word embedding model module~~ feature: Update WordEmbeddingModel module Jul 22, 2025

pbadillatorrealba changed the title ~~feature: Update WordEmbeddingModel module~~ feature: Update WordEmbeddingModel module Jul 22, 2025

pbadillatorrealba changed the title ~~feature: Update WordEmbeddingModel module~~ feature: Update WordEmbeddingModel class Jul 22, 2025

pbadillatorrealba merged commit 01e521b into develop Jul 22, 2025
4 checks passed

pbadillatorrealba deleted the feature/update_word_embedding_model branch July 22, 2025 15:22
