feature: Update WordEmbeddingModel class #62

Merged
pbadillatorrealba merged 2 commits into develop from feature/update_word_embedding_model
Jul 22, 2025

Conversation

pbadillatorrealba
Member

This pull request updates the WordEmbeddingModel class and its tests to improve type safety, error handling, and documentation. It also keeps the class compatible with both pre-4.0 and 4.0+ versions of gensim and removes redundant code to make the module easier to read and maintain.

Enhancements to WordEmbeddingModel class:

  • Type Safety and Error Handling:

    • Added stricter type checks for wv, name, and vocab_prefix parameters in the constructor, with more descriptive error messages.
    • Modified __getitem__ to raise KeyError for words not in the vocabulary and added type validation for the key parameter.
    • Improved __contains__ and __len__ methods for checking word existence and vocabulary size.
  • Compatibility Updates:

    • Introduced GENSIM_V4_OR_GREATER to handle differences in gensim versions, ensuring compatibility with both pre-4.0 and 4.0+ versions.
  • Type Aliases:

    • Replaced Union[np.ndarray, None] with NDArray[np.float64] for better type hinting and consistency.
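
As a rough illustration of the error-handling behavior described above, the following minimal sketch uses a hypothetical `TinyEmbeddingModel` class backed by a plain dict. It is not the WEFE implementation: the class name, the `vectors` parameter, and the error messages are assumptions made for this example, and plain lists of floats stand in for NumPy arrays.

```python
from typing import Dict, List


class TinyEmbeddingModel:
    """Hypothetical stand-in for WordEmbeddingModel, backed by a plain dict."""

    def __init__(self, vectors: Dict[str, List[float]], name: str = "toy") -> None:
        # Validate constructor arguments up front with descriptive messages.
        if not isinstance(vectors, dict):
            raise TypeError(
                f"vectors should be a dict, got: {type(vectors).__name__}"
            )
        if not isinstance(name, str):
            raise TypeError(f"name should be a string, got: {type(name).__name__}")
        self.vectors = vectors
        self.name = name

    def __getitem__(self, key: str) -> List[float]:
        # Reject non-string keys before touching the vocabulary.
        if not isinstance(key, str):
            raise TypeError(f"key should be a string, got: {type(key).__name__}")
        # Raise KeyError for unknown words instead of returning None.
        if key not in self.vectors:
            raise KeyError(f"word '{key}' is not in the vocabulary")
        return self.vectors[key]

    def __contains__(self, key: object) -> bool:
        # Non-string keys are simply reported as absent.
        return isinstance(key, str) and key in self.vectors

    def __len__(self) -> int:
        # Vocabulary size.
        return len(self.vectors)


model = TinyEmbeddingModel({"cat": [0.1, 0.2], "dog": [0.3, 0.4]})
```

With this shape, callers can rely on `model[word]` always returning a vector, which is what makes a non-optional return annotation reasonable.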

Updates to Unit Tests:

  • Improved Test Coverage:

    • Added detailed docstrings to all test functions for clarity and consistency.
    • Enhanced tests for initialization, equality, and operators (__eq__, __contains__, __getitem__, __repr__).
  • Batch Update Tests:

    • Refactored test_update_embeddings to validate batch updates with detailed checks for input types, sizes, and errors.
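
The kinds of checks such a batch update test exercises can be sketched as follows. This is a hedged illustration only: the free function `batch_update`, its parameters, and the specific exceptions are assumptions for the example and may differ from the method in `wefe/word_embedding_model.py`. The sketch validates every input before writing anything, so a failed call leaves the stored vectors untouched.

```python
from typing import Dict, List, Sequence


def batch_update(
    vectors: Dict[str, List[float]],
    words: Sequence[str],
    embeddings: Sequence[Sequence[float]],
) -> None:
    """Hypothetical batch update: validate all inputs, then write."""
    # Input type checks.
    if not isinstance(words, (list, tuple)):
        raise TypeError(f"words should be a list or tuple, got: {type(words).__name__}")
    if not isinstance(embeddings, (list, tuple)):
        raise TypeError(
            f"embeddings should be a list or tuple, got: {type(embeddings).__name__}"
        )
    # Size check: one embedding per word.
    if len(words) != len(embeddings):
        raise ValueError(
            f"got {len(words)} words but {len(embeddings)} embeddings"
        )
    # Per-item checks, still without modifying anything.
    for word, embedding in zip(words, embeddings):
        if not isinstance(word, str):
            raise TypeError(f"each word should be a string, got: {type(word).__name__}")
        if word not in vectors:
            raise KeyError(f"word '{word}' is not in the vocabulary")
        if len(embedding) != len(vectors[word]):
            raise ValueError(
                f"embedding for '{word}' has size {len(embedding)}, "
                f"expected {len(vectors[word])}"
            )
    # Every check passed, so apply all updates.
    for word, embedding in zip(words, embeddings):
        vectors[word] = list(embedding)


def test_failed_batch_leaves_vectors_unchanged() -> None:
    """A size error in the second item must not update the first item."""
    vectors = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
    try:
        batch_update(vectors, ["a", "b"], [[0.0, 0.0], [9.0]])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    assert vectors == {"a": [1.0, 2.0], "b": [3.0, 4.0]}


test_failed_batch_leaves_vectors_unchanged()
```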

Code Simplification:

  • Removed Redundant Code:
    • Eliminated unnecessary attributes and methods, simplifying the initialization logic.
    • Updated get_embeddings_from_set in wefe/preprocessing.py to handle missing embeddings more concisely.
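
Once `__getitem__` raises KeyError for unknown words, collecting the embeddings of a word set can be reduced to a try/except loop. The sketch below is a simplified illustration under that assumption: `collect_embeddings` is a hypothetical helper, not the real `get_embeddings_from_set`, which may take more parameters than shown here.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def collect_embeddings(
    model: Mapping[str, List[float]],
    words: Iterable[str],
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Hypothetical helper: split words into missing and found embeddings."""
    not_found: List[str] = []
    found: Dict[str, List[float]] = {}
    for word in words:
        try:
            # Any object whose __getitem__ raises KeyError for
            # out-of-vocabulary words works here, including a plain dict.
            found[word] = model[word]
        except KeyError:
            not_found.append(word)
    return not_found, found


toy_model = {"cat": [0.1, 0.2], "dog": [0.3, 0.4]}
missing, embeddings = collect_embeddings(toy_model, ["cat", "bird", "dog"])
```

Handling the exception in one place avoids a separate `None` check after each lookup.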

@pbadillatorrealba pbadillatorrealba self-assigned this Jul 22, 2025

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces significant improvements to the WordEmbeddingModel class, focusing on enhanced type safety, better error handling, improved gensim compatibility, and comprehensive test coverage. The changes modernize the codebase with better type hints and more robust validation while maintaining backward compatibility.

Key changes include:

  • Enhanced type safety with stricter parameter validation and modern type annotations using NDArray[np.float64]
  • Improved gensim version compatibility through the GENSIM_V4_OR_GREATER constant
  • Comprehensive batch update functionality with atomic operations and detailed validation

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| wefe/word_embedding_model.py | Major refactoring with enhanced type safety, improved error messages, new batch update method, and better gensim compatibility |
| wefe/preprocessing.py | Updated to handle the new KeyError-raising behavior of __getitem__ |
| tests/test_word_embedding_model.py | Comprehensive test updates with detailed docstrings and extensive validation for new functionality |

Comments suppressed due to low confidence (1)

wefe/word_embedding_model.py:205

  • This line appears to be misplaced in the word_embedding_model.py file but belongs to preprocessing.py based on the diff context. This could indicate a merge error or incorrect file placement.
            if self.vocab_prefix is not None:

@pbadillatorrealba pbadillatorrealba changed the title feature: Update word embedding model module feature: Update WordEmbeddingModel module Jul 22, 2025
@pbadillatorrealba pbadillatorrealba changed the title feature: Update WordEmbeddingModel module feature: Update WordEmbeddingModel module Jul 22, 2025
@pbadillatorrealba pbadillatorrealba changed the title feature: Update WordEmbeddingModel module feature: Update WordEmbeddingModel class Jul 22, 2025
@pbadillatorrealba pbadillatorrealba merged commit 01e521b into develop Jul 22, 2025
4 checks passed
@pbadillatorrealba pbadillatorrealba deleted the feature/update_word_embedding_model branch July 22, 2025 15:22