-
Notifications
You must be signed in to change notification settings - Fork 18.5k
feat: add comprehensive unit tests for dataset service creation and retrieval methods #28547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add comprehensive unit tests for dataset service creation and retrieval methods #28547
Conversation
Add comprehensive test coverage for DocumentService methods: - save_document_with_dataset_id (create_document flows): * New document creation from upload_file * Missing data source validation * File not found error handling - get_document: single document retrieval by dataset_id and document_id * Successful retrieval * Document not found * Without document_id (returns None) - get_document_by_id: single document retrieval by id * Successful retrieval * Document not found - get_document_by_ids: bulk document retrieval * Successful bulk retrieval * Empty list handling - get_document_by_dataset_id: documents by dataset * Successful retrieval * Empty results - get_working_documents_by_dataset_id: completed, enabled, not archived documents * Successful retrieval * Empty results - get_error_documents_by_dataset_id: error/paused documents * Successful retrieval (error and paused statuses) * Empty results - get_batch_documents: documents by batch * Successful retrieval * Empty results - update_document_with_dataset_id: document updates * Successful update * Document not found * Document not available - delete_document: single document deletion * With upload_file data source * Without file_id * With None data_source_info - delete_documents: bulk document deletion * Successful deletion * Empty list handling * None list handling * Without doc_form - get_documents_position: position calculation * With existing documents * No documents (returns 1) Follows TDD principles with Arrange-Act-Assert structure and comprehensive coverage of all scenarios including edge cases.
Summary of ChangesHello @SmartDever02, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the test coverage for the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a comprehensive suite of unit tests for the DocumentService, covering creation, retrieval, update, and deletion methods. The tests are well-structured and cover a wide range of scenarios, including success paths and error conditions, which significantly improves the robustness of the service.
My review includes a few suggestions to further enhance the test quality:
- Using
create_autospecfor more robust mocks in the test data factory. - Improving exception testing to be more specific.
- Reducing code duplication for fixtures.
- Removing unused imports.
Overall, this is an excellent contribution that greatly enhances the test coverage of the application.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Fix failing tests in test_document_service.py: 1. delete_documents tests: - Fix mock_db_session.scalars to be a method call, not a property - Properly mock db.session.scalars() as a method that returns mock_scalars 2. update_document_not_found test: - Use pytest.raises(NotFound) directly instead of mocking NotFound - Import NotFound from werkzeug.exceptions 3. save_document_with_dataset_id tests: - Add missing mocks for ModelManager, DatasetCollectionBindingService - Add mocks for time.strftime and secrets.randbelow (used for batch generation) - Fix FileNotExistsError import and usage - Properly mock DocumentIndexingTaskProxy These fixes ensure all mocks match the actual implementation behavior.
Fix remaining issues in document service tests: 1. delete_documents tests: - Change mock_db_session.scalars assignment to use .return_value - This matches the pattern used in other tests 2. save_document_with_dataset_id_new_upload_file_success: - Add .delay method to mock_task_instance for DocumentIndexingTaskProxy - Add document.id assignment to mock document - Add assertion for mock_db_session.flush() call - Add assertion for mock_task_instance.delay() call These fixes ensure all mocks properly match the actual implementation behavior and all method calls are properly verified.
…tests Fix remaining test failures by ensuring db.session.scalars is properly mocked as a callable Mock in all tests that use it: 1. delete_documents tests: - Explicitly set mock_db_session.scalars = Mock(return_value=...) - This ensures scalars() is callable and returns the expected result 2. save_document_with_dataset_id_new_upload_file_success: - Add mock_db_session.scalars setup for duplicate check query - This query is used even when duplicate=False to check existing docs The issue was that mock_db_session.scalars might not have been properly initialized as a Mock, causing AttributeError when the code tries to call scalars(). By explicitly setting it, we ensure it's always a callable Mock.
Fix remaining test failures: 1. delete_documents tests: - Change mock_scalars_result.all.return_value to mock_scalars_result.all = Mock(return_value=...) - This ensures all() is a callable method, not just a property 2. save_document_with_dataset_id_file_not_found: - Add missing naive_utc_now mock - This is used in the code path before the FileNotExistsError is raised The key issue was that .all() needs to be a callable Mock method, not just a property with a return_value. This matches how SQLAlchemy's scalars().all() actually works.
Update test_save_document_with_dataset_id_new_upload_file_success to use the same pattern as delete_documents tests - making all() a callable Mock method instead of a property with return_value. This ensures consistency across all tests that mock db.session.scalars().all().
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a comprehensive set of unit tests for the DocumentService, which significantly improves the test coverage and reliability of document-related operations. The tests are well-structured and cover a wide range of scenarios, including success paths and error conditions. I've provided a few suggestions to enhance the maintainability and assertions in the tests. Overall, this is a great contribution to the project's quality.
| # Assert | ||
| assert isinstance(result, Sequence) | ||
| assert len(result) == 3 | ||
| assert all(doc in result for doc in documents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using set() for comparison is more robust for unordered sequences and provides a more informative diff if the assertion fails. This also improves performance from O(n*m) to O(n+m). This applies to other similar assertions in this file as well.
| assert all(doc in result for doc in documents) | |
| assert set(result) == set(documents) |
| mock_scalars_result = Mock() | ||
| mock_scalars_result.all = Mock(return_value=documents) | ||
| mock_select = Mock() | ||
| mock_select.where.return_value = mock_select | ||
| # Mock scalars to return the mock_scalars_result | ||
| mock_db_session.scalars = Mock(return_value=mock_scalars_result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mock setup for db.session.scalars is a bit verbose and inconsistent with other tests in this file (e.g., test_get_document_by_ids_success). It can be simplified for better readability and consistency by chaining return_value attributes.
| mock_scalars_result = Mock() | |
| mock_scalars_result.all = Mock(return_value=documents) | |
| mock_select = Mock() | |
| mock_select.where.return_value = mock_select | |
| # Mock scalars to return the mock_scalars_result | |
| mock_db_session.scalars = Mock(return_value=mock_scalars_result) | |
| mock_select = Mock() | |
| mock_select.where.return_value = mock_select | |
| mock_db_session.scalars.return_value.all.return_value = documents |
| with patch("services.dataset_service.DocumentIndexingTaskProxy") as mock_task: | ||
| yield mock_task | ||
|
|
||
| def test_save_document_with_dataset_id_new_upload_file_success( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| with ( | ||
| patch("services.dataset_service.DatasetService.check_doc_form") as mock_check_doc_form, | ||
| patch("services.dataset_service.FeatureService.get_features") as mock_features, | ||
| patch("services.dataset_service.DocumentService.get_documents_position") as mock_position, | ||
| patch("services.dataset_service.DocumentService.build_document") as mock_build, | ||
| patch("services.dataset_service.redis_client.lock") as mock_lock, | ||
| patch("services.dataset_service.select") as mock_select_func, | ||
| patch("services.dataset_service.DatasetProcessRule") as mock_process_rule, | ||
| patch("services.dataset_service.ModelManager") as mock_model_manager, | ||
| patch("services.dataset_service.DatasetCollectionBindingService") as mock_binding_service, | ||
| patch("services.dataset_service.DocumentIndexingTaskProxy") as mock_indexing_proxy, | ||
| patch("services.dataset_service.time.strftime") as mock_strftime, | ||
| patch("services.dataset_service.secrets.randbelow") as mock_randbelow, | ||
| ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test has a large number of mocks configured within a single with block. To improve readability and maintainability, consider refactoring these patches into separate pytest fixtures. This will make the test body cleaner and the setup reusable. For example:
@pytest.fixture
def mock_check_doc_form(mocker):
return mocker.patch("services.dataset_service.DatasetService.check_doc_form")
@pytest.fixture
def mock_features(mocker):
mock = mocker.patch("services.dataset_service.FeatureService.get_features")
mock.return_value.billing.enabled = False
return mock
# in test method:
def test_save_document_with_dataset_id_new_upload_file_success(self, mock_check_doc_form, mock_features, ...):
# ... test logic ...
Description
This PR adds comprehensive unit test coverage for DocumentService methods, significantly improving test coverage for document creation, retrieval, update, and deletion functionality.
Test Coverage Added
save_document_with_dataset_idMethod Tests (3 test cases)Document Creation Flows:
✅ New document creation from upload_file
✅ Missing data source validation
✅ File not found error handling
get_documentMethod Tests (3 test cases)get_document_by_idMethod Tests (2 test cases)get_document_by_idsMethod Tests (2 test cases)get_document_by_dataset_idMethod Tests (2 test cases)get_working_documents_by_dataset_idMethod Tests (2 test cases)get_error_documents_by_dataset_idMethod Tests (2 test cases)get_batch_documentsMethod Tests (2 test cases)update_document_with_dataset_idMethod Tests (3 test cases)✅ Successful document update
✅ Document not found error
✅ Document not available error
delete_documentMethod Tests (3 test cases)✅ Successful deletion with upload_file data source
✅ Successful deletion without file_id
✅ Successful deletion with None data_source_info
delete_documentsMethod Tests (4 test cases)✅ Successful bulk deletion
✅ Empty list handling (returns early)
✅ None list handling (returns early)
✅ Deletion without doc_form (skips batch clean task)
get_documents_positionMethod Tests (2 test cases)Testing Approach
test_dataset_service_update_dataset.pytest_dataset_service_delete_dataset.pytest_dataset_service_retrieval.pyTest Statistics
test_document_service.pyRelated Work
This complements the existing test coverage:
test_dataset_models.py- Dataset model teststest_dataset_service_update_dataset.py- Dataset update operationstest_dataset_service_delete_dataset.py- Dataset delete operationstest_dataset_service_retrieval.py- Dataset retrieval operationstest_dataset_service_create_dataset.py- Dataset creation operationsChecklist
Contribution by Gittensor, learn more at https://gittensor.io/