feat: add comprehensive unit tests for dataset service creation and retrieval methods #28547

SmartDever02 · 2025-11-22T20:59:57Z

Description

This PR adds comprehensive unit test coverage for DocumentService methods, significantly improving test coverage for document creation, retrieval, update, and deletion functionality.

Test Coverage Added

`save_document_with_dataset_id` Method Tests (3 test cases)

Document Creation Flows:

✅ New document creation from upload_file
- Successful creation with proper mocking of dependencies
- Dataset indexing technique setup
- Process rule creation
- Document indexing task triggering
✅ Missing data source validation
- Error handling when data_source is None for new documents
- Proper error message validation
✅ File not found error handling
- Error handling when upload file doesn't exist
- FileNotExistsError exception handling

`get_document` Method Tests (3 test cases)

✅ Successful retrieval by dataset_id and document_id
✅ Document not found (returns None)
✅ Without document_id (returns None early)

`get_document_by_id` Method Tests (2 test cases)

✅ Successful retrieval by id
✅ Document not found (returns None)

`get_document_by_ids` Method Tests (2 test cases)

✅ Successful bulk retrieval of multiple documents
- Filters by enabled=True, indexing_status="completed", archived=False
✅ Empty list handling (returns empty sequence)

`get_document_by_dataset_id` Method Tests (2 test cases)

✅ Successful retrieval of all enabled documents for a dataset
✅ Empty results when no documents exist

`get_working_documents_by_dataset_id` Method Tests (2 test cases)

✅ Successful retrieval of working documents
- Filters: enabled=True, indexing_status="completed", archived=False
✅ Empty results when no working documents exist

`get_error_documents_by_dataset_id` Method Tests (2 test cases)

✅ Successful retrieval of error/paused documents
- Filters: indexing_status in ["error", "paused"]
✅ Empty results when no error documents exist

`get_batch_documents` Method Tests (2 test cases)

✅ Successful retrieval of documents by batch
- Filters by batch, dataset_id, and tenant_id
✅ Empty results when no documents exist for batch

`update_document_with_dataset_id` Method Tests (3 test cases)

✅ Successful document update
- Document name update
- Indexing status reset to "waiting"
- Process rule updates
- Document indexing task triggering
✅ Document not found error
- NotFound exception when document doesn't exist
✅ Document not available error
- ValueError when document display_status is not "available"

`delete_document` Method Tests (3 test cases)

✅ Successful deletion with upload_file data source
- document_was_deleted signal sent with file_id
- Database deletion and commit
✅ Successful deletion without file_id
- document_was_deleted signal sent with file_id=None
- Handles non-upload_file data sources
✅ Successful deletion with None data_source_info
- Graceful handling of missing data_source_info

`delete_documents` Method Tests (4 test cases)

✅ Successful bulk deletion
- Multiple documents deletion
- Batch clean document task triggering
- File ID collection for cleanup
✅ Empty list handling (returns early)
✅ None list handling (returns early)
✅ Deletion without doc_form (skips batch clean task)

`get_documents_position` Method Tests (2 test cases)

✅ Position calculation with existing documents
- Returns max_position + 1
✅ Position calculation with no documents
- Returns 1 (default position)

Testing Approach

Follows TDD principles with Arrange-Act-Assert structure
Uses factory pattern for test data creation (consistent with existing tests)
Comprehensive mocking of dependencies:
- Database session and queries
- Pagination
- Redis locks
- Celery tasks (DocumentIndexingTaskProxy, batch_clean_document_task)
- TagService
- FeatureService
- Current user context
Tests cover both success paths and error conditions
Follows project conventions from existing test files:
- test_dataset_service_update_dataset.py
- test_dataset_service_delete_dataset.py
- test_dataset_service_retrieval.py

Test Statistics

Total test cases: 40+ test cases
New test file: test_document_service.py
Lines of test code: ~1032 lines
Coverage: All major scenarios including edge cases

Related Work

This complements the existing test coverage:

✅ test_dataset_models.py - Dataset model tests
✅ test_dataset_service_update_dataset.py - Dataset update operations
✅ test_dataset_service_delete_dataset.py - Dataset delete operations
✅ test_dataset_service_retrieval.py - Dataset retrieval operations
✅ test_dataset_service_create_dataset.py - Dataset creation operations

Checklist

Contribution by Gittensor, learn more at https://gittensor.io/

Add comprehensive test coverage for DocumentService methods: - save_document_with_dataset_id (create_document flows): * New document creation from upload_file * Missing data source validation * File not found error handling - get_document: single document retrieval by dataset_id and document_id * Successful retrieval * Document not found * Without document_id (returns None) - get_document_by_id: single document retrieval by id * Successful retrieval * Document not found - get_document_by_ids: bulk document retrieval * Successful bulk retrieval * Empty list handling - get_document_by_dataset_id: documents by dataset * Successful retrieval * Empty results - get_working_documents_by_dataset_id: completed, enabled, not archived documents * Successful retrieval * Empty results - get_error_documents_by_dataset_id: error/paused documents * Successful retrieval (error and paused statuses) * Empty results - get_batch_documents: documents by batch * Successful retrieval * Empty results - update_document_with_dataset_id: document updates * Successful update * Document not found * Document not available - delete_document: single document deletion * With upload_file data source * Without file_id * With None data_source_info - delete_documents: bulk document deletion * Successful deletion * Empty list handling * None list handling * Without doc_form - get_documents_position: position calculation * With existing documents * No documents (returns 1) Follows TDD principles with Arrange-Act-Assert structure and comprehensive coverage of all scenarios including edge cases.

gemini-code-assist · 2025-11-22T21:00:12Z

Summary of Changes

Hello @SmartDever02, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the test coverage for the DocumentService by introducing a comprehensive suite of unit tests. These tests validate the core functionalities of document creation, various retrieval methods, updates, and deletions, ensuring robustness and reliability across different scenarios, including edge cases and error handling.

Highlights

Document Creation Tests: Added 3 test cases for save_document_with_dataset_id, covering new document creation from uploaded files, validation for missing data sources, and handling file not found errors.
Document Retrieval Tests: Introduced 15 test cases across various retrieval methods (get_document, get_document_by_id, get_document_by_ids, get_document_by_dataset_id, get_working_documents_by_dataset_id, get_error_documents_by_dataset_id, get_batch_documents), ensuring successful retrieval by ID, dataset ID, bulk retrieval, and filtering by status, as well as handling cases where documents are not found or lists are empty.
Document Update Tests: Implemented 3 test cases for update_document_with_dataset_id, including successful name and status changes, and error handling for non-existent or unavailable documents.
Document Deletion Tests: Added 7 test cases for document deletion (delete_document, delete_documents), covering single document deletion with and without file IDs, handling missing data source info, and bulk deletion scenarios including empty/None lists and cases without doc_form.
Document Position Calculation Tests: Included 2 test cases for get_documents_position, covering scenarios with existing documents and when no documents are present.
Comprehensive Test Coverage: The new tests follow TDD principles, use a factory pattern for consistent mock data, extensively mock dependencies (DB, Redis, Celery tasks), and cover both success and error paths, adding over 40 test cases in a new file totaling ~1032 lines of test code.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a comprehensive suite of unit tests for the DocumentService, covering creation, retrieval, update, and deletion methods. The tests are well-structured and cover a wide range of scenarios, including success paths and error conditions, which significantly improves the robustness of the service.

My review includes a few suggestions to further enhance the test quality:

Using create_autospec for more robust mocks in the test data factory.
Improving exception testing to be more specific.
Reducing code duplication for fixtures.
Removing unused imports.

Overall, this is an excellent contribution that greatly enhances the test coverage of the application.

api/tests/unit_tests/services/test_document_service.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Fix failing tests in test_document_service.py: 1. delete_documents tests: - Fix mock_db_session.scalars to be a method call, not a property - Properly mock db.session.scalars() as a method that returns mock_scalars 2. update_document_not_found test: - Use pytest.raises(NotFound) directly instead of mocking NotFound - Import NotFound from werkzeug.exceptions 3. save_document_with_dataset_id tests: - Add missing mocks for ModelManager, DatasetCollectionBindingService - Add mocks for time.strftime and secrets.randbelow (used for batch generation) - Fix FileNotExistsError import and usage - Properly mock DocumentIndexingTaskProxy These fixes ensure all mocks match the actual implementation behavior.

Fix remaining issues in document service tests: 1. delete_documents tests: - Change mock_db_session.scalars assignment to use .return_value - This matches the pattern used in other tests 2. save_document_with_dataset_id_new_upload_file_success: - Add .delay method to mock_task_instance for DocumentIndexingTaskProxy - Add document.id assignment to mock document - Add assertion for mock_db_session.flush() call - Add assertion for mock_task_instance.delay() call These fixes ensure all mocks properly match the actual implementation behavior and all method calls are properly verified.

…tests Fix remaining test failures by ensuring db.session.scalars is properly mocked as a callable Mock in all tests that use it: 1. delete_documents tests: - Explicitly set mock_db_session.scalars = Mock(return_value=...) - This ensures scalars() is callable and returns the expected result 2. save_document_with_dataset_id_new_upload_file_success: - Add mock_db_session.scalars setup for duplicate check query - This query is used even when duplicate=False to check existing docs The issue was that mock_db_session.scalars might not have been properly initialized as a Mock, causing AttributeError when the code tries to call scalars(). By explicitly setting it, we ensure it's always a callable Mock.

Fix remaining test failures: 1. delete_documents tests: - Change mock_scalars_result.all.return_value to mock_scalars_result.all = Mock(return_value=...) - This ensures all() is a callable method, not just a property 2. save_document_with_dataset_id_file_not_found: - Add missing naive_utc_now mock - This is used in the code path before the FileNotExistsError is raised The key issue was that .all() needs to be a callable Mock method, not just a property with a return_value. This matches how SQLAlchemy's scalars().all() actually works.

Update test_save_document_with_dataset_id_new_upload_file_success to use the same pattern as delete_documents tests - making all() a callable Mock method instead of a property with return_value. This ensures consistency across all tests that mock db.session.scalars().all().

asukaminato0721 · 2025-11-23T03:16:23Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a comprehensive set of unit tests for the DocumentService, which significantly improves the test coverage and reliability of document-related operations. The tests are well-structured and cover a wide range of scenarios, including success paths and error conditions. I've provided a few suggestions to enhance the maintainability and assertions in the tests. Overall, this is a great contribution to the project's quality.

gemini-code-assist · 2025-11-23T03:18:33Z

api/tests/unit_tests/services/test_document_service.py

+            # Assert
+            assert isinstance(result, Sequence)
+            assert len(result) == 3
+            assert all(doc in result for doc in documents)


Using set() for comparison is more robust for unordered sequences and provides a more informative diff if the assertion fails. This also improves performance from O(n*m) to O(n+m). This applies to other similar assertions in this file as well.

Suggested change

assert all(doc in result for doc in documents)

assert set(result) == set(documents)

gemini-code-assist · 2025-11-23T03:18:33Z

api/tests/unit_tests/services/test_document_service.py

+        mock_scalars_result = Mock()
+        mock_scalars_result.all = Mock(return_value=documents)
+        mock_select = Mock()
+        mock_select.where.return_value = mock_select
+        # Mock scalars to return the mock_scalars_result
+        mock_db_session.scalars = Mock(return_value=mock_scalars_result)


The mock setup for db.session.scalars is a bit verbose and inconsistent with other tests in this file (e.g., test_get_document_by_ids_success). It can be simplified for better readability and consistency by chaining return_value attributes.

Suggested change

mock_scalars_result = Mock()

mock_scalars_result.all = Mock(return_value=documents)

mock_select = Mock()

mock_select.where.return_value = mock_select

# Mock scalars to return the mock_scalars_result

mock_db_session.scalars = Mock(return_value=mock_scalars_result)

mock_select = Mock()

mock_select.where.return_value = mock_select

mock_db_session.scalars.return_value.all.return_value = documents

gemini-code-assist · 2025-11-23T03:18:33Z

api/tests/unit_tests/services/test_document_service.py

+        with patch("services.dataset_service.DocumentIndexingTaskProxy") as mock_task:
+            yield mock_task
+
+    def test_save_document_with_dataset_id_new_upload_file_success(


The save_document_with_dataset_id method includes logic to handle document duplication when knowledge_config.duplicate is True. This test suite seems to be missing a test case for this scenario. Adding a test to cover the duplication flow would improve test coverage and ensure this logic is robust.

gemini-code-assist · 2025-11-23T03:18:33Z

api/tests/unit_tests/services/test_document_service.py

+        with (
+            patch("services.dataset_service.DatasetService.check_doc_form") as mock_check_doc_form,
+            patch("services.dataset_service.FeatureService.get_features") as mock_features,
+            patch("services.dataset_service.DocumentService.get_documents_position") as mock_position,
+            patch("services.dataset_service.DocumentService.build_document") as mock_build,
+            patch("services.dataset_service.redis_client.lock") as mock_lock,
+            patch("services.dataset_service.select") as mock_select_func,
+            patch("services.dataset_service.DatasetProcessRule") as mock_process_rule,
+            patch("services.dataset_service.ModelManager") as mock_model_manager,
+            patch("services.dataset_service.DatasetCollectionBindingService") as mock_binding_service,
+            patch("services.dataset_service.DocumentIndexingTaskProxy") as mock_indexing_proxy,
+            patch("services.dataset_service.time.strftime") as mock_strftime,
+            patch("services.dataset_service.secrets.randbelow") as mock_randbelow,
+        ):


This test has a large number of mocks configured within a single with block. To improve readability and maintainability, consider refactoring these patches into separate pytest fixtures. This will make the test body cleaner and the setup reusable. For example:

@pytest.fixture def mock_check_doc_form(mocker): return mocker.patch("services.dataset_service.DatasetService.check_doc_form") @pytest.fixture def mock_features(mocker): mock = mocker.patch("services.dataset_service.FeatureService.get_features") mock.return_value.billing.enabled = False return mock # in test method: def test_save_document_with_dataset_id_new_upload_file_success(self, mock_check_doc_form, mock_features, ...): # ... test logic ...

dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. 💪 enhancement New feature or request labels Nov 22, 2025

gemini-code-assist bot reviewed Nov 22, 2025

View reviewed changes

SmartDever02 and others added 10 commits November 22, 2025 13:20

Update api/tests/unit_tests/services/test_document_service.py

73c8f47

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update api/tests/unit_tests/services/test_document_service.py

9e9bdbe

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

[autofix.ci] apply automated fixes

eeb0db4

[autofix.ci] apply automated fixes

e352f56

Merge branch 'main' into feat/test-document-service

713d2e0

gemini-code-assist bot reviewed Nov 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add comprehensive unit tests for dataset service creation and retrieval methods #28547

feat: add comprehensive unit tests for dataset service creation and retrieval methods #28547

SmartDever02 commented Nov 22, 2025

Uh oh!

gemini-code-assist bot commented Nov 22, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

asukaminato0721 commented Nov 23, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Nov 23, 2025

Uh oh!

gemini-code-assist bot Nov 23, 2025

Uh oh!

gemini-code-assist bot Nov 23, 2025

Uh oh!

gemini-code-assist bot Nov 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	assert all(doc in result for doc in documents)
	assert set(result) == set(documents)

feat: add comprehensive unit tests for dataset service creation and retrieval methods #28547

Are you sure you want to change the base?

feat: add comprehensive unit tests for dataset service creation and retrieval methods #28547

Conversation

SmartDever02 commented Nov 22, 2025

Description

Test Coverage Added

save_document_with_dataset_id Method Tests (3 test cases)

get_document Method Tests (3 test cases)

get_document_by_id Method Tests (2 test cases)

get_document_by_ids Method Tests (2 test cases)

get_document_by_dataset_id Method Tests (2 test cases)

get_working_documents_by_dataset_id Method Tests (2 test cases)

get_error_documents_by_dataset_id Method Tests (2 test cases)

get_batch_documents Method Tests (2 test cases)

update_document_with_dataset_id Method Tests (3 test cases)

delete_document Method Tests (3 test cases)

delete_documents Method Tests (4 test cases)

get_documents_position Method Tests (2 test cases)

Testing Approach

Test Statistics

Related Work

Checklist

Uh oh!

gemini-code-assist bot commented Nov 22, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

asukaminato0721 commented Nov 23, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Nov 23, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 23, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 23, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 23, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`save_document_with_dataset_id` Method Tests (3 test cases)

`get_document` Method Tests (3 test cases)

`get_document_by_id` Method Tests (2 test cases)

`get_document_by_ids` Method Tests (2 test cases)

`get_document_by_dataset_id` Method Tests (2 test cases)

`get_working_documents_by_dataset_id` Method Tests (2 test cases)

`get_error_documents_by_dataset_id` Method Tests (2 test cases)

`get_batch_documents` Method Tests (2 test cases)

`update_document_with_dataset_id` Method Tests (3 test cases)

`delete_document` Method Tests (3 test cases)

`delete_documents` Method Tests (4 test cases)

`get_documents_position` Method Tests (2 test cases)