-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ingestion status callback update #142
base: main
Are you sure you want to change the base?
Conversation
…essfully in Artemis
…essfully in Artemis
…essfully in Artemis
…ckUpdate # Conflicts: # app/pipeline/chat/course_chat_pipeline.py # app/pipeline/chat/exercise_chat_pipeline.py
WalkthroughThe recent changes focus on enhancing the Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant LectureIngestionPipeline
participant DeletionService
Client->>LectureIngestionPipeline: Initialize with optional DTO
LectureIngestionPipeline->>DeletionService: Request to delete old lectures
DeletionService->>LectureIngestionPipeline: Acknowledge deletion request
LectureIngestionPipeline->>Client: Return deletion status
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (2)
app/domain/ingestion/ingestion_pipeline_execution_dto.py (1)
11-11
: Modification fromlecture_units
tolecture_unit
.The change from
lecture_units
(a list) tolecture_unit
(a single object) has not been fully propagated throughout the codebase. Please update the following instances to ensure consistency:
app/pipeline/lecture_ingestion_pipeline.py
- Line:
self, lecture_units: list[LectureUnitDTO], artemis_base_url: str
- Line:
for lecture_unit in lecture_units:
app/web/routers/webhooks.py
- Line:
pipeline.delete_old_lectures(dto.lecture_units)
app/domain/ingestion/deletionPipelineExecutionDto.py
- Line:
lecture_units: List[LectureUnitDTO] = Field(..., alias="pyrisLectureUnits")
app/domain/data/metrics/competency_student_metrics_dto.py
- Line:
lecture_units: Dict[int, Set[int]] = Field({}, alias="lectureUnits")
Please ensure that all logic handling these DTOs is updated accordingly to handle a single lecture unit instead of a list.
Analysis chain
Modification from
lecture_units
tolecture_unit
.The change from
lecture_units
(a list) tolecture_unit
(a single object) indicates a significant shift in how the data is processed. Ensure that all logic handling this DTO is updated accordingly to handle a single lecture unit instead of a list.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify all instances where `lecture_units` was used and ensure they are updated to handle `lecture_unit`. # Test: Search for the usage of `lecture_units`. Expect: No instances of `lecture_units` should remain. rg --type python -A 5 $'lecture_units'Length of output: 68
Script:
#!/bin/bash # Description: Verify all instances where `lecture_units` was used and ensure they are updated to handle `lecture_unit`. # Test: Search for the usage of `lecture_units`. Expect: No instances of `lecture_units` should remain. rg --glob '*.py' -A 5 'lecture_units'Length of output: 2461
app/domain/data/lecture_unit_dto.py (1)
5-5
:base_url
field still present in the codebase.The
base_url
field is still used in multiple files, indicating that its removal from the DTO structure might not be complete or other parts of the codebase still depend on it. Here are the specific locations wherebase_url
was found:
app/web/status/LecturesDeletionStatusCallback.py
app/web/status/IngestionStatusCallback.py
app/web/status/status_update.py
app/web/routers/webhooks.py
app/web/routers/pipelines.py
app/vector_database/lecture_schema.py
app/retrieval/lecture_retrieval.py
app/pipeline/lecture_ingestion_pipeline.py
app/pipeline/chat/lecture_chat_pipeline.py
app/pipeline/chat/exercise_chat_pipeline.py
app/pipeline/chat/course_chat_pipeline.py
app/llm/external/ollama.py
app/domain/pipeline_execution_settings_dto.py
Please ensure that any logic depending on the
base_url
field is updated accordingly.Analysis chain
Removal of
to_update
andbase_url
fields.The removal of these fields simplifies the DTO structure. Ensure that any logic depending on these fields is updated accordingly.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify all instances where `to_update` and `base_url` were used and ensure they are removed or updated. # Test: Search for the usage of `to_update`. Expect: No instances of `to_update` should remain. rg --type python -A 5 $'to_update' # Test: Search for the usage of `base_url`. Expect: No instances of `base_url` should remain. rg --type python -A 5 $'base_url'Length of output: 128
Script:
#!/bin/bash # Description: Verify all instances where `to_update` and `base_url` were used and ensure they are removed or updated. # Test: Search for the usage of `to_update`. Expect: No instances of `to_update` should remain. rg 'to_update' --glob '*.py' -A 5 # Test: Search for the usage of `base_url`. Expect: No instances of `base_url` should remain. rg 'base_url' --glob '*.py' -A 5Length of output: 15912
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (4)
app/web/routers/webhooks.py (3)
83-93
: Testing required for the/lectures/delete
endpointThe
/lectures/delete
endpoint is not referenced in any test files, indicating a lack of automated test coverage. Ensure that this endpoint is properly tested to verify its functionality and security.
- Add unit tests for the
lecture_deletion_webhook
function.- Ensure integration tests cover the
/lectures/delete
endpoint.Analysis chain
LGTM! Ensure proper usage and testing.
The endpoint
/lectures/delete
is well-defined and secured withTokenValidator
.However, ensure that this endpoint is properly tested and used correctly in the codebase.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of the `/lectures/delete` endpoint in the codebase. # Test: Search for the endpoint usage. Expect: Proper usage and testing. rg --type python -A 5 $'/lectures/delete'Length of output: 71
Script:
#!/bin/bash # Description: Verify the usage of the `/lectures/delete` endpoint in the codebase. # Test: Search for the endpoint usage. Expect: Proper usage and testing. rg --type py -A 5 '/lectures/delete'Length of output: 391
Script:
#!/bin/bash # Description: Verify the testing of the `/lectures/delete` endpoint in the codebase. # Test: Search for test cases related to the `/lectures/delete` endpoint. Expect: Proper testing. rg --type py -A 5 'test_lectures_delete'Length of output: 40
Script:
#!/bin/bash # Description: Verify the testing of the `/lectures/delete` endpoint in the codebase. # Test: Search for references to the `/lectures/delete` endpoint in test files. Expect: Proper testing. rg --type py '/lectures/delete' --glob '*test*'Length of output: 47
75-81
: Inconsistent endpoint renaming detected.The endpoint
/lectures/fullIngestion
still exists in the codebase and should be renamed tolecture_ingestion_webhook
for consistency.
- File:
app/web/routers/webhooks.py
- Line: 75
Please update the old endpoint name to the new one to ensure uniformity across the codebase.
Analysis chain
Verify the endpoint renaming consistency.
The endpoint
/lectures/fullIngestion
has been renamed tolecture_ingestion_webhook
.
Verify the endpoint renaming consistency.
The endpoint
/lectures/fullIngestion
has been renamed tolecture_ingestion_webhook
.
Verify the endpoint renaming consistency.
The endpoint
/lectures/fullIngestion
has been renamed tolecture_ingestion_webhook
.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the consistency of the endpoint renaming in the codebase. # Test: Search for the old and new endpoint names. Expect: No occurrences of the old name. rg --type python -A 5 $'lecture_webhook' rg --type python -A 5 $'lecture_ingestion_webhook'Length of output: 151
Script:
#!/bin/bash # Description: Verify the consistency of the endpoint renaming in the codebase. # Test: Search for the old and new endpoint names. Expect: No occurrences of the old name. rg --glob '*.py' -A 5 'lecture_webhook' rg --glob '*.py' -A 5 'lecture_ingestion_webhook'Length of output: 494
Script:
#!/bin/bash # Description: Verify the consistency of the endpoint renaming in the codebase. # Test: Search for the original endpoint name. Expect: No occurrences of the old name. rg --glob '*.py' -A 5 '/lectures/fullIngestion'Length of output: 411
51-68
: No test cases found forrun_lecture_deletion_pipeline_worker
.It is crucial to add proper test cases for the
run_lecture_deletion_pipeline_worker
function to ensure its correct functionality, especially given its use of threading and exception handling.
- Ensure that this function is thoroughly tested.
- Add unit tests to cover various scenarios, including successful execution and exception handling.
Analysis chain
LGTM! Ensure proper usage and testing.
The function
run_lecture_deletion_pipeline_worker
is well-structured and handles exceptions gracefully.However, ensure that this function is properly tested and used correctly in the codebase.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `run_lecture_deletion_pipeline_worker` in the codebase. # Test: Search for the function usage. Expect: Proper usage and testing. rg --type python -A 5 $'run_lecture_deletion_pipeline_worker'Length of output: 91
Script:
#!/bin/bash # Description: Verify the usage of `run_lecture_deletion_pipeline_worker` in the codebase. # Test: Search for the function usage. Expect: Proper usage and testing. rg --glob '*.py' -A 5 'run_lecture_deletion_pipeline_worker'Length of output: 593
Script:
#!/bin/bash # Description: Verify the testing of `run_lecture_deletion_pipeline_worker` in the codebase. # Test: Search for test cases related to the function. Expect: Proper testing. rg --glob '*test*.py' 'run_lecture_deletion_pipeline_worker'Length of output: 209
Script:
#!/bin/bash # Description: Verify the testing of `run_lecture_deletion_pipeline_worker` in the codebase. # Test: Search for test cases related to the function. Expect: Proper testing. rg 'run_lecture_deletion_pipeline_worker' --glob '*test*.py' --glob '*tests*.py' --glob '*_test.py' --glob '*_tests.py' --debugLength of output: 32901
Script:
#!/bin/bash # Description: Verify the testing of `run_lecture_deletion_pipeline_worker` in the codebase. # Test: Search for test cases related to the function without applying ignore rules. Expect: Proper testing. rg 'run_lecture_deletion_pipeline_worker' --glob '*test*.py' --glob '*tests*.py' --glob '*_test.py' --glob '*_tests.py' --no-ignoreLength of output: 280
app/pipeline/lecture_ingestion_pipeline.py (1)
297-317
: Potential Issue withdelete_old_lectures
Method CallThe method
delete_old_lectures
is called within therun_lecture_deletion_pipeline_worker
function, but theartemis_base_url
parameter is not being passed. Ensure that theartemis_base_url
is correctly passed to the method to avoid runtime errors.
- File:
app/web/routers/webhooks.py
- Function:
run_lecture_deletion_pipeline_worker
- Line:
pipeline.delete_old_lectures(dto.lecture_units)
Consider updating the method call to include the
artemis_base_url
parameter.Analysis chain
LGTM! Ensure proper usage and testing.
The method
delete_old_lectures
now accepts parameters directly, enhancing reusability and decoupling from the class state.However, ensure that this method is properly tested and used correctly in the codebase.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of the `delete_old_lectures` method in the codebase. # Test: Search for the method usage. Expect: Proper usage and testing. rg --type python -A 5 $'delete_old_lectures'Length of output: 74
Script:
#!/bin/bash # Description: Verify the usage of the `delete_old_lectures` method in the codebase. # Test: Search for the method usage. Expect: Proper usage and testing. rg --type py -A 5 $'delete_old_lectures'Length of output: 841
Script:
#!/bin/bash # Description: Verify the context of the `delete_old_lectures` method call in `app/web/routers/webhooks.py`. # Test: Search for the method call and include additional lines for context. Expect: Proper parameters passed. rg --type py -A 10 -B 10 $'pipeline.delete_old_lectures(dto.lecture_units)' app/web/routers/webhooks.pyLength of output: 104
Script:
#!/bin/bash # Description: Verify the context of the `delete_old_lectures` method call in `app/web/routers/webhooks.py`. # Test: Search for the method call and include additional lines for context. Expect: Proper parameters passed. rg --type py -A 20 -B 20 'delete_old_lectures' app/web/routers/webhooks.pyLength of output: 1391
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (3)
app/web/status/LecturesDeletionStatusCallback.py (1)
1-2
: Remove unnecessary blank lines.The blank line after the import statement is unnecessary and can be removed to improve readability.
-
app/pipeline/lecture_ingestion_pipeline.py (2)
95-95
: Add checks for optionaldto
parameter.The
dto
parameter is used in several places without checking if it isNone
. Sincedto
is now optional, please add appropriate checks to handle cases wheredto
might beNone
to avoid potentialAttributeError
exceptions.
- Lines where
dto
is accessed without checks:
app/pipeline/lecture_ingestion_pipeline.py:98
app/pipeline/lecture_ingestion_pipeline.py:99
app/pipeline/lecture_ingestion_pipeline.py:100
app/pipeline/lecture_ingestion_pipeline.py:101
app/pipeline/lecture_ingestion_pipeline.py:106
app/pipeline/lecture_ingestion_pipeline.py:111
app/pipeline/lecture_ingestion_pipeline.py:112
app/pipeline/lecture_ingestion_pipeline.py:121
Analysis chain
Constructor update approved.
The constructor now accepts an optional
dto
parameter, enhancing flexibility.Ensure that the rest of the code correctly handles the optional
dto
.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of `dto` in the codebase to ensure it handles the optional parameter correctly. # Test: Search for the usage of `dto`. Expect: Proper handling of optional `dto`. rg --type python -A 5 $'self.dto'Length of output: 63
Script:
#!/bin/bash # Description: Verify the usage of `dto` in the codebase to ensure it handles the optional parameter correctly. # Test: Search for the usage of `dto`. Expect: Proper handling of optional `dto`. rg --type py -A 5 $'self.dto'Length of output: 2806
130-138
: Ensureself.dto
is notNone
before accessing its properties in the__call__
method.The
__call__
method inapp/pipeline/lecture_ingestion_pipeline.py
accessesself.dto
properties directly without checking ifself.dto
isNone
. This could lead to potential runtime errors ifself.dto
isNone
.
- Add a check to ensure
self.dto
is notNone
before accessing its properties in the__call__
method.Analysis chain
Simplified lecture chunking and interpretation.
The lecture chunking and interpretation process has been simplified by directly accessing properties from
self.dto.lecture_unit
. Ensure thatself.dto
is notNone
before accessing its properties.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that `self.dto` is not `None` before accessing its properties in the `__call__` method. # Test: Search for the usage of `self.dto` in the `__call__` method. Expect: Proper null checks. rg --type python -A 5 $'self.dto' | rg -B 5 'if self.dto is not None'Length of output: 99
Script:
#!/bin/bash # Description: Verify that `self.dto` is not `None` before accessing its properties in the `__call__` method. # Test: Search for the usage of `self.dto` in the `__call__` method. Expect: Proper null checks. rg -t py -A 5 $'self.dto' | rg -B 5 'if self.dto is not None'Length of output: 60
Script:
#!/bin/bash # Description: Verify that `self.dto` is not `None` before accessing its properties in the `__call__` method in `app/pipeline/lecture_ingestion_pipeline.py`. # Locate the `__call__` method in the specified file. ast-grep --lang python --pattern $'def __call__($_):\n $$$' app/pipeline/lecture_ingestion_pipeline.pyLength of output: 105
Script:
#!/bin/bash # Description: Verify the presence of the `__call__` method and check for `None` validation for `self.dto` in `app/pipeline/lecture_ingestion_pipeline.py`. # Locate the `__call__` method in the specified file. rg -A 10 'def __call__' app/pipeline/lecture_ingestion_pipeline.pyLength of output: 580
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (2)
app/pipeline/lecture_ingestion_pipeline.py (2)
95-97
: Ensure proper handling ofNone
value fordto
.The
dto
parameter is now optional, but there are multiple instances in the code whereself.dto
is accessed without checking forNone
. This can lead toNoneType
errors. Please ensure that all usages ofself.dto
within the class handle the possibility of it beingNone
.
app/pipeline/lecture_ingestion_pipeline.py
: Lines whereself.dto
is accessed without checks.Analysis chain
Ensure proper handling of
None
value fordto
.The
dto
parameter is now optional. Ensure that all usages ofself.dto
within the class handle the possibility of it beingNone
to avoidNoneType
errors.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that all usages of `self.dto` handle the possibility of it being `None`. # Test: Search for `self.dto` usage. Expect: Proper handling of `None` values. rg --type python 'self\.dto'Length of output: 61
Script:
#!/bin/bash # Description: Verify that all usages of `self.dto` handle the possibility of it being `None`. # Test: Search for `self.dto` usage. Expect: Proper handling of `None` values. rg --type-add 'py:*.py' --type py 'self\.dto'Length of output: 914
297-317
: Thedelete_old_lectures
method call is missing a required parameter.The method call to
delete_old_lectures
inapp/web/routers/webhooks.py
only passesdto.lecture_units
and does not provide the requiredartemis_base_url
parameter. This needs to be corrected to ensure the method functions as intended.
- File:
app/web/routers/webhooks.py
- Line: Around 5 lines after the method call
pipeline.delete_old_lectures(dto.lecture_units)
Please update the method call to include the
artemis_base_url
parameter.Analysis chain
LGTM! Verify the method usage in the codebase.
The changes to the
delete_old_lectures
method enhance its reusability and clarity.However, ensure that all calls to
delete_old_lectures
pass the correct parameters.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that all calls to `delete_old_lectures` pass the correct parameters. # Test: Search for the method usage. Expect: Correct parameters passed. rg --type python 'delete_old_lectures'Length of output: 69
Script:
#!/bin/bash # Description: Verify that all calls to `delete_old_lectures` pass the correct parameters. # Test: Search for the method usage. Expect: Correct parameters passed. rg --type py 'delete_old_lectures'Length of output: 190
Script:
#!/bin/bash # Description: Verify the exact context of the `delete_old_lectures` method call in `app/web/routers/webhooks.py`. # Test: Display the relevant lines around the method call. rg --type py -A 5 -B 5 'pipeline.delete_old_lectures' app/web/routers/webhooks.pyLength of output: 479
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Outside diff range, codebase verification and nitpick comments (5)
app/web/status/lecture_deletion_status_callback.py (1)
12-31
: Well-implemented class for deletion status updates.The
LecturesDeletionStatusCallback
class is well-implemented with dynamic stage management and URL construction. However, consider adding error handling for potential issues with thebase_url
format or network problems during the callback.Consider implementing error handling mechanisms to improve robustness, such as:
try: # existing code except Exception as e: logger.error(f"Failed to update status: {e}") raiseapp/web/status/ingestion_status_callback.py (1)
Line range hint
14-43
: Enhanced functionality for lecture-specific ingestion status updates.The
IngestionStatusCallback
class has been effectively updated to includelecture_unit_id
, enhancing its functionality. The docstring update is also appropriate. Consider adding error handling for potential issues with thelecture_unit_id
or network problems during the callback.Consider implementing error handling mechanisms to improve robustness, such as:
try: # existing code except Exception as e: logger.error(f"Failed to update status: {e}") raiseapp/web/routers/webhooks.py (2)
75-81
: Approve implementation; suggest documentation improvements.The implementation of
lecture_ingestion_webhook
is appropriate, using a separate thread for non-blocking operations. Consider improving the documentation to better reflect the specific purpose of this webhook, which is to trigger the lecture ingestion pipeline.
83-93
: Approve implementation; suggest documentation improvements.The implementation of
lecture_deletion_webhook
is consistent with other webhook functions, using a separate thread for non-blocking operations. Consider improving the documentation to better reflect the specific purpose of this webhook, which is to trigger the lecture deletion process.app/pipeline/lecture_ingestion_pipeline.py (1)
298-318
: Refactoring ofdelete_old_lectures
approved.The method's refactoring to accept parameters directly enhances its reusability and clarity. This change supports the PR's objectives effectively.
Consider adding more detailed logging for each deletion attempt to improve traceability and debugging.
logger.info(f"Attempting to delete lecture: {lecture_unit.lecture_unit_id}") if self.delete_lecture_unit( lecture_unit.course_id, lecture_unit.lecture_id, lecture_unit.lecture_unit_id, artemis_base_url, ): logger.info("Lecture deleted successfully") else: logger.error("Failed to delete lecture")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
…nto IngestionStatusCallbackUpdate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good. Left some comments.
…nto IngestionStatusCallbackUpdate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
app/web/routers/webhooks.py (1)
82-93
: LGTM: New lecture deletion webhook implemented correctly.The new
/lectures/delete
endpoint andlecture_deletion_webhook
function are well-implemented, following the same pattern as the ingestion webhook. The use of a separate thread and the HTTP 202 status code are appropriate for this asynchronous operation.Consider adding a brief comment in the function docstring to explain that this operation is asynchronous and the actual deletion happens in a separate thread. This can help API consumers understand the behavior. For example:
def lecture_deletion_webhook(dto: LecturesDeletionExecutionDto): """ Webhook endpoint to trigger the lecture deletion. Note: This operation is asynchronous. The deletion process is started in a separate thread, and the endpoint returns immediately with a 202 Accepted status. """ # ... rest of the function
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- app/web/routers/ingestion_status.py (1 hunks)
- app/web/routers/webhooks.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- app/web/routers/ingestion_status.py
🧰 Additional context used
🔇 Additional comments (4)
app/web/routers/webhooks.py (4)
12-16
: LGTM: New imports for lecture deletion functionality.The added imports for
LecturesDeletionStatusCallback
andLecturesDeletionExecutionDto
are consistent with the new lecture deletion functionality. They are correctly placed and follow the existing import structure.
35-35
: LGTM: Added lecture_unit_id to IngestionStatusCallback.The addition of
lecture_unit_id
to theIngestionStatusCallback
initialization enhances the context provided to the callback. This is a positive change for improved traceability during the ingestion process.
75-75
: LGTM: Improved webhook function naming.The change from
lecture_webhook
tolecture_ingestion_webhook
enhances clarity by explicitly stating the purpose of this webhook. This is a good improvement, especially considering the addition of a separate deletion webhook.
63-64
: Clarify the use of LectureIngestionPipeline for deletion.The use of
LectureIngestionPipeline
for deletion seems counterintuitive. Additionally, passingdto=None
to the pipeline initialization might cause issues if the pipeline methods rely ondto
.Could you please clarify:
- Why is
LectureIngestionPipeline
used for deletion instead of a dedicated deletion pipeline?- How does
LectureIngestionPipeline
handledto=None
?- Is there a reason for not passing the
LecturesDeletionExecutionDto
to the pipeline?To verify the
LectureIngestionPipeline
implementation, you can run:This will help us understand how the
delete_old_lectures
method is implemented and whether it can handledto=None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reapprove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
app/pipeline/lecture_ingestion_pipeline.py (2)
310-312
: Confirm necessity of thedelete_old_lectures
methodThe
delete_old_lectures
method has been refactored but is not invoked in the provided code. If it's no longer used, consider removing it to keep the codebase clean.
310-330
: Ensure consistent return values indelete_old_lectures
methodThe method returns
False
when an exception occurs but does not return any value upon successful completion. For consistency, consider returningTrue
when the method succeeds.Apply this diff to add a return statement:
self.callback.done("Old slides removed") + return True except Exception as e: logger.error(f"Error deleting lecture unit: {e}") self.callback.error("Error while removing old slides") return False
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- app/pipeline/lecture_ingestion_pipeline.py (5 hunks)
🔇 Additional comments (2)
app/pipeline/lecture_ingestion_pipeline.py (2)
6-7
: Use of Optional fordto
parameter is appropriateGood use of
Optional
from thetyping
module to indicate thatdto
can beNone
.
124-129
: PotentialNoneType
exception when accessingself.dto
This issue was previously noted and is still applicable. Ensure to handle cases where
self.dto
may beNone
.
dto: Optional[IngestionPipelineExecutionDto], | ||
callback: ingestion_status_callback, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle None
value for optional dto
parameter
Since dto
is now optional, ensure that all usages of self.dto
in the class handle the case where self.dto
is None
to prevent potential AttributeError
s.
from ..web.status import ingestion_status_callback | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify the type annotation for callback
parameter
The callback
parameter is annotated with ingestion_status_callback
, which seems to be a function rather than a type. If it's intended to be a type hint for a callable, consider using Callable
from the typing
module.
For example:
-from ..web.status import ingestion_status_callback
+from typing import Callable
...
def __init__(
self,
client: WeaviateClient,
dto: Optional[IngestionPipelineExecutionDto],
- callback: ingestion_status_callback,
+ callback: Callable,
):
Alternatively, if ingestion_status_callback
is a custom type, ensure it follows naming conventions (e.g., IngestionStatusCallback
).
Also applies to: 97-98
pdf_path = save_pdf(self.dto.lecture_unit.pdf_file_base64) | ||
chunks.extend( | ||
self.chunk_data( | ||
lecture_pdf=pdf_path, | ||
lecture_unit_dto=self.dto.lecture_unit, | ||
base_url=self.dto.settings.artemis_base_url, | ||
) | ||
cleanup_temporary_file(pdf_path) | ||
) | ||
cleanup_temporary_file(pdf_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure temporary files are cleaned up even on exceptions
If an exception occurs before cleanup_temporary_file(pdf_path)
is called, the temporary file may remain on the system. To guarantee cleanup regardless of success or failure, consider using a finally
block.
Apply this diff to modify the code:
pdf_path = save_pdf(self.dto.lecture_unit.pdf_file_base64)
- chunks.extend(
- self.chunk_data(
- lecture_pdf=pdf_path,
- lecture_unit_dto=self.dto.lecture_unit,
- base_url=self.dto.settings.artemis_base_url,
- )
- )
- cleanup_temporary_file(pdf_path)
+ try:
+ chunks.extend(
+ self.chunk_data(
+ lecture_pdf=pdf_path,
+ lecture_unit_dto=self.dto.lecture_unit,
+ base_url=self.dto.settings.artemis_base_url,
+ )
+ )
+ finally:
+ cleanup_temporary_file(pdf_path)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pdf_path = save_pdf(self.dto.lecture_unit.pdf_file_base64) | |
chunks.extend( | |
self.chunk_data( | |
lecture_pdf=pdf_path, | |
lecture_unit_dto=self.dto.lecture_unit, | |
base_url=self.dto.settings.artemis_base_url, | |
) | |
cleanup_temporary_file(pdf_path) | |
) | |
cleanup_temporary_file(pdf_path) | |
pdf_path = save_pdf(self.dto.lecture_unit.pdf_file_base64) | |
try: | |
chunks.extend( | |
self.chunk_data( | |
lecture_pdf=pdf_path, | |
lecture_unit_dto=self.dto.lecture_unit, | |
base_url=self.dto.settings.artemis_base_url, | |
) | |
) | |
finally: | |
cleanup_temporary_file(pdf_path) |
This PR modifies the ingestion Stucture to be able to update artemis successfully, when ingestion is finished.
Summary by CodeRabbit
New Features
Improvements
Bug Fixes