Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingestion status callback update #142

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

yassinsws
Copy link
Contributor

@yassinsws yassinsws commented Jul 22, 2024

This PR modifies the ingestion Stucture to be able to update artemis successfully, when ingestion is finished.

Summary by CodeRabbit

  • New Features

    • Introduced a new webhook endpoint for deleting lectures, enhancing the deletion process.
    • Added a new data transfer object for managing lecture deletions.
    • Implemented a new status callback class specifically for lecture deletion tracking.
    • Added an API endpoint to retrieve the ingestion state of lecture units.
  • Improvements

    • Simplified method signatures for deleting old lectures, improving flexibility and modularity.
    • Enhanced error handling for lecture deletion operations, providing clearer status updates.
  • Bug Fixes

    • Removed unnecessary fields from existing classes to ensure cleaner data handling, improving overall robustness.

Copy link
Contributor

coderabbitai bot commented Jul 22, 2024

Walkthrough

The recent changes focus on enhancing the LectureIngestionPipeline class within the application. Key updates include making the dto parameter optional in the constructor and modifying the callback parameter type. The delete_old_lectures method has been refactored to accept parameters directly, improving its signature and flexibility. A new method, delete_lecture_unit, has been introduced. Additionally, error handling has been improved across various methods, contributing to a more modular and maintainable code structure.

Changes

Files Change Summary
app/pipeline/lecture_ingestion_pipeline.py Updated constructor to make dto optional and change callback type. Refactored delete_old_lectures method to accept lecture_units and artemis_base_url as parameters. Added delete_lecture_unit method for direct lecture unit deletion. Enhanced error handling across methods.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant LectureIngestionPipeline
    participant DeletionService

    Client->>LectureIngestionPipeline: Initialize with optional DTO
    LectureIngestionPipeline->>DeletionService: Request to delete old lectures
    DeletionService->>LectureIngestionPipeline: Acknowledge deletion request
    LectureIngestionPipeline->>Client: Return deletion status
Loading

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (2)
app/domain/ingestion/ingestion_pipeline_execution_dto.py (1)

11-11: Modification from lecture_units to lecture_unit.

The change from lecture_units (a list) to lecture_unit (a single object) has not been fully propagated throughout the codebase. Please update the following instances to ensure consistency:

  • app/pipeline/lecture_ingestion_pipeline.py
    • Line: self, lecture_units: list[LectureUnitDTO], artemis_base_url: str
    • Line: for lecture_unit in lecture_units:
  • app/web/routers/webhooks.py
    • Line: pipeline.delete_old_lectures(dto.lecture_units)
  • app/domain/ingestion/deletionPipelineExecutionDto.py
    • Line: lecture_units: List[LectureUnitDTO] = Field(..., alias="pyrisLectureUnits")
  • app/domain/data/metrics/competency_student_metrics_dto.py
    • Line: lecture_units: Dict[int, Set[int]] = Field({}, alias="lectureUnits")

Please ensure that all logic handling these DTOs is updated accordingly to handle a single lecture unit instead of a list.

Analysis chain

Modification from lecture_units to lecture_unit.

The change from lecture_units (a list) to lecture_unit (a single object) indicates a significant shift in how the data is processed. Ensure that all logic handling this DTO is updated accordingly to handle a single lecture unit instead of a list.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all instances where `lecture_units` was used and ensure they are updated to handle `lecture_unit`.

# Test: Search for the usage of `lecture_units`. Expect: No instances of `lecture_units` should remain.
rg --type python -A 5 $'lecture_units'

Length of output: 68


Script:

#!/bin/bash
# Description: Verify all instances where `lecture_units` was used and ensure they are updated to handle `lecture_unit`.

# Test: Search for the usage of `lecture_units`. Expect: No instances of `lecture_units` should remain.
rg --glob '*.py' -A 5 'lecture_units'

Length of output: 2461

app/domain/data/lecture_unit_dto.py (1)

5-5: base_url field still present in the codebase.

The base_url field is still used in multiple files, indicating that its removal from the DTO structure might not be complete or other parts of the codebase still depend on it. Here are the specific locations where base_url was found:

  • app/web/status/LecturesDeletionStatusCallback.py
  • app/web/status/IngestionStatusCallback.py
  • app/web/status/status_update.py
  • app/web/routers/webhooks.py
  • app/web/routers/pipelines.py
  • app/vector_database/lecture_schema.py
  • app/retrieval/lecture_retrieval.py
  • app/pipeline/lecture_ingestion_pipeline.py
  • app/pipeline/chat/lecture_chat_pipeline.py
  • app/pipeline/chat/exercise_chat_pipeline.py
  • app/pipeline/chat/course_chat_pipeline.py
  • app/llm/external/ollama.py
  • app/domain/pipeline_execution_settings_dto.py

Please ensure that any logic depending on the base_url field is updated accordingly.

Analysis chain

Removal of to_update and base_url fields.

The removal of these fields simplifies the DTO structure. Ensure that any logic depending on these fields is updated accordingly.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all instances where `to_update` and `base_url` were used and ensure they are removed or updated.

# Test: Search for the usage of `to_update`. Expect: No instances of `to_update` should remain.
rg --type python -A 5 $'to_update'

# Test: Search for the usage of `base_url`. Expect: No instances of `base_url` should remain.
rg --type python -A 5 $'base_url'

Length of output: 128


Script:

#!/bin/bash
# Description: Verify all instances where `to_update` and `base_url` were used and ensure they are removed or updated.

# Test: Search for the usage of `to_update`. Expect: No instances of `to_update` should remain.
rg 'to_update' --glob '*.py' -A 5

# Test: Search for the usage of `base_url`. Expect: No instances of `base_url` should remain.
rg 'base_url' --glob '*.py' -A 5

Length of output: 15912

coderabbitai[bot]
coderabbitai bot previously approved these changes Jul 22, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (4)
app/web/routers/webhooks.py (3)

83-93: Testing required for the /lectures/delete endpoint

The /lectures/delete endpoint is not referenced in any test files, indicating a lack of automated test coverage. Ensure that this endpoint is properly tested to verify its functionality and security.

  • Add unit tests for the lecture_deletion_webhook function.
  • Ensure integration tests cover the /lectures/delete endpoint.
Analysis chain

LGTM! Ensure proper usage and testing.

The endpoint /lectures/delete is well-defined and secured with TokenValidator.

However, ensure that this endpoint is properly tested and used correctly in the codebase.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of the `/lectures/delete` endpoint in the codebase.

# Test: Search for the endpoint usage. Expect: Proper usage and testing.
rg --type python -A 5 $'/lectures/delete'

Length of output: 71


Script:

#!/bin/bash
# Description: Verify the usage of the `/lectures/delete` endpoint in the codebase.

# Test: Search for the endpoint usage. Expect: Proper usage and testing.
rg --type py -A 5 '/lectures/delete'

Length of output: 391


Script:

#!/bin/bash
# Description: Verify the testing of the `/lectures/delete` endpoint in the codebase.

# Test: Search for test cases related to the `/lectures/delete` endpoint. Expect: Proper testing.
rg --type py -A 5 'test_lectures_delete'

Length of output: 40


Script:

#!/bin/bash
# Description: Verify the testing of the `/lectures/delete` endpoint in the codebase.

# Test: Search for references to the `/lectures/delete` endpoint in test files. Expect: Proper testing.
rg --type py '/lectures/delete' --glob '*test*'

Length of output: 47


75-81: Inconsistent endpoint renaming detected.

The endpoint /lectures/fullIngestion still exists in the codebase and should be renamed to lecture_ingestion_webhook for consistency.

  • File: app/web/routers/webhooks.py
    • Line: 75

Please update the old endpoint name to the new one to ensure uniformity across the codebase.

Analysis chain

Verify the endpoint renaming consistency.

The endpoint /lectures/fullIngestion has been renamed to lecture_ingestion_webhook.


Verify the endpoint renaming consistency.

The endpoint /lectures/fullIngestion has been renamed to lecture_ingestion_webhook.


Verify the endpoint renaming consistency.

The endpoint /lectures/fullIngestion has been renamed to lecture_ingestion_webhook.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the consistency of the endpoint renaming in the codebase.

# Test: Search for the old and new endpoint names. Expect: No occurrences of the old name.
rg --type python -A 5 $'lecture_webhook'
rg --type python -A 5 $'lecture_ingestion_webhook'

Length of output: 151


Script:

#!/bin/bash
# Description: Verify the consistency of the endpoint renaming in the codebase.

# Test: Search for the old and new endpoint names. Expect: No occurrences of the old name.
rg --glob '*.py' -A 5 'lecture_webhook'
rg --glob '*.py' -A 5 'lecture_ingestion_webhook'

Length of output: 494


Script:

#!/bin/bash
# Description: Verify the consistency of the endpoint renaming in the codebase.

# Test: Search for the original endpoint name. Expect: No occurrences of the old name.
rg --glob '*.py' -A 5 '/lectures/fullIngestion'

Length of output: 411


51-68: No test cases found for run_lecture_deletion_pipeline_worker.

It is crucial to add proper test cases for the run_lecture_deletion_pipeline_worker function to ensure its correct functionality, especially given its use of threading and exception handling.

  • Ensure that this function is thoroughly tested.
  • Add unit tests to cover various scenarios, including successful execution and exception handling.
Analysis chain

LGTM! Ensure proper usage and testing.

The function run_lecture_deletion_pipeline_worker is well-structured and handles exceptions gracefully.

However, ensure that this function is properly tested and used correctly in the codebase.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of `run_lecture_deletion_pipeline_worker` in the codebase.

# Test: Search for the function usage. Expect: Proper usage and testing.
rg --type python -A 5 $'run_lecture_deletion_pipeline_worker'

Length of output: 91


Script:

#!/bin/bash
# Description: Verify the usage of `run_lecture_deletion_pipeline_worker` in the codebase.

# Test: Search for the function usage. Expect: Proper usage and testing.
rg --glob '*.py' -A 5 'run_lecture_deletion_pipeline_worker'

Length of output: 593


Script:

#!/bin/bash
# Description: Verify the testing of `run_lecture_deletion_pipeline_worker` in the codebase.

# Test: Search for test cases related to the function. Expect: Proper testing.
rg --glob '*test*.py' 'run_lecture_deletion_pipeline_worker'

Length of output: 209


Script:

#!/bin/bash
# Description: Verify the testing of `run_lecture_deletion_pipeline_worker` in the codebase.

# Test: Search for test cases related to the function. Expect: Proper testing.
rg 'run_lecture_deletion_pipeline_worker' --glob '*test*.py' --glob '*tests*.py' --glob '*_test.py' --glob '*_tests.py' --debug

Length of output: 32901


Script:

#!/bin/bash
# Description: Verify the testing of `run_lecture_deletion_pipeline_worker` in the codebase.

# Test: Search for test cases related to the function without applying ignore rules. Expect: Proper testing.
rg 'run_lecture_deletion_pipeline_worker' --glob '*test*.py' --glob '*tests*.py' --glob '*_test.py' --glob '*_tests.py' --no-ignore

Length of output: 280

app/pipeline/lecture_ingestion_pipeline.py (1)

297-317: Potential Issue with delete_old_lectures Method Call

The method delete_old_lectures is called within the run_lecture_deletion_pipeline_worker function, but the artemis_base_url parameter is not being passed. Ensure that the artemis_base_url is correctly passed to the method to avoid runtime errors.

  • File: app/web/routers/webhooks.py
  • Function: run_lecture_deletion_pipeline_worker
  • Line: pipeline.delete_old_lectures(dto.lecture_units)

Consider updating the method call to include the artemis_base_url parameter.

Analysis chain

LGTM! Ensure proper usage and testing.

The method delete_old_lectures now accepts parameters directly, enhancing reusability and decoupling from the class state.

However, ensure that this method is properly tested and used correctly in the codebase.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of the `delete_old_lectures` method in the codebase.

# Test: Search for the method usage. Expect: Proper usage and testing.
rg --type python -A 5 $'delete_old_lectures'

Length of output: 74


Script:

#!/bin/bash
# Description: Verify the usage of the `delete_old_lectures` method in the codebase.

# Test: Search for the method usage. Expect: Proper usage and testing.
rg --type py -A 5 $'delete_old_lectures'

Length of output: 841


Script:

#!/bin/bash
# Description: Verify the context of the `delete_old_lectures` method call in `app/web/routers/webhooks.py`.

# Test: Search for the method call and include additional lines for context. Expect: Proper parameters passed.
rg --type py -A 10 -B 10 $'pipeline.delete_old_lectures(dto.lecture_units)' app/web/routers/webhooks.py

Length of output: 104


Script:

#!/bin/bash
# Description: Verify the context of the `delete_old_lectures` method call in `app/web/routers/webhooks.py`.

# Test: Search for the method call and include additional lines for context. Expect: Proper parameters passed.
rg --type py -A 20 -B 20 'delete_old_lectures' app/web/routers/webhooks.py

Length of output: 1391

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (3)
app/web/status/LecturesDeletionStatusCallback.py (1)

1-2: Remove unnecessary blank lines.

The blank line after the import statement is unnecessary and can be removed to improve readability.

- 
app/pipeline/lecture_ingestion_pipeline.py (2)

95-95: Add checks for optional dto parameter.

The dto parameter is used in several places without checking if it is None. Since dto is now optional, please add appropriate checks to handle cases where dto might be None to avoid potential AttributeError exceptions.

  • Lines where dto is accessed without checks:
    • app/pipeline/lecture_ingestion_pipeline.py:98
    • app/pipeline/lecture_ingestion_pipeline.py:99
    • app/pipeline/lecture_ingestion_pipeline.py:100
    • app/pipeline/lecture_ingestion_pipeline.py:101
    • app/pipeline/lecture_ingestion_pipeline.py:106
    • app/pipeline/lecture_ingestion_pipeline.py:111
    • app/pipeline/lecture_ingestion_pipeline.py:112
    • app/pipeline/lecture_ingestion_pipeline.py:121
Analysis chain

Constructor update approved.

The constructor now accepts an optional dto parameter, enhancing flexibility.

Ensure that the rest of the code correctly handles the optional dto.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of `dto` in the codebase to ensure it handles the optional parameter correctly.

# Test: Search for the usage of `dto`. Expect: Proper handling of optional `dto`.
rg --type python -A 5 $'self.dto'

Length of output: 63


Script:

#!/bin/bash
# Description: Verify the usage of `dto` in the codebase to ensure it handles the optional parameter correctly.

# Test: Search for the usage of `dto`. Expect: Proper handling of optional `dto`.
rg --type py -A 5 $'self.dto'

Length of output: 2806


130-138: Ensure self.dto is not None before accessing its properties in the __call__ method.

The __call__ method in app/pipeline/lecture_ingestion_pipeline.py accesses self.dto properties directly without checking if self.dto is None. This could lead to potential runtime errors if self.dto is None.

  • Add a check to ensure self.dto is not None before accessing its properties in the __call__ method.
Analysis chain

Simplified lecture chunking and interpretation.

The lecture chunking and interpretation process has been simplified by directly accessing properties from self.dto.lecture_unit. Ensure that self.dto is not None before accessing its properties.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that `self.dto` is not `None` before accessing its properties in the `__call__` method.

# Test: Search for the usage of `self.dto` in the `__call__` method. Expect: Proper null checks.
rg --type python -A 5 $'self.dto' | rg -B 5 'if self.dto is not None'

Length of output: 99


Script:

#!/bin/bash
# Description: Verify that `self.dto` is not `None` before accessing its properties in the `__call__` method.

# Test: Search for the usage of `self.dto` in the `__call__` method. Expect: Proper null checks.
rg -t py -A 5 $'self.dto' | rg -B 5 'if self.dto is not None'

Length of output: 60


Script:

#!/bin/bash
# Description: Verify that `self.dto` is not `None` before accessing its properties in the `__call__` method in `app/pipeline/lecture_ingestion_pipeline.py`.

# Locate the `__call__` method in the specified file.
ast-grep --lang python --pattern $'def __call__($_):\n    $$$' app/pipeline/lecture_ingestion_pipeline.py

Length of output: 105


Script:

#!/bin/bash
# Description: Verify the presence of the `__call__` method and check for `None` validation for `self.dto` in `app/pipeline/lecture_ingestion_pipeline.py`.

# Locate the `__call__` method in the specified file.
rg -A 10 'def __call__' app/pipeline/lecture_ingestion_pipeline.py

Length of output: 580

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (2)
app/pipeline/lecture_ingestion_pipeline.py (2)

95-97: Ensure proper handling of None value for dto.

The dto parameter is now optional, but there are multiple instances in the code where self.dto is accessed without checking for None. This can lead to NoneType errors. Please ensure that all usages of self.dto within the class handle the possibility of it being None.

  • app/pipeline/lecture_ingestion_pipeline.py: Lines where self.dto is accessed without checks.
Analysis chain

Ensure proper handling of None value for dto.

The dto parameter is now optional. Ensure that all usages of self.dto within the class handle the possibility of it being None to avoid NoneType errors.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all usages of `self.dto` handle the possibility of it being `None`.

# Test: Search for `self.dto` usage. Expect: Proper handling of `None` values.
rg --type python 'self\.dto'

Length of output: 61


Script:

#!/bin/bash
# Description: Verify that all usages of `self.dto` handle the possibility of it being `None`.

# Test: Search for `self.dto` usage. Expect: Proper handling of `None` values.
rg --type-add 'py:*.py' --type py 'self\.dto'

Length of output: 914


297-317: The delete_old_lectures method call is missing a required parameter.

The method call to delete_old_lectures in app/web/routers/webhooks.py only passes dto.lecture_units and does not provide the required artemis_base_url parameter. This needs to be corrected to ensure the method functions as intended.

  • File: app/web/routers/webhooks.py
  • Line: Around 5 lines after the method call pipeline.delete_old_lectures(dto.lecture_units)

Please update the method call to include the artemis_base_url parameter.

Analysis chain

LGTM! Verify the method usage in the codebase.

The changes to the delete_old_lectures method enhance its reusability and clarity.

However, ensure that all calls to delete_old_lectures pass the correct parameters.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all calls to `delete_old_lectures` pass the correct parameters.

# Test: Search for the method usage. Expect: Correct parameters passed.
rg --type python 'delete_old_lectures'

Length of output: 69


Script:

#!/bin/bash
# Description: Verify that all calls to `delete_old_lectures` pass the correct parameters.

# Test: Search for the method usage. Expect: Correct parameters passed.
rg --type py 'delete_old_lectures'

Length of output: 190


Script:

#!/bin/bash
# Description: Verify the exact context of the `delete_old_lectures` method call in `app/web/routers/webhooks.py`.

# Test: Display the relevant lines around the method call.
rg --type py -A 5 -B 5 'pipeline.delete_old_lectures' app/web/routers/webhooks.py

Length of output: 479

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (5)
app/web/status/lecture_deletion_status_callback.py (1)

12-31: Well-implemented class for deletion status updates.

The LecturesDeletionStatusCallback class is well-implemented with dynamic stage management and URL construction. However, consider adding error handling for potential issues with the base_url format or network problems during the callback.

Consider implementing error handling mechanisms to improve robustness, such as:

try:
    # existing code
except Exception as e:
    logger.error(f"Failed to update status: {e}")
    raise
app/web/status/ingestion_status_callback.py (1)

Line range hint 14-43: Enhanced functionality for lecture-specific ingestion status updates.

The IngestionStatusCallback class has been effectively updated to include lecture_unit_id, enhancing its functionality. The docstring update is also appropriate. Consider adding error handling for potential issues with the lecture_unit_id or network problems during the callback.

Consider implementing error handling mechanisms to improve robustness, such as:

try:
    # existing code
except Exception as e:
    logger.error(f"Failed to update status: {e}")
    raise
app/web/routers/webhooks.py (2)

75-81: Approve implementation; suggest documentation improvements.

The implementation of lecture_ingestion_webhook is appropriate, using a separate thread for non-blocking operations. Consider improving the documentation to better reflect the specific purpose of this webhook, which is to trigger the lecture ingestion pipeline.


83-93: Approve implementation; suggest documentation improvements.

The implementation of lecture_deletion_webhook is consistent with other webhook functions, using a separate thread for non-blocking operations. Consider improving the documentation to better reflect the specific purpose of this webhook, which is to trigger the lecture deletion process.

app/pipeline/lecture_ingestion_pipeline.py (1)

298-318: Refactoring of delete_old_lectures approved.

The method's refactoring to accept parameters directly enhances its reusability and clarity. This change supports the PR's objectives effectively.

Consider adding more detailed logging for each deletion attempt to improve traceability and debugging.

logger.info(f"Attempting to delete lecture: {lecture_unit.lecture_unit_id}")
if self.delete_lecture_unit(
    lecture_unit.course_id,
    lecture_unit.lecture_id,
    lecture_unit.lecture_unit_id,
    artemis_base_url,
):
    logger.info("Lecture deleted successfully")
else:
    logger.error("Failed to delete lecture")

app/domain/ingestion/ingestion_pipeline_execution_dto.py Outdated Show resolved Hide resolved
app/pipeline/lecture_ingestion_pipeline.py Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

coderabbitai[bot]
coderabbitai bot previously approved these changes Oct 2, 2024
Copy link
Contributor

@kaancayli kaancayli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good. Left some comments.

app/web/routers/ingestion_status.py Outdated Show resolved Hide resolved
app/web/routers/ingestion_status.py Outdated Show resolved Hide resolved
app/web/routers/ingestion_status.py Outdated Show resolved Hide resolved
coderabbitai[bot]

This comment was marked as spam.

coderabbitai[bot]
coderabbitai bot previously approved these changes Oct 13, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
app/web/routers/webhooks.py (1)

82-93: LGTM: New lecture deletion webhook implemented correctly.

The new /lectures/delete endpoint and lecture_deletion_webhook function are well-implemented, following the same pattern as the ingestion webhook. The use of a separate thread and the HTTP 202 status code are appropriate for this asynchronous operation.

Consider adding a brief comment in the function docstring to explain that this operation is asynchronous and the actual deletion happens in a separate thread. This can help API consumers understand the behavior. For example:

def lecture_deletion_webhook(dto: LecturesDeletionExecutionDto):
    """
    Webhook endpoint to trigger the lecture deletion.
    
    Note: This operation is asynchronous. The deletion process
    is started in a separate thread, and the endpoint returns
    immediately with a 202 Accepted status.
    """
    # ... rest of the function
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 481ce45 and d2695bc.

📒 Files selected for processing (2)
  • app/web/routers/ingestion_status.py (1 hunks)
  • app/web/routers/webhooks.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/web/routers/ingestion_status.py
🧰 Additional context used
🔇 Additional comments (4)
app/web/routers/webhooks.py (4)

12-16: LGTM: New imports for lecture deletion functionality.

The added imports for LecturesDeletionStatusCallback and LecturesDeletionExecutionDto are consistent with the new lecture deletion functionality. They are correctly placed and follow the existing import structure.


35-35: LGTM: Added lecture_unit_id to IngestionStatusCallback.

The addition of lecture_unit_id to the IngestionStatusCallback initialization enhances the context provided to the callback. This is a positive change for improved traceability during the ingestion process.


75-75: LGTM: Improved webhook function naming.

The change from lecture_webhook to lecture_ingestion_webhook enhances clarity by explicitly stating the purpose of this webhook. This is a good improvement, especially considering the addition of a separate deletion webhook.


63-64: Clarify the use of LectureIngestionPipeline for deletion.

The use of LectureIngestionPipeline for deletion seems counterintuitive. Additionally, passing dto=None to the pipeline initialization might cause issues if the pipeline methods rely on dto.

Could you please clarify:

  1. Why is LectureIngestionPipeline used for deletion instead of a dedicated deletion pipeline?
  2. How does LectureIngestionPipeline handle dto=None?
  3. Is there a reason for not passing the LecturesDeletionExecutionDto to the pipeline?

To verify the LectureIngestionPipeline implementation, you can run:

This will help us understand how the delete_old_lectures method is implemented and whether it can handle dto=None.

app/web/routers/webhooks.py Show resolved Hide resolved
Copy link
Contributor

@MichaelOwenDyer MichaelOwenDyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good

Copy link
Contributor

@kaancayli kaancayli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reapprove

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)
app/pipeline/lecture_ingestion_pipeline.py (2)

310-312: Confirm necessity of the delete_old_lectures method

The delete_old_lectures method has been refactored but is not invoked in the provided code. If it's no longer used, consider removing it to keep the codebase clean.


310-330: Ensure consistent return values in delete_old_lectures method

The method returns False when an exception occurs but does not return any value upon successful completion. For consistency, consider returning True when the method succeeds.

Apply this diff to add a return statement:

             self.callback.done("Old slides removed")
+            return True
         except Exception as e:
             logger.error(f"Error deleting lecture unit: {e}")
             self.callback.error("Error while removing old slides")
             return False
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between d2695bc and 4246fe8.

📒 Files selected for processing (1)
  • app/pipeline/lecture_ingestion_pipeline.py (5 hunks)
🔇 Additional comments (2)
app/pipeline/lecture_ingestion_pipeline.py (2)

6-7: Use of Optional for dto parameter is appropriate

Good use of Optional from the typing module to indicate that dto can be None.


124-129: Potential NoneType exception when accessing self.dto

This issue was previously noted and is still applicable. Ensure to handle cases where self.dto may be None.

Comment on lines +97 to +98
dto: Optional[IngestionPipelineExecutionDto],
callback: ingestion_status_callback,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle None value for optional dto parameter

Since dto is now optional, ensure that all usages of self.dto in the class handle the case where self.dto is None to prevent potential AttributeErrors.

Comment on lines +35 to +36
from ..web.status import ingestion_status_callback

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Clarify the type annotation for callback parameter

The callback parameter is annotated with ingestion_status_callback, which seems to be a function rather than a type. If it's intended to be a type hint for a callable, consider using Callable from the typing module.

For example:

-from ..web.status import ingestion_status_callback
+from typing import Callable

...

     def __init__(
         self,
         client: WeaviateClient,
         dto: Optional[IngestionPipelineExecutionDto],
-        callback: ingestion_status_callback,
+        callback: Callable,
     ):

Alternatively, if ingestion_status_callback is a custom type, ensure it follows naming conventions (e.g., IngestionStatusCallback).

Also applies to: 97-98

Comment on lines +133 to +141
pdf_path = save_pdf(self.dto.lecture_unit.pdf_file_base64)
chunks.extend(
self.chunk_data(
lecture_pdf=pdf_path,
lecture_unit_dto=self.dto.lecture_unit,
base_url=self.dto.settings.artemis_base_url,
)
cleanup_temporary_file(pdf_path)
)
cleanup_temporary_file(pdf_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure temporary files are cleaned up even on exceptions

If an exception occurs before cleanup_temporary_file(pdf_path) is called, the temporary file may remain on the system. To guarantee cleanup regardless of success or failure, consider using a finally block.

Apply this diff to modify the code:

         pdf_path = save_pdf(self.dto.lecture_unit.pdf_file_base64)
-        chunks.extend(
-            self.chunk_data(
-                lecture_pdf=pdf_path,
-                lecture_unit_dto=self.dto.lecture_unit,
-                base_url=self.dto.settings.artemis_base_url,
-            )
-        )
-        cleanup_temporary_file(pdf_path)
+        try:
+            chunks.extend(
+                self.chunk_data(
+                    lecture_pdf=pdf_path,
+                    lecture_unit_dto=self.dto.lecture_unit,
+                    base_url=self.dto.settings.artemis_base_url,
+                )
+            )
+        finally:
+            cleanup_temporary_file(pdf_path)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pdf_path = save_pdf(self.dto.lecture_unit.pdf_file_base64)
chunks.extend(
self.chunk_data(
lecture_pdf=pdf_path,
lecture_unit_dto=self.dto.lecture_unit,
base_url=self.dto.settings.artemis_base_url,
)
cleanup_temporary_file(pdf_path)
)
cleanup_temporary_file(pdf_path)
pdf_path = save_pdf(self.dto.lecture_unit.pdf_file_base64)
try:
chunks.extend(
self.chunk_data(
lecture_pdf=pdf_path,
lecture_unit_dto=self.dto.lecture_unit,
base_url=self.dto.settings.artemis_base_url,
)
)
finally:
cleanup_temporary_file(pdf_path)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants