@Ferko-dts
Changes 🏗️

  • Added an optional parameter for AIStructuredResponseGeneratorBlock in LLM.py.
  • Integrated Tesseract OCR inside the relevant method to enable text extraction from images.
  • Updated Dockerfile to install Tesseract OCR for proper functionality of the new feature.

Reason for changes:
These changes allow the AIStructuredResponseGeneratorBlock to optionally process images using OCR, enabling structured responses from image content. The Dockerfile update ensures that the necessary OCR engine is available in all deployment environments.

@Ferko-dts Ferko-dts requested a review from a team as a code owner November 13, 2025 23:15
@Ferko-dts Ferko-dts requested review from Pwuts and Swiftyos and removed request for a team November 13, 2025 23:15
@github-project-automation github-project-automation bot moved this to 🆕 Needs initial review in AutoGPT development kanban Nov 13, 2025
@coderabbitai
coderabbitai bot commented Nov 13, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Ferko does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions github-actions bot added platform/backend AutoGPT Platform - Back end platform/blocks labels Nov 13, 2025
@github-actions
Contributor

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

@qodo-merge-pro
PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 Security concerns

Input validation:
The OCR flow accepts arbitrary URLs and local paths without validation. This can enable SSRF (fetching internal network resources) or reading unintended local files if paths are accepted from untrusted input. Mitigate by restricting URL schemes/hosts, disallowing local file paths from untrusted contexts, enforcing size limits, and using timeouts plus content-type checks. Also avoid logging full OCR text if it may contain sensitive data.
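
As a rough illustration of the URL restrictions suggested above, the following sketch rejects non-HTTP schemes and any host that resolves to a non-public address. The function names and the injectable `resolver` parameter are hypothetical and not part of this PR; the allow-list of schemes is an assumption.

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}  # assumption: only web URLs are accepted


def _default_resolver(host: str) -> list[str]:
    """Resolve a hostname to every IP address it maps to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_public_address(ip_text: str) -> bool:
    """Return True only for globally routable addresses."""
    try:
        return ipaddress.ip_address(ip_text).is_global
    except ValueError:
        # Unparseable addresses are treated as unsafe.
        return False


def is_safe_image_url(url: str, resolver=_default_resolver) -> bool:
    """Hypothetical pre-flight check to run before downloading an image."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.hostname:
        return False
    try:
        addresses = resolver(parsed.hostname)
    except OSError:
        # DNS failures (socket.gaierror is an OSError) count as unsafe.
        return False
    # Every resolved address must be public, not just the first one.
    return bool(addresses) and all(is_public_address(a) for a in addresses)
```

A check like this runs before the request, so it does not by itself protect against a hostname that resolves differently at fetch time; it is one layer alongside the size limits, timeouts, and content-type checks mentioned above.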

⚡ Recommended focus areas for review

Error Handling

OCR path swallows all exceptions and appends raw error strings into the user prompt, which can leak internal details and pollute model inputs. Consider more granular handling, limiting message content, and avoiding mutation of input directly.

if input_data.image:
    try:
        # Handle different image input formats
        if input_data.image.startswith('http'):
            # URL image
            response = requests.get(input_data.image)
            image = Image.open(io.BytesIO(response.content))
        elif input_data.image.startswith('data:image'):
            # Base64 image
            base64_data = re.sub('^data:image/.+;base64,', '', input_data.image)
            image_data = base64.b64decode(base64_data)
            image = Image.open(io.BytesIO(image_data))
        else:
            # Local file path
            image = Image.open(input_data.image)

        # Perform OCR
        ocr_text = pytesseract.image_to_string(image)
        logger.debug(f"OCR extracted text: {ocr_text}")

        # Append OCR text to prompt if text was extracted
        if ocr_text.strip():
            if input_data.prompt:
                input_data.prompt += f"\n\nExtracted text from image:\n{ocr_text}"
            else:
                input_data.prompt = f"Extracted text from image:\n{ocr_text}"

    except Exception as e:
        logger.error(f"Error processing image with OCR: {str(e)}")
        if input_data.prompt:
            input_data.prompt += f"\n\nError processing image: {str(e)}"
        else:
            input_data.prompt = f"Error processing image: {str(e)}"
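
One possible shape for the error handling described above, as a minimal sketch rather than the PR's actual code: the OCR step is passed in as a callable, failures are logged but never written into the prompt, and a new string is returned instead of mutating the input. `ImageOcrError`, `build_prompt_with_ocr`, and `run_ocr` are hypothetical names introduced for illustration.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class ImageOcrError(Exception):
    """Hypothetical wrapper for any expected failure in the OCR stage."""


def build_prompt_with_ocr(
    prompt: Optional[str],
    image_ref: Optional[str],
    run_ocr: Callable[[str], str],
) -> str:
    """Return a new prompt string; the caller's input is never mutated."""
    base = prompt or ""
    if not image_ref:
        return base
    try:
        ocr_text = run_ocr(image_ref).strip()
    except ImageOcrError as exc:
        # Details go to the logs only, so internal paths or messages
        # cannot leak into the model input.
        logger.warning("OCR failed: %s", exc)
        return base
    if not ocr_text:
        return base
    # Log the length rather than the text, in case the image is sensitive.
    logger.debug("OCR extracted %d characters", len(ocr_text))
    section = f"Extracted text from image:\n{ocr_text}"
    return f"{base}\n\n{section}" if base else section
```

Only `ImageOcrError` is caught here, so unexpected exceptions still surface to the caller instead of being swallowed.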

Network Robustness

Image download via requests.get lacks timeouts, status checks, and content-type validation. Add timeout, response.raise_for_status(), and validate image size/type to prevent hangs and misuse.

if input_data.image.startswith('http'):
    # URL image
    response = requests.get(input_data.image)
    image = Image.open(io.BytesIO(response.content))
elif input_data.image.startswith('data:image'):
    # Base64 image
    base64_data = re.sub('^data:image/.+;base64,', '', input_data.image)
    image_data = base64.b64decode(base64_data)
    image = Image.open(io.BytesIO(image_data))
else:
    # Local file path
    image = Image.open(input_data.image)

# Perform OCR
ocr_text = pytesseract.image_to_string(image)
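
The download hardening described above could look roughly like this. It is a sketch under stated assumptions: the 5 MiB cap and 10 second timeout are illustrative values, `fetch_image_bytes` is a hypothetical helper, and the `http_get` parameter exists only so the function can be exercised without network access (it defaults to `requests.get`).

```python
from typing import Callable, Optional

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumed cap; tune per deployment
TIMEOUT_SECONDS = 10  # assumed timeout


def fetch_image_bytes(url: str, http_get: Optional[Callable] = None) -> bytes:
    """Download an image with a timeout, status check, type check, and size cap."""
    if http_get is None:
        import requests  # imported lazily so the sketch loads without it

        http_get = requests.get

    response = http_get(url, timeout=TIMEOUT_SECONDS, stream=True)
    try:
        # Fail on 4xx/5xx instead of handing an error page to the decoder.
        response.raise_for_status()
        content_type = response.headers.get("Content-Type", "")
        if not content_type.lower().startswith("image/"):
            raise ValueError(f"unexpected content type: {content_type!r}")
        chunks, total = [], 0
        # Stream in chunks so an oversized body is rejected early.
        for chunk in response.iter_content(chunk_size=64 * 1024):
            total += len(chunk)
            if total > MAX_IMAGE_BYTES:
                raise ValueError("image exceeds size limit")
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        response.close()
```

The returned bytes can then be wrapped in `io.BytesIO` and passed to `Image.open`, as the snippet under review already does.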

Build Hygiene

Adding pytesseract via poetry add at build time introduces nondeterminism; prefer pinning in pyproject.toml. Also verify that installing tesseract twice (builder and runtime) is necessary and not redundant.

@netlify
netlify bot commented Nov 13, 2025

✅ Deploy Preview for auto-gpt-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 364c807 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/auto-gpt-docs/deploys/691666a68d330c00079cacd6 |
| 😎 Deploy Preview | https://deploy-preview-11379--auto-gpt-docs.netlify.app |

@AutoGPT-Agent
Thank you for your PR adding OCR capabilities to the AIStructuredResponseGeneratorBlock! Here's some feedback to help get this PR ready for merging:

  1. Your PR description is missing the required checklist. Please add the template checklist and complete it, especially the sections about test plans since this is a significant feature addition.

  2. Your PR title needs to follow our conventional commit format. It should start with a type like feat: and include an appropriate scope. Based on the changes, something like feat(backend): add image OCR capabilities to LLM block would be more appropriate.

  3. The implementation looks good overall, but consider adding some basic error handling for cases where Tesseract might not be available or for handling different image formats.

  4. I notice there's some commented-out test endpoint code in v1.py - please either complete and uncomment this for testing or remove it if it's not needed for the PR.

  5. There's also a commented-out section in docker-compose.yml - please clarify if this is needed or should be removed.

Please address these items and we'll be happy to review again.
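
For item 3, one way to detect a missing Tesseract install before attempting OCR is to look for the executable on PATH. This is a minimal sketch that assumes the binary is named `tesseract`; `tesseract_available` is a hypothetical helper, not something in this PR.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the named executable can be found on PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("Tesseract found; OCR can be enabled.")
    else:
        print("Tesseract not found; skipping OCR.")
```

A block could call this once and skip the OCR step with a clear log message when it returns False, rather than failing partway through a run.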

@deepsource-io
deepsource-io bot commented Nov 13, 2025

Here's the code health analysis summary for commits a054740..364c807. View details on DeepSource ↗.

Analysis Summary

| Analyzer | Status | Summary | Link |
| --- | --- | --- | --- |
| JavaScript | ✅ Success | | View Check ↗ |
| Python | ✅ Success | ❗ 23 occurrences introduced; 🎯 11 occurrences resolved | View Check ↗ |

💡 If you're a repository administrator, you can configure the quality gates from the settings.

@AutoGPT-Agent
Thanks for your PR adding OCR capabilities to process images in prompts! Here are some items that need to be addressed before this can be merged:

  1. Missing Checklist: Please include the complete checklist from the PR template. Since this makes material code changes, the checklist is required to ensure proper testing has been done.

  2. PR Title Format: Your PR title needs to follow the conventional commit format. It should be structured like: feat(platform/blocks): Add image processing using OCR - starting with a type (feat, fix, etc.) and including the relevant scope.

  3. Testing: Please ensure you've thoroughly tested this new functionality, especially with different image formats (URL, local file, base64) and include your test plan in the checklist.

  4. Docker Compose Comments: There are commented out lines in the docker-compose.yml file changes. Please either remove these comments or explain why they're being preserved.

  5. Commented Route: There's a large commented-out endpoint in v1.py. If this is intended for testing only and not for the final PR, please remove it.

Your implementation of OCR functionality looks promising, but we need to ensure it meets all our PR requirements before merging. Let me know if you need any clarification on these items!
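
To make the three input formats from item 3 easier to test in isolation, the dispatch logic could be pulled into small pure functions. The sketch below is an assumption about how that might look, with hypothetical names. It also tightens two details of the snippet under review: a bare `startswith('http')` would treat a file named `httpfoo.png` as a URL, and `b64decode` without `validate=True` silently ignores invalid characters.

```python
import base64
import binascii
import re

# Matches "data:image/<subtype>;base64,<payload>" and captures the payload.
DATA_URI_RE = re.compile(
    r"^data:image/[A-Za-z0-9.+-]+;base64,(?P<payload>.*)$",
    re.DOTALL | re.IGNORECASE,
)


def classify_image_input(value: str) -> str:
    """Return 'url', 'data_uri', or 'path' for an image reference."""
    lowered = value.lower()
    if lowered.startswith(("http://", "https://")):
        return "url"
    if lowered.startswith("data:image/"):
        return "data_uri"
    return "path"


def decode_data_uri(value: str) -> bytes:
    """Decode a base64 image data URI, rejecting malformed payloads."""
    match = DATA_URI_RE.match(value)
    if match is None:
        raise ValueError("not a base64 image data URI")
    try:
        return base64.b64decode(match.group("payload"), validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64 payload") from exc
```

A test plan could then cover one valid and one malformed case per format without needing Tesseract or network access.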

from typing import Any, Iterable, List, Literal, NamedTuple, Optional


import pytesseract
Bug: pytesseract is unconditionally imported in llm.py but is missing from pyproject.toml, leading to ModuleNotFoundError at startup.
Severity: CRITICAL | Confidence: 1.00

🔍 Detailed Analysis

The application will crash at startup with a ModuleNotFoundError: No module named 'pytesseract' because pytesseract is imported unconditionally in llm.py at line 13, but it is not declared as a permanent dependency in pyproject.toml. The poetry add pytesseract --no-ansi || true command in the Dockerfile is an unreliable installation method that does not guarantee the dependency is always present, especially in non-Docker environments.

💡 Suggested Fix

Add pytesseract as a formal dependency to pyproject.toml. Remove the unreliable poetry add pytesseract --no-ansi || true from the Dockerfile, allowing Poetry to manage dependencies correctly.
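
Declaring the dependency in pyproject.toml remains the suggested fix. As a complementary safeguard (a different technique from the one recommended above), the import could also be made tolerant of a missing package so that the module still loads when OCR is unavailable. `load_optional_module` is a hypothetical helper, and the sketch assumes a top-level module name.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional


def load_optional_module(name: str) -> Optional[ModuleType]:
    """Import a top-level module if installed; return None instead of raising."""
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)


# Hypothetical usage: OCR is enabled only when the package is present.
pytesseract = load_optional_module("pytesseract")
OCR_ENABLED = pytesseract is not None
```

With this pattern, code paths that need OCR would check `OCR_ENABLED` first and report a clear error, while the rest of the module keeps working in environments without the package.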

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: autogpt_platform/backend/backend/blocks/llm.py#L13

Potential issue: The application will crash at startup with a `ModuleNotFoundError: No
module named 'pytesseract'` because `pytesseract` is imported unconditionally in
`llm.py` at line 13, but it is not declared as a permanent dependency in
`pyproject.toml`. The `poetry add pytesseract --no-ansi || true` command in the
Dockerfile is an unreliable installation method that does not guarantee the dependency
is always present, especially in non-Docker environments.

Did we get this right? 👍 / 👎 to inform future reviews.

Reference_id: 2669854

@github-actions github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Nov 14, 2025
@github-actions
Contributor

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

Labels

  • conflicts (Automatically applied to PRs with merge conflicts)
  • platform/backend (AutoGPT Platform - Back end)
  • platform/blocks
  • Possible security concern
  • Review effort 3/5
  • size/l

Projects

Status: 🆕 Needs initial review

3 participants