#10006 Adding images to the prompt using OCR #11379

Ferko-dts · 2025-11-13T23:15:46Z

Changes 🏗️

Added an optional image parameter to AIStructuredResponseGeneratorBlock in llm.py.
Integrated Tesseract OCR inside the relevant method to enable text extraction from images.
Updated Dockerfile to install Tesseract OCR for proper functionality of the new feature.

Reason for changes:
These changes allow the AIStructuredResponseGeneratorBlock to optionally process images using OCR, enabling structured responses from image content. The Dockerfile update ensures that the necessary OCR engine is available in all deployment environments.

coderabbitai · 2025-11-13T23:15:53Z

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

CLAassistant · 2025-11-13T23:15:54Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Ferko does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Have you signed the CLA already but the status is still pending? Let us recheck it.

github-actions · 2025-11-13T23:15:58Z

This PR targets the master branch but does not come from dev or a hotfix/* branch.

Automatically setting the base branch to dev.

qodo-merge-pro · 2025-11-13T23:16:22Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 Security concerns
Input validation: The OCR flow accepts arbitrary URLs and local paths without validation. This can enable SSRF (fetching internal network resources) or reading unintended local files if paths are accepted from untrusted input. Mitigate by restricting URL schemes/hosts, disallowing local file paths from untrusted contexts, enforcing size limits, and using timeouts plus content-type checks. Also avoid logging full OCR text if it may contain sensitive data.
⚡ Recommended focus areas for review
Error Handling: OCR path swallows all exceptions and appends raw error strings into the user prompt, which can leak internal details and pollute model inputs. Consider more granular handling, limiting message content, and avoiding mutation of input directly.

    if input_data.image:
        try:
            # Handle different image input formats
            if input_data.image.startswith('http'):
                # URL image
                response = requests.get(input_data.image)
                image = Image.open(io.BytesIO(response.content))
            elif input_data.image.startswith('data:image'):
                # Base64 image
                base64_data = re.sub('^data:image/.+;base64,', '', input_data.image)
                image_data = base64.b64decode(base64_data)
                image = Image.open(io.BytesIO(image_data))
            else:
                # Local file path
                image = Image.open(input_data.image)

            # Perform OCR
            ocr_text = pytesseract.image_to_string(image)
            logger.debug(f"OCR extracted text: {ocr_text}")

            # Append OCR text to prompt if text was extracted
            if ocr_text.strip():
                if input_data.prompt:
                    input_data.prompt += f"\n\nExtracted text from image:\n{ocr_text}"
                else:
                    input_data.prompt = f"Extracted text from image:\n{ocr_text}"
        except Exception as e:
            logger.error(f"Error processing image with OCR: {str(e)}")
            if input_data.prompt:
                input_data.prompt += f"\n\nError processing image: {str(e)}"
            else:
                input_data.prompt = f"Error processing image: {str(e)}"

Network Robustness: Image download via requests.get lacks timeouts, status checks, and content-type validation. Add timeout, response.raise_for_status(), and validate image size/type to prevent hangs and misuse.

    if input_data.image.startswith('http'):
        # URL image
        response = requests.get(input_data.image)
        image = Image.open(io.BytesIO(response.content))
    elif input_data.image.startswith('data:image'):
        # Base64 image
        base64_data = re.sub('^data:image/.+;base64,', '', input_data.image)
        image_data = base64.b64decode(base64_data)
        image = Image.open(io.BytesIO(image_data))
    else:
        # Local file path
        image = Image.open(input_data.image)

    # Perform OCR
    ocr_text = pytesseract.image_to_string(image)

Build Hygiene: Adding pytesseract via poetry add at build time introduces nondeterminism; prefer pinning in pyproject.toml. Also verify that installing tesseract twice (builder and runtime) is necessary and not redundant.
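The mitigations listed in the reviewer guide (scheme/host restriction, timeout, status check, content-type check, size limit) might be sketched as follows. This is an illustration, not code from the PR: the names `is_url_allowed`, `fetch_image_bytes`, and `MAX_IMAGE_BYTES`, and the 5 MiB limit, are all assumptions. The `session` argument is assumed to be any object with a requests-compatible `get` method, such as `requests.Session()`.

```python
import ipaddress
from urllib.parse import urlparse

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumed limit, not taken from the PR
ALLOWED_SCHEMES = {"http", "https"}


def is_url_allowed(url: str) -> bool:
    """Reject non-HTTP(S) schemes and literal non-public IP addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.hostname:
        return False
    if parsed.hostname == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # Not an IP literal. A hostname can still resolve to an internal
        # address; this sketch does not perform that DNS-level check.
        return True
    return addr.is_global


def fetch_image_bytes(url: str, session, timeout: float = 10.0) -> bytes:
    """Download an image with a timeout, status check, type check and size cap."""
    if not is_url_allowed(url):
        raise ValueError("image URL is not allowed")
    response = session.get(url, timeout=timeout, stream=True)
    response.raise_for_status()
    content_type = response.headers.get("Content-Type", "")
    if not content_type.startswith("image/"):
        raise ValueError("response is not an image")
    chunks, total = [], 0
    # Read in chunks so an oversized body is rejected before it is fully loaded.
    for chunk in response.iter_content(chunk_size=8192):
        total += len(chunk)
        if total > MAX_IMAGE_BYTES:
            raise ValueError("image exceeds size limit")
        chunks.append(chunk)
    return b"".join(chunks)
```

The returned bytes could then be wrapped in `io.BytesIO` and passed to `Image.open`, as the PR code already does. Redirect handling and hostname resolution are left out here and would need separate treatment for a complete SSRF defence.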

netlify · 2025-11-13T23:16:35Z

✅ Deploy Preview for auto-gpt-docs ready!

Name	Link
🔨 Latest commit	`364c807`
🔍 Latest deploy log	https://app.netlify.com/projects/auto-gpt-docs/deploys/691666a68d330c00079cacd6
😎 Deploy Preview	https://deploy-preview-11379--auto-gpt-docs.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

AutoGPT-Agent · 2025-11-13T23:16:54Z

Thank you for your PR adding OCR capabilities to the AIStructuredResponseGeneratorBlock! Here's some feedback to help get this PR ready for merging:

Your PR description is missing the required checklist. Please add the template checklist and complete it, especially the sections about test plans since this is a significant feature addition.
Your PR title needs to follow our conventional commit format. It should start with a type like feat: and include an appropriate scope. Based on the changes, something like feat(backend): add image OCR capabilities to LLM block would be more appropriate.
The implementation looks good overall, but consider adding some basic error handling for cases where Tesseract might not be available or for handling different image formats.
I notice there's some commented-out test endpoint code in v1.py - please either complete and uncomment this for testing or remove it if it's not needed for the PR.
There's also a commented-out section in docker-compose.yml - please clarify if this is needed or should be removed.

Please address these items and we'll be happy to review again.
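One possible shape for the error handling requested in this thread is sketched below: OCR failures are logged but never written into the prompt, and the prompt is built as a new string instead of mutating the input. The helper names `run_ocr_safely` and `build_prompt` are hypothetical, and the default exception tuple is an assumption; a real implementation would pass the OCR library's own exception types.

```python
import logging
from typing import Callable, Optional, Tuple, Type

logger = logging.getLogger(__name__)


def run_ocr_safely(
    image: object,
    ocr_fn: Callable[[object], str],
    errors: Tuple[Type[Exception], ...] = (OSError, RuntimeError, ValueError),
) -> Optional[str]:
    """Return stripped OCR text, or None if OCR fails with an expected error."""
    try:
        text = ocr_fn(image)
    except errors:
        # Keep the details in the server log; nothing is passed on to the prompt.
        logger.exception("OCR failed")
        return None
    return text.strip() or None


def build_prompt(prompt: str, ocr_text: Optional[str]) -> str:
    """Return a new prompt string; the caller's input is left untouched."""
    if not ocr_text:
        return prompt
    section = f"Extracted text from image:\n{ocr_text}"
    return f"{prompt}\n\n{section}" if prompt else section
```

With this split, a call such as `build_prompt(input_data.prompt, run_ocr_safely(image, pytesseract.image_to_string))` would leave the prompt unchanged when OCR fails.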

deepsource-io · 2025-11-13T23:17:04Z

Here's the code health analysis summary for commits a054740..364c807. View details on DeepSource ↗.

Analysis Summary

Analyzer	Status	Summary	Link
JavaScript	✅ Success		View Check ↗
Python	✅ Success	❗ 23 occurrences introduced 🎯 11 occurrences resolved	View Check ↗

💡 If you’re a repository administrator, you can configure the quality gates from the settings.

AutoGPT-Agent · 2025-11-13T23:17:13Z

Thanks for your PR adding OCR capabilities to process images in prompts! Here are some items that need to be addressed before this can be merged:

Missing Checklist: Please include the complete checklist from the PR template. Since this makes material code changes, the checklist is required to ensure proper testing has been done.
PR Title Format: Your PR title needs to follow the conventional commit format. It should be structured like: feat(platform/blocks): Add image processing using OCR - starting with a type (feat, fix, etc.) and including the relevant scope.
Testing: Please ensure you've thoroughly tested this new functionality, especially with different image formats (URL, local file, base64) and include your test plan in the checklist.
Docker Compose Comments: There are commented out lines in the docker-compose.yml file changes. Please either remove these comments or explain why they're being preserved.
Commented Route: There's a large commented-out endpoint in v1.py. If this is intended for testing only and not for the final PR, please remove it.

Your implementation of OCR functionality looks promising, but we need to ensure it meets all our PR requirements before merging. Let me know if you need any clarification on these items!
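To make the three input formats named above (URL, local file, base64) easy to test separately, the dispatch could be pulled into small pure functions. The sketch below is an assumption about how that might look, not the PR's code; `classify_image_input` and `decode_data_uri` are hypothetical names. It is also slightly stricter than the PR's `startswith('http')` check, since it requires a full `http://` or `https://` prefix.

```python
import base64
import binascii
import re

# Matches "data:image/<subtype>;base64,<payload>".
_DATA_URI = re.compile(r"^data:image/[\w.+-]+;base64,(?P<payload>.*)$", re.DOTALL)


def classify_image_input(value: str) -> str:
    """Return 'url', 'data_uri', or 'path' for an image input string."""
    if value.startswith(("http://", "https://")):
        return "url"
    if value.startswith("data:image"):
        return "data_uri"
    return "path"


def decode_data_uri(value: str) -> bytes:
    """Decode a base64 image data URI, raising ValueError on malformed input."""
    match = _DATA_URI.match(value)
    if match is None:
        raise ValueError("not a base64 image data URI")
    try:
        return base64.b64decode(match.group("payload"), validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64 payload") from exc
```

A test plan could then cover each branch with plain assertions, for example checking that a data URI built from known bytes decodes back to those bytes and that a malformed payload raises `ValueError`.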

sentry · 2025-11-13T23:17:53Z

autogpt_platform/backend/backend/blocks/llm.py

 from typing import Any, Iterable, List, Literal, NamedTuple, Optional

+
+import pytesseract


Bug: pytesseract is unconditionally imported in llm.py but is missing from pyproject.toml, leading to ModuleNotFoundError at startup.
_{Severity: CRITICAL | Confidence: 1.00}

🔍 Detailed Analysis

The application will crash at startup with a ModuleNotFoundError: No module named 'pytesseract' because pytesseract is imported unconditionally in llm.py at line 13, but it is not declared as a permanent dependency in pyproject.toml. The poetry add pytesseract --no-ansi || true command in the Dockerfile is an unreliable installation method that does not guarantee the dependency is always present, especially in non-Docker environments.

💡 Suggested Fix

Add pytesseract as a formal dependency to pyproject.toml. Remove the unreliable poetry add pytesseract --no-ansi || true from the Dockerfile, allowing Poetry to manage dependencies correctly.
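Declaring the dependency is the fix suggested here. As a complementary safeguard, the import could also be deferred so that a missing package disables OCR instead of crashing the module at startup. The following is a minimal sketch under that assumption; `load_ocr_backend` and `ocr_image` are hypothetical names, and only `image_to_string` is taken from the PR's code.

```python
import importlib
from typing import Any, Optional


def load_ocr_backend(module_name: str = "pytesseract") -> Optional[Any]:
    """Import the OCR module on first use; return None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def ocr_image(image: Any, backend: Optional[Any] = None) -> Optional[str]:
    """Run OCR with the given backend, or return None when none is available."""
    if backend is None:
        backend = load_ocr_backend()
    if backend is None:
        return None
    return backend.image_to_string(image)
```

Passing the backend explicitly also lets tests substitute a stub object that exposes `image_to_string`, so the block can be exercised without the Tesseract binary.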

🤖 Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: autogpt_platform/backend/backend/blocks/llm.py#L13 Potential issue: The application will crash at startup with a `ModuleNotFoundError: No module named 'pytesseract'` because `pytesseract` is imported unconditionally in `llm.py` at line 13, but it is not declared as a permanent dependency in `pyproject.toml`. The `poetry add pytesseract --no-ansi || true` command in the Dockerfile is an unreliable installation method that does not guarantee the dependency is always present, especially in non-Docker environments.

Reference_id: 2669854

github-actions · 2025-11-14T05:40:49Z

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

Significant-Gravitas#10006 done -> OCR approach

364c807

Ferko-dts requested a review from a team as a code owner November 13, 2025 23:15

Ferko-dts requested review from Pwuts and Swiftyos and removed request for a team November 13, 2025 23:15

github-project-automation bot added this to AutoGPT development kanban Nov 13, 2025

github-project-automation bot moved this to 🆕 Needs initial review in AutoGPT development kanban Nov 13, 2025

github-actions bot added platform/backend AutoGPT Platform - Back end platform/blocks labels Nov 13, 2025

github-actions bot changed the base branch from master to dev November 13, 2025 23:15

github-actions bot added the size/l label Nov 13, 2025

qodo-merge-pro bot added Possible security concern Review effort 3/5 labels Nov 13, 2025

sentry bot reviewed Nov 13, 2025

github-actions bot added the conflicts Automatically applied to PRs with merge conflicts label Nov 14, 2025
