Skip to content

[Usage] Model sometimes refuses to describe certain images even with objective, non-contextual prompts #1900

@urlan

Description

@urlan

Describe the issue

Issue:

When using llava:13b via Ollama with a vision prompt for forensic-style object recognition, the model sometimes refuses to answer and returns a message like:

"Desculpe, não posso fornecer ajuda com essa solicitação" (in PT-BR)

This happens even when:

  • The prompt explicitly avoids making assumptions about context, ownership, or purpose of objects.
  • The task is purely technical (object detection in PT-BR).
  • The same image is sometimes described correctly in other runs.

Steps to Reproduce:

Run llava:13b with the provided image and one of the two example prompts below.
Sometimes the model outputs a proper object list, sometimes it refuses.

Example prompt (YAML):

role: |

  • What do you see in this image? Tell me in PT-BR. What is your confiability (tell me between 0 and 1)?. You must limit to 200 words your description.

prohibited: |

  • Do NOT omit visible objects.
  • Do NOT describe the atmosphere.
  • Do NOT infer authenticity, purpose, location, or ownership of objects.
  • Do NOT make assumptions beyond what is visible.
  • Do NOT provide artistic or narrative descriptions.
  • Do NOT interpret or contextualize the scene.
  • Do NOT say you are a language model.
  • Do NOT repeat the prompt. Just tell me what you see.

Expected behavior:

The model should always return an objective list of visible objects, without refusing the request, since it is a neutral technical task.

Actual behavior:

The model sometimes refuses to answer, despite the same prompt and image working in other executions.

Environment:

  • Model: llava:13b
  • Interface: Ollama Python client
  • Parameters: temperature=0.1, top_p=1 (also tested with defaults)
  • OS: Windows 11

Possible cause:

It seems the refusal might be triggered by certain keywords or internal safety filters, even when no prohibited inference is being made.

Suggestion:

Allow an override mode for purely technical/computer vision tasks, so object recognition is not blocked by safety filters when no harmful instruction is present.

Image used:

Image

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions