Replies: 1 comment
This is a known challenge with LLM-based agents. Here's what works:

1. State-Based Validation

Track computed values in state and validate `final_answer` against it:

```python
state = {
    "computed_values": {},
    "print_history": []
}

def capture_print(value, key=None):
    state["print_history"].append(str(value))
    if key:
        state["computed_values"][key] = value
    print(value)
    return value

# In final_answer validation
def validate_final_answer(answer):
    # Check if values in answer were actually computed
    for value in extract_values(answer):
        if str(value) not in str(state["computed_values"].values()):
            raise ValidationError(f"Value {value} was not computed")
```
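`extract_values` and `ValidationError` are left undefined above. A minimal sketch, assuming the answer is a string and only numeric values need checking (adapt the regex to your output format):

```python
import re

class ValidationError(Exception):
    """Raised when the final answer contains values that were never computed."""

def extract_values(answer: str) -> list[str]:
    # Deliberately simple heuristic: pull numeric literals out of the answer text.
    return re.findall(r"-?\d+(?:\.\d+)?", str(answer))
```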
2. Two-Phase Approach

```python
# Phase 1: Compute only (no final_answer allowed)
agent_compute = CodeAgent(
    tools=[...],
    system_prompt="Compute all values. DO NOT call final_answer. Store results in variables."
)
result = agent_compute.run(task)

# Phase 2: Format only (read from state)
final = format_results(state["computed_values"])
```
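`format_results` is also left to the reader. Since phase 2 is deterministic Python with no LLM in the loop, nothing in it can be hallucinated; a trivial version might be:

```python
def format_results(computed_values: dict) -> str:
    # Plain "key: value" lines; swap in whatever output structure the task needs.
    return "\n".join(f"{key}: {value}" for key, value in computed_values.items())
```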
3. Constrained final_answer Tool

```python
from smolagents import tool  # assuming the smolagents @tool decorator is what's intended here

@tool
def final_answer(answer: str, computed_values: dict) -> str:
    """Return the final answer after checking it against recorded state.

    Args:
        answer: The formatted final answer.
        computed_values: The values the agent claims to have computed.
    """
    # Require the agent to explicitly pass what it computed,
    # and validate it against the print history.
    for key, value in computed_values.items():
        if str(value) not in state["print_history"]:
            return f"ERROR: {key}={value} was not printed. Recompute."
    return answer
```

Root Cause

LLMs pattern-match to produce plausible outputs. Without explicit grounding in computed state, they'll hallucinate "reasonable" numbers. The fix is architectural: separate computation from output formatting, and validate against recorded state.
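Tying the three ideas together, the overall flow is roughly as follows (reusing the names defined above; `task` and `agent_compute` come from the two-phase example):

```python
# Phase 1: the agent only computes; every important value goes through capture_print.
result = agent_compute.run(task)

# Phase 2: deterministic formatting from recorded state (no LLM in the loop).
draft = format_results(state["computed_values"])

# Final check: refuse to emit anything that was never actually computed.
validate_final_answer(draft)
print(draft)
```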
Description:
The `CodeAgent` sometimes generates content for the `final_answer` tool call directly within a code block, without strictly adhering to the intended workflow of calculating results, printing them, and then assembling the final answer based only on the resulting `Observation` history from previous steps. This leads to hallucinated or unverified final results, even when strong prompt rules are in place.

Scenario:

- A `CodeAgent` that requires multi-step data loading, calculation, or analysis.
- The final output is assembled and returned via `final_answer`.
- The `CodeAgent`'s system prompt has been customized with rules explicitly requiring (see "Attempts Made" below):
  - Using `print()` for all intermediate results needed for subsequent steps or the final answer (Rule 11/13).
  - Separating the `final_answer` code block from calculation blocks (Rule 12/14).
  - Ensuring the `final_answer` block only assembles results using data explicitly printed in previous steps and available in `Observation` history (Rule 14).
  - Printing (`print()`) complex final structures first, then calling `final_answer` in a subsequent step (Rule 15).
  - Performing a validation step before `final_answer` (Rule 16).
- Planning is enabled (`planning_interval` is set).

Observed Behavior:
Instead of following the strict cycle, the agent might:
- Generate a `final_answer` call containing a fully formed, complex result structure with numerical values that were never calculated/printed in preceding `Code` blocks.

Expected Behavior:
The agent should strictly adhere to the cycle:
1. Calculate and `print()` any results/structures needed later.
2. Observe the output of `print()`.
3. Assemble `final_answer` using only variables/structures that were present in the preceding `Observation`.
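Concretely, the intended pattern looks something like the sketch below (illustrative only; `df` and the numbers are placeholders):

```python
# Step N: compute and print everything the final answer will need.
totals = {"revenue": round(df["revenue"].sum(), 2), "orders": len(df)}
print(totals)

# --- Observation: {'revenue': 12345.67, 'orders': 89} ---

# Step N+1: assemble the answer ONLY from values visible in the Observation above.
final_answer({"revenue": 12345.67, "orders": 89})
```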
Attempts Made:

- Customized the `CodeAgent` system prompt (via `prompt_templates`), emphasizing the use of `print()`, separation of `final_answer`, reliance on `Observation` history, printing complex structures first, and pre-final answer validation.
- Set a `planning_interval` to enable planning steps.
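For reference, a minimal sketch of that configuration (names such as `model`, `custom_templates`, and `task` are placeholders, and the exact `prompt_templates` structure depends on the smolagents version):

```python
from smolagents import CodeAgent

agent = CodeAgent(
    tools=[...],                        # task-specific tools
    model=model,                        # whichever model wrapper is in use
    prompt_templates=custom_templates,  # templates carrying the strengthened system prompt (Rules 11-16)
    planning_interval=3,                # periodic planning steps
)
result = agent.run(task)
```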
Question:

Is this a known limitation or behavior pattern? Are there further prompt engineering strategies, agent configurations, or alternative approaches within `smolagents` recommended to more reliably prevent the agent from bypassing the observation cycle and hallucinating the `final_answer` content, especially for complex outputs? Could explicit validation steps be made more robust?
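One direction for more robust explicit validation, sketched here with a hypothetical `collect_observations()` helper (how step observations are accessed depends on the smolagents version, e.g. via the agent's memory/logs):

```python
import re

def answer_is_grounded(answer: str, observations: list[str]) -> bool:
    # Every numeric value in the answer must appear verbatim in some prior observation.
    observed = " ".join(observations)
    return all(num in observed for num in re.findall(r"-?\d+(?:\.\d+)?", str(answer)))

answer = agent.run(task)
if not answer_is_grounded(str(answer), collect_observations(agent)):
    # Re-run, or reject the answer instead of surfacing hallucinated numbers.
    raise RuntimeError("Final answer contains values not present in any Observation.")
```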
Environment:

- `CodeAgent`
- `LocalPythonExecutor`