invoke_agent API on bedrock-agent-runtime inconsistently returns retrievedReferences

### Describe the bug

I have an Agent and a Knowledge Base set up in AWS Bedrock. 

I'm trying to provide a citation link to the document in S3 that was used as part of RAG for a particular user query. The first response in the session has `retrievedReferences` that I can use to create this link. Any subsequent responses generated using RAG either have an empty array for `retrievedReferences` or don't even have an `attribution` field.

### Expected Behavior

When the agent uses RAG and pulls information from the Knowledge Base, the response should include the `attribution` field, and `retrievedReferences` should contain the appropriate references rather than an empty array.

### Current Behavior

The first time the Agent uses the Knowledge Base, it returns the citations to the S3 files correctly:
[agent_response_with_retrieved_references.json](https://github.com/user-attachments/files/16170502/agent_response_with_retrieved_references.json)

Subsequent responses from the Agent with the same session ID are missing the `attribution` field or have empty arrays for `retrievedReferences`:
[agent_response_missing_retrieved_references.json](https://github.com/user-attachments/files/16170505/agent_response_missing_retrieved_references.json)
[agent_response_missing_attribution.json](https://github.com/user-attachments/files/16170499/agent_response_missing_attribution.json)

### Reproduction Steps

Call the `invoke_agent` API with multiple questions that will hit the Knowledge Base in the same session.

```python
import uuid
import boto3
from typing import Dict, List

def call_invoke_agent_api(prompt: str, chat_session_id: str):

    bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
    agent_response = bedrock_agent_runtime_client.invoke_agent(
        agentId="***",  # redacted
        agentAliasId="***",  # redacted
        inputText=prompt,
        sessionId=chat_session_id,
        enableTrace=True,
    )

    text_completions: List[Dict] = []
    for event in agent_response["completion"]:
        chunk = event.get("chunk")
        if chunk is None:
            continue

        attribution = chunk.get("attribution")
        if attribution is not None:
            # the agent used RAG to generate a response
            citations = attribution.get("citations")
            for citation in citations:
                text = citation["generatedResponsePart"]["textResponsePart"]["text"]

                uri = filename = ""
                retrieved_references = citation["retrievedReferences"]
                for reference in retrieved_references:
                    uri = reference["location"]["s3Location"]["uri"]
                    filename = uri.split("/")[-1]

                text_completion = dict(text=text)

                if uri:
                    text_completion["uri"] = uri
                    text_completion["filename"] = filename

                text_completions.append(text_completion)

        else:
            # the agent didn't use the knowledge base to generate a response, just get it from bytes
            text_completions.append(dict(text=chunk["bytes"].decode()))

    return dict(text_completions=text_completions)

# prompt_1 and prompt_2 stand for two questions that should both hit the Knowledge Base
chat_session_id = uuid.uuid4().hex
call_invoke_agent_api(prompt_1, chat_session_id)
call_invoke_agent_api(prompt_2, chat_session_id)
```
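To make the three observed cases easier to tell apart when reproducing, here is a minimal sketch that labels each chunk event. The helper names (`classify_chunk`, `summarize_completion`) and the sample events are made up for illustration; it assumes the events have the same dict shape the script above reads, and it runs on plain dicts so it can be tried without calling the API.

```python
from typing import Any, Dict, Iterable, List


def classify_chunk(chunk: Dict[str, Any]) -> str:
    """Label a chunk by how much citation data it carries (hypothetical helper)."""
    attribution = chunk.get("attribution")
    if attribution is None:
        return "no_attribution"
    citations = attribution.get("citations") or []
    # Count references across every citation in the chunk, not just the last one.
    reference_count = sum(len(c.get("retrievedReferences") or []) for c in citations)
    return "has_references" if reference_count else "empty_references"


def summarize_completion(events: Iterable[Dict[str, Any]]) -> List[str]:
    """Return one label per chunk event, skipping trace and other event types."""
    return [classify_chunk(e["chunk"]) for e in events if e.get("chunk") is not None]


# Hand-written sample events mimicking the three cases; not real API output.
sample_events = [
    {"chunk": {"bytes": b"a", "attribution": {"citations": [
        {"retrievedReferences": [
            {"location": {"s3Location": {"uri": "s3://example-bucket/doc.pdf"}}}
        ]}
    ]}}},
    {"chunk": {"bytes": b"b", "attribution": {"citations": [{"retrievedReferences": []}]}}},
    {"chunk": {"bytes": b"c"}},
    {"trace": {}},
]

print(summarize_completion(sample_events))
# ['has_references', 'empty_references', 'no_attribution']
```

Passing `agent_response["completion"]` to `summarize_completion` once per call should show which case each prompt in the session falls into.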

### Possible Solution

_No response_

### Additional Information/Context

It seems to always provide the `retrievedReferences` correctly for the first prompt that causes the Agent to use the Knowledge Base. After that, it will sometimes return valid references in `retrievedReferences`, sometimes empty arrays, and sometimes no `attribution` field at all, even though the trace for the `invoke_agent` call shows that the Agent is performing RAG and using the Knowledge Base.

In my Agent's Advanced prompts configuration, I have the Pre-processing template and Post-processing template turned off.
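For comparison, a rough sketch of pulling the S3 URIs out of the trace events, so they can be checked against what the chunk's `attribution` reports. The nested path used here (`trace` → `trace` → `orchestrationTrace` → `observation` → `knowledgeBaseLookupOutput` → `retrievedReferences`) is my assumption about the trace shape and should be verified against a real response; the function name and sample event are invented for illustration.

```python
from typing import Any, Dict, Iterable, List


def s3_uris_from_trace(events: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect unique S3 URIs from Knowledge Base lookups found in trace events.

    The key path below is an assumed trace layout, not confirmed by this issue.
    """
    uris: List[str] = []
    for event in events:
        trace = (event.get("trace") or {}).get("trace") or {}
        observation = (trace.get("orchestrationTrace") or {}).get("observation") or {}
        lookup = observation.get("knowledgeBaseLookupOutput") or {}
        for reference in lookup.get("retrievedReferences") or []:
            location = reference.get("location") or {}
            uri = (location.get("s3Location") or {}).get("uri")
            # Keep first-seen order and drop duplicates.
            if uri and uri not in uris:
                uris.append(uri)
    return uris


# Hand-written sample event following the assumed layout; not real API output.
sample_trace_events = [
    {"trace": {"trace": {"orchestrationTrace": {"observation": {
        "knowledgeBaseLookupOutput": {"retrievedReferences": [
            {"location": {"s3Location": {"uri": "s3://example-bucket/doc.pdf"}}},
            {"location": {"s3Location": {"uri": "s3://example-bucket/doc.pdf"}}},
        ]}
    }}}}},
    {"chunk": {"bytes": b"answer"}},
]

print(s3_uris_from_trace(sample_trace_events))
# ['s3://example-bucket/doc.pdf']
```

Since the completion stream is consumed as it is iterated, the events would need to be collected into a list first (for example `events = list(agent_response["completion"])`) before running both this and the chunk handling over them.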

### SDK version used

boto3==1.34.61

### Environment details (OS name and version, etc.)

AmazonLinux2
