Fact Checking Guardrails #416

amri369 · 2025-04-02T07:04:43Z

Please Read This First

Have you read the docs? → Yes
Have you searched for related issues? → Yes

Describe the Feature

The current implementation supports:

Input Guardrails: Uses user input.
Output Guardrails: Uses the last agent output.

For some projects, there's a need for fact-checking guardrails —similar to Nemo Guardrails. This approach would use:

Initial User Input
Last Agent Output

I’ve raised this PR to introduce the feature. Please let me know if this implementation meets your expectations or if you plan to address this in the future.

rm-openai · 2025-04-02T15:11:16Z

Could you explain why we need a new concept for this versus using the existing input/output guardrails?

amri369 · 2025-04-02T15:27:59Z

@rm-openai: Fact checking can be used for instance to check if the output of the agent workflow is consistent with the initial input data (see this example).

Based on this documentation:

Input guardrails are intended to run on initial user input, so an agent's guardrails only run if the agent is the first agent.
Output guardrails are intended to run on the final agent output, so an agent's guardrails only run if the agent is the last agent.

From my understanding and testing, none of these concepts runs simultaneously on initial user input and last agent output. Please correct me if I'm wrong.

rm-openai · 2025-04-02T15:38:46Z

I guess I'm wondering why you couldn't just set both input + output, to achieve the same goal (instead of introducing a new concept)

amri369 · 2025-04-02T15:49:46Z

Many thanks @rm-openai . I explored this option using the output_guardrail decorator. However, it seems that the Runner implementation automatically triggers the output guardrails on the last agent output.

So either the runner needs to be updated, or a new concept needs to be introduced. Nemo Guardrails from Nvidia consider Fact Checking Guardrail as a separate concept, so I thought that's the way to go. Please correct me if I'm wrong.

rm-openai · 2025-04-02T16:12:43Z

@amri369 sorry still not totally sure what the difference is.

However, it seems that the Runner implementation automatically triggers the output guardrails on the last agent output.

isn't that what you want? for the fact checking impl to trigger on the last output?

amri369 · 2025-04-02T16:40:01Z

@rm-openai

Our objective described here, is to pass both the original user input and the last agent output to the output guardrails.

From my understanding, the Runner triggers the output guardrails using the following code:

output_guardrail_results = await cls._run_output_guardrails(
                            current_agent.output_guardrails + (run_config.output_guardrails or []),
                            current_agent,
                            turn_result.next_step.output,
                            context_wrapper,
                        )

However, when I printed all the arguments passed to _run_output_guardrails, I noticed that none of them contain the original user input.

Could you please advise on a solution that would allow us to pass the original user input along with the last agent output when invoking output guardrails?

Thanks for your guidance!

amri369 · 2025-04-08T05:52:10Z

@rm-openai : Did you have the chance to explore?

rm-openai · 2025-04-08T15:22:53Z

Hey sorry for the delay, missed the update. Makes sense that this doesn't quite work right now. Let me think about the best fix here. I'm still leaning towards not introducing a new concept, and instead making the history available to the output guardrail somehow.

amri369 · 2025-04-09T06:06:51Z

Many thanks @rm-openai. I'm looking forward to your solution.

amri369 added the enhancement New feature or request label Apr 2, 2025

rm-openai added the needs-more-info Waiting for a reply/more info from the author label Apr 2, 2025

amri369 mentioned this issue Apr 2, 2025

Fact checking guardrails #347

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fact Checking Guardrails #416

Fact Checking Guardrails #416

amri369 commented Apr 2, 2025 •

edited

Loading

rm-openai commented Apr 2, 2025

amri369 commented Apr 2, 2025

rm-openai commented Apr 2, 2025

amri369 commented Apr 2, 2025

rm-openai commented Apr 2, 2025

amri369 commented Apr 2, 2025

amri369 commented Apr 8, 2025

rm-openai commented Apr 8, 2025

amri369 commented Apr 9, 2025

Fact Checking Guardrails #416

Fact Checking Guardrails #416

Comments

amri369 commented Apr 2, 2025 • edited Loading

Please Read This First

Describe the Feature

rm-openai commented Apr 2, 2025

amri369 commented Apr 2, 2025

rm-openai commented Apr 2, 2025

amri369 commented Apr 2, 2025

rm-openai commented Apr 2, 2025

amri369 commented Apr 2, 2025

amri369 commented Apr 8, 2025

rm-openai commented Apr 8, 2025

amri369 commented Apr 9, 2025

amri369 commented Apr 2, 2025 •

edited

Loading