Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fact Checking Guardrails #416

Open
amri369 opened this issue Apr 2, 2025 · 9 comments
Open

Fact Checking Guardrails #416

amri369 opened this issue Apr 2, 2025 · 9 comments
Labels
enhancement New feature or request needs-more-info Waiting for a reply/more info from the author

Comments

@amri369
Copy link
Contributor

amri369 commented Apr 2, 2025

Please Read This First

  • Have you read the docs? → Yes
  • Have you searched for related issues? → Yes

Describe the Feature

The current implementation supports:

  • Input Guardrails: Uses user input.
  • Output Guardrails: Uses the last agent output.

For some projects, there's a need for fact-checking guardrails —similar to Nemo Guardrails. This approach would use:

  • Initial User Input
  • Last Agent Output

I’ve raised this PR to introduce the feature. Please let me know if this implementation meets your expectations or if you plan to address this in the future.

@amri369 amri369 added the enhancement New feature or request label Apr 2, 2025
@rm-openai
Copy link
Collaborator

Could you explain why we need a new concept for this versus using the existing input/output guardrails?

@rm-openai rm-openai added the needs-more-info Waiting for a reply/more info from the author label Apr 2, 2025
@amri369
Copy link
Contributor Author

amri369 commented Apr 2, 2025

@rm-openai: Fact checking can be used for instance to check if the output of the agent workflow is consistent with the initial input data (see this example).

Based on this documentation:

  • Input guardrails are intended to run on initial user input, so an agent's guardrails only run if the agent is the first agent.
  • Output guardrails are intended to run on the final agent output, so an agent's guardrails only run if the agent is the last agent.

From my understanding and testing, none of these concepts runs simultaneously on initial user input and last agent output. Please correct me if I'm wrong.

@rm-openai
Copy link
Collaborator

I guess I'm wondering why you couldn't just set both input + output, to achieve the same goal (instead of introducing a new concept)

@amri369 amri369 mentioned this issue Apr 2, 2025
4 tasks
@amri369
Copy link
Contributor Author

amri369 commented Apr 2, 2025

Many thanks @rm-openai . I explored this option using the output_guardrail decorator. However, it seems that the Runner implementation automatically triggers the output guardrails on the last agent output.

So either the runner needs to be updated, or a new concept needs to be introduced. Nemo Guardrails from Nvidia consider Fact Checking Guardrail as a separate concept, so I thought that's the way to go. Please correct me if I'm wrong.

@rm-openai
Copy link
Collaborator

@amri369 sorry still not totally sure what the difference is.

However, it seems that the Runner implementation automatically triggers the output guardrails on the last agent output.

isn't that what you want? for the fact checking impl to trigger on the last output?

@amri369
Copy link
Contributor Author

amri369 commented Apr 2, 2025

@rm-openai

Our objective described here, is to pass both the original user input and the last agent output to the output guardrails.

From my understanding, the Runner triggers the output guardrails using the following code:

output_guardrail_results = await cls._run_output_guardrails(
                            current_agent.output_guardrails + (run_config.output_guardrails or []),
                            current_agent,
                            turn_result.next_step.output,
                            context_wrapper,
                        )

However, when I printed all the arguments passed to _run_output_guardrails, I noticed that none of them contain the original user input.

Could you please advise on a solution that would allow us to pass the original user input along with the last agent output when invoking output guardrails?

Thanks for your guidance!

@amri369
Copy link
Contributor Author

amri369 commented Apr 8, 2025

@rm-openai : Did you have the chance to explore?

@rm-openai
Copy link
Collaborator

Hey sorry for the delay, missed the update. Makes sense that this doesn't quite work right now. Let me think about the best fix here. I'm still leaning towards not introducing a new concept, and instead making the history available to the output guardrail somehow.

@amri369
Copy link
Contributor Author

amri369 commented Apr 9, 2025

Many thanks @rm-openai. I'm looking forward to your solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs-more-info Waiting for a reply/more info from the author
Projects
None yet
Development

No branches or pull requests

2 participants