Changed Attempt.outputs to return all assistant outputs #1168
The Note that the I suspect this change is not quite what is needed however I need to do some more testing to be sure.
Sounds like this PR needs to be held until
General refactor is upcoming in #1089 which is planned for the coming fortnight
@leondz park this then until refactor and potentially delete? Feels like it might be a redundant change.
The Turn & Conversation functionality won't hit release until end May, possibly end June. Depending on review, if we can land this earlier than that, I'm happy
@leondz sorry if I was unclear, the root cause of the issue seems to be in While
@jmartin-tech let me know if any changes are needed here and happy to update the branch/PR.
This highlighted an unclear behaviour in atkgen and whether use of all_outputs or outputs is preferred when doing detection on an attempt. Atkgen expects that the detector will be run over all outputs at any conversational turn.
This PR fixes #1127.

- Changed `attempts.Attempt.outputs` to return all assistant turn outputs, not just the final turn output.
- Added a `last_output` property to `attempts.Attempt` to return the last assistant turn output.
- `probes.atkgen.Tox` and `detectors.unsafe_content.ToxicCommentModel` to validate that `outputs` and `detector_results` cardinalities match.

Verification
python -m pytest tests/
python -m garak --model_type huggingface --model_name gpt2 --probes atkgen.Tox --detectors unsafe_content.ToxicCommentModel
The above command writes the probe attempts, including detector results, to the `garak*.report.jsonl` file. In that file, each entry with `"entry_type": "attempt"` will contain `outputs` and `detector_results` with the same cardinality, should any detection results have been produced.
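
The cardinality check described above could be automated with a short script. The following is a minimal sketch, not part of this PR. It assumes that each line of the report is a standalone JSON object, that `outputs` is a list, and that `detector_results` maps each detector name to a list of scores. The helper name `cardinality_mismatches` and the sample entries are hypothetical and only illustrate the idea.

```python
import json
from typing import Iterable, Iterator


def cardinality_mismatches(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per detector whose score count differs from the output count."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Only attempt entries carry outputs and detector results.
        if entry.get("entry_type") != "attempt":
            continue
        outputs = entry.get("outputs") or []
        # Assumption: detector_results is a dict of detector name -> list of scores.
        # An empty dict means no detection results were produced, so nothing is checked.
        results = entry.get("detector_results") or {}
        for detector, scores in results.items():
            if len(scores) != len(outputs):
                yield {
                    "line": line_no,
                    "detector": detector,
                    "outputs": len(outputs),
                    "results": len(scores),
                }


# Hypothetical in-memory report lines, standing in for a real report file.
sample = [
    json.dumps({"entry_type": "start_run setup"}),
    json.dumps({
        "entry_type": "attempt",
        "outputs": ["a", "b"],
        "detector_results": {"unsafe_content.ToxicCommentModel": [0.1, 0.9]},
    }),
    json.dumps({
        "entry_type": "attempt",
        "outputs": ["a", "b"],
        "detector_results": {"unsafe_content.ToxicCommentModel": [0.1]},
    }),
    json.dumps({"entry_type": "attempt", "outputs": ["a"], "detector_results": {}}),
]

mismatches = list(cardinality_mismatches(sample))
for mismatch in mismatches:
    print(mismatch)
```

To check a real run, an open file handle for the generated report can be passed in place of `sample`, since the function accepts any iterable of lines. An empty result would indicate that every attempt with detection results has matching cardinalities.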