Changed Attempt.outputs to return all assistant outputs #1168

mrowebot · 2025-04-17T21:19:45Z

This PR fixes #1127.

Changed

attempts.Attempt.outputs now returns the outputs of all assistant turns, not just the output of the final turn.

Added

A last_output property on attempts.Attempt that returns the output of the last assistant turn.
Unit tests for probes.atkgen.Tox and detectors.unsafe_content.ToxicCommentModel that verify the cardinalities of outputs and detector_results match.
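To illustrate the intended semantics of the two properties, here is a minimal sketch. It assumes a simplified attempt that stores its conversation as a list of (role, text) turns with one output per assistant turn. ToyAttempt and its turns field are hypothetical and do not reflect garak's actual attempts.Attempt implementation, which also tracks multiple generations.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyAttempt:
    """Hypothetical stand-in for attempts.Attempt (not garak's real class)."""

    # Each turn is a (role, text) pair, e.g. ("user", "q1") or ("assistant", "a1").
    turns: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def outputs(self) -> List[str]:
        # Proposed behaviour: the output of every assistant turn, in conversation order.
        return [text for role, text in self.turns if role == "assistant"]

    @property
    def last_output(self) -> Optional[str]:
        # Output of the final assistant turn, or None if the model never replied.
        outputs = self.outputs
        return outputs[-1] if outputs else None


attempt = ToyAttempt(
    turns=[("user", "q1"), ("assistant", "a1"), ("user", "q2"), ("assistant", "a2")]
)
print(attempt.outputs)      # ['a1', 'a2']
print(attempt.last_output)  # a2
```

Keeping last_output separate preserves an easy way to get the final reply for callers that only ever wanted the last turn.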

Verification

Run the tests and ensure they pass: python -m pytest tests/
Then run: python -m garak --model_type huggingface --model_name gpt2 --probes atkgen.Tox --detectors unsafe_content.ToxicCommentModel

The command above writes the probe attempts, including detector results, to the garak*.report.jsonl file. Each entry with "entry_type": "attempt" will contain outputs and detector_results with the same cardinality, provided any detection results were produced.
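The cardinality check described above can also be scripted. Below is a minimal sketch that scans report lines and flags attempt entries whose detector results do not line up with their outputs. It assumes each report line is a JSON object, that attempt entries carry an outputs list, and that detector_results maps each detector name to a list with one score per output. The function names find_cardinality_mismatches and check_report are hypothetical helpers, not part of garak.

```python
import json
from typing import Iterable, List, Tuple

# (line number, detector name, number of outputs, number of detector results)
Mismatch = Tuple[int, str, int, int]


def find_cardinality_mismatches(lines: Iterable[str]) -> List[Mismatch]:
    """Flag attempt entries whose per-detector result count differs from len(outputs).

    Assumes each attempt entry has an "outputs" list and a "detector_results"
    dict mapping detector name -> list of scores, one score per output.
    """
    mismatches: List[Mismatch] = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        if entry.get("entry_type") != "attempt":
            continue  # other entry types are ignored
        n_outputs = len(entry.get("outputs") or [])
        for detector, scores in (entry.get("detector_results") or {}).items():
            if len(scores) != n_outputs:
                mismatches.append((line_no, detector, n_outputs, len(scores)))
    return mismatches


def check_report(path: str) -> List[Mismatch]:
    """Read a JSONL report from disk and return any cardinality mismatches."""
    with open(path, encoding="utf-8") as report:
        return find_cardinality_mismatches(report)
```

Calling check_report on the garak*.report.jsonl file from the run above should return an empty list when every attempt's detector results line up with its outputs.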

jmartin-tech · 2025-04-17T23:00:55Z

The attempt.outputs method is currently in flux, as turn and conversation support is coming soon in main. Until a week ago, detectors used attempt.all_outputs(); since #943 landed with language translation options, they all now rely on attempt.outputs_for(lang) to get the list of outputs a detector should evaluate. These methods are what is currently expected to match the list in detector_results[detector_name] 1:1. However, I do see that Evaluator.evaluate() relies on attempt.outputs(), which likely indicates #1127 is a unique issue. The turn/conversation rework will likely need to incorporate tests that enforce the expected match between detector result counts and outputs.

Note that the atkgen probe's custom behavior and mutation of Attempt is likely the true root of this divergence, as that probe emits Attempts that do not conform to normal detector expectations.

I suspect this change is not quite what is needed; however, I need to do some more testing to be sure.

mrowebot · 2025-04-18T06:25:14Z

Sounds like this PR needs to be held until the behaviour of attempts.Attempt.outputs is clearer, as this could be an edge case involving atkgen.

leondz · 2025-04-18T06:33:05Z

A general refactor is upcoming in #1089, which is planned for the coming fortnight.

mrowebot · 2025-04-18T22:35:31Z

@leondz, should we park this until the refactor and potentially delete it? It feels like it might be a redundant change.

leondz · 2025-04-19T06:19:36Z

The Turn & Conversation functionality won't hit release until the end of May, possibly the end of June. Depending on review, if we can land this earlier than that, I'm happy to.

jmartin-tech · 2025-04-19T18:44:10Z

@leondz sorry if I was unclear: the root cause of the issue seems to be atkgen's custom mutation of its attempts, which does not match the expectations of the detector the attempts are passed to.

While attempt.outputs() is only called in Evaluator.evaluate at this time, I do not believe a global change in the behavior of this method is the desired result, as it impacts all evaluation processing. This PR is set as draft for now, and I will take a closer look at possible follow-on impacts in the next few days before moving it back to the ready-for-review state.

mrowebot · 2025-04-19T20:44:40Z

@jmartin-tech let me know if any changes are needed here and happy to update the branch/PR.

leondz · 2025-04-23T15:15:55Z

This highlighted an unclear behaviour in atkgen, and the open question of whether all_outputs or outputs should be used when running detection on an attempt. Atkgen expects the detector to be run over the outputs of every conversational turn.

mrowebot added 3 commits April 17, 2025 14:18

1127 Changed Attempt.outputs to return all assistant outputs

93d3b04

1127 Fixed test_attempt cases to include attempt.last_output check

d5e81c6

1127 Tidied up branch following linting

da2be18

mrowebot marked this pull request as ready for review April 17, 2025 22:13

mrowebot changed the title ~~1127 Changed Attempt.outputs to return all assistant outputs~~ #1127 Changed Attempt.outputs to return all assistant outputs Apr 17, 2025

mrowebot changed the title ~~#1127 Changed Attempt.outputs to return all assistant outputs~~ 1127 Changed Attempt.outputs to return all assistant outputs Apr 17, 2025

jmartin-tech changed the title ~~1127 Changed Attempt.outputs to return all assistant outputs~~ Changed Attempt.outputs to return all assistant outputs Apr 17, 2025

jmartin-tech marked this pull request as draft April 18, 2025 22:54
