bug: quality-scorer - Improve robustness of score #1166

CostantinoEsposito89 · 2025-10-01T13:18:49Z

The format_output method in QualityScorer previously parsed scores by splitting the LLM output by newlines and iterating. This approach was brittle and could fail if the LLM included reasoning, explanations, or extra newlines before the score list, causing valid scores to be missed.

This commit refactors the parsing logic to use re.findall. This robustly extracts all lines matching the score format from the entire output, regardless of their position.

Key improvements:

- Resilience: Correctly parses scores even when they are preceded by reasoning or other text from the LLM.
- Reliability: Ensures the `scores` list is correctly padded with `None` if fewer scores are found than expected, maintaining a consistent output shape.
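For readers without the diff at hand, the approach described above can be sketched roughly as follows. This is an illustration, not the code in this PR: the function name `parse_scores`, the `expected` parameter, and the assumed score-line format (`[1] score: 4`) are all hypothetical, and truncating surplus matches is an extra assumption the description does not mention.

```python
import re
from typing import List, Optional

# ASSUMPTION: score lines look like "[1] score: 4". The pattern used by
# QualityScorer itself may differ.
SCORE_LINE = re.compile(
    r"^\[\d+\][ \t]*score:[ \t]*(\d+)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_scores(output: Optional[str], expected: int) -> List[Optional[int]]:
    """Hypothetical helper: extract scores from anywhere in the LLM output."""
    if output is None:
        return [None] * expected
    # findall scans the whole string, so leading reasoning or blank lines
    # do not shift or hide the score lines.
    found = [int(s) for s in SCORE_LINE.findall(output)]
    # ASSUMPTION: surplus matches are dropped so the shape stays fixed.
    found = found[:expected]
    # Pad with None when fewer scores were found than expected.
    return found + [None] * (expected - len(found))


example = "Reasoning: response 1 is clearer.\n\n[1] score: 4\n[2] score: 2\n"
result = parse_scores(example, expected=3)
```

With three expected responses and two score lines in the output, `result` has length three and ends in `None`. A line-by-line loop that stops after the first three lines would have seen only the reasoning line and a blank line before reaching the first score.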


codspeed-hq · 2025-10-04T16:14:25Z

CodSpeed Performance Report

Merging #1166 will degrade performance by 88.11%

Comparing CostantinoEsposito89:fix/quality-scorer/robust-score-parsing (b9389c8) with develop (0bec0a5)

🎉 Hooray! `pytest-codspeed` just leveled up to 4.0.0!

A heads-up, this is a breaking change and it might affect your current performance baseline a bit. But here's the exciting part - it's packed with new, cool features and promises improved result stability 🥳!
Curious about what's new? Visit our releases page to delve into all the awesome details about this new version.

Summary

❌ 1 regression

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| | Benchmark | `BASE` | `HEAD` | Change |
|---|---|---|---|---|
| ❌ | `test_cache_time` | 673.5 ms | 5,666.2 ms | -88.11% |
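The -88.11% figure can look odd next to timings that differ by more than a factor of eight. The report does not state its formula here, but the numbers are consistent with computing the change as `BASE / HEAD - 1`. The short check below works through that arithmetic; the formula is inferred from the table, not taken from CodSpeed documentation.

```python
# Timings copied from the benchmark table above (milliseconds).
base_ms = 673.5
head_ms = 5666.2

# ASSUMPTION: change = BASE / HEAD - 1, inferred from the reported figure.
change_pct = (base_ms / head_ms - 1) * 100

# The same regression expressed as a slowdown factor.
slowdown = head_ms / base_ms

print(f"change: {change_pct:.2f}%")
print(f"slowdown: {slowdown:.2f}x")
```

Under that reading, `test_cache_time` takes about 8.4 times as long on HEAD as on BASE.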

CostantinoEsposito89 and others added 2 commits October 1, 2025 14:38

[pre-commit.ci] auto fixes from pre-commit.com hooks

b9389c8

for more information, see https://pre-commit.ci

CostantinoEsposito89 changed the title ~~fix(quality-scorer): Improve robustness of score~~ bug: quality-scorer - Improve robustness of score Oct 1, 2025
