CostantinoEsposito89
The format_output method in QualityScorer previously parsed scores by splitting the LLM output by newlines and iterating. This approach was brittle and could fail if the LLM included reasoning, explanations, or extra newlines before the score list, causing valid scores to be missed.

This commit refactors the parsing logic to use re.findall. This robustly extracts all lines matching the score format from the entire output, regardless of their position.

Key improvements:

  • Resilience: Correctly parses scores even when they are preceded by reasoning or other text from the LLM.
  • Reliability: Ensures the scores list is correctly padded with None if fewer scores are found than expected, maintaining a consistent output shape.
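The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the score-line format (`[1] score: 4`), the `SCORE_LINE` pattern, and the `parse_scores` helper are all assumptions made for the example, since the description does not show the actual regex or method body.

```python
import re
from typing import List, Optional

# Assumed score-line format, e.g. "[1] score: 4". The real pattern used by
# QualityScorer is not shown in the PR description.
SCORE_LINE = re.compile(
    r"^[ \t]*\[\d+\][ \t]*score:[ \t]*(\d+(?:\.\d+)?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_scores(output: Optional[str], expected: int) -> List[Optional[float]]:
    """Hypothetical helper: extract up to `expected` scores from raw LLM output."""
    if not output:
        # No output at all: keep the output shape consistent.
        return [None] * expected

    # findall scans the whole text, so reasoning or blank lines that come
    # before the score list do not shift or hide the scores.
    found = [float(value) for value in SCORE_LINE.findall(output)]

    # Truncate extras, then pad with None when fewer scores were found.
    scores: List[Optional[float]] = list(found[:expected])
    scores.extend([None] * (expected - len(scores)))
    return scores


llm_output = (
    "Reasoning: the first response is clearer.\n"
    "\n"
    "[1] score: 4\n"
    "[2] score: 2\n"
)
print(parse_scores(llm_output, expected=3))
```

With three expected responses and only two score lines present, the result keeps its length of three, with `None` in the last position.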

CostantinoEsposito89 and others added 2 commits October 1, 2025 14:38
@CostantinoEsposito89 changed the title from "fix(quality-scorer): Improve robustness of score" to "bug: quality-scorer - Improve robustness of score" on Oct 1, 2025
codspeed-hq bot commented Oct 4, 2025

CodSpeed Performance Report

Merging #1166 will degrade performance by 88.11%

Comparing CostantinoEsposito89:fix/quality-scorer/robust-score-parsing (b9389c8) with develop (0bec0a5)

🎉 Hooray! pytest-codspeed just leveled up to 4.0.0!

A heads-up, this is a breaking change and it might affect your current performance baseline a bit. But here's the exciting part - it's packed with new, cool features and promises improved result stability 🥳!
Curious about what's new? Visit our releases page to delve into all the awesome details about this new version.

Summary

❌ 1 regression

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| test_cache_time | 673.5 ms | 5,666.2 ms | -88.11% |
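
For readers interpreting the figure: the reported change is consistent with computing `base / head - 1`, which corresponds to HEAD being roughly 8.4 times slower than BASE. The formula below is inferred from the two timings in the report and is an assumption, not a statement of how CodSpeed defines the metric.

```python
# Timings copied from the benchmark report above.
base_ms = 673.5
head_ms = 5666.2

# Assumed formula: relative change expressed as base / head - 1.
change_pct = (base_ms / head_ms - 1) * 100

# Equivalent view: how many times slower HEAD is than BASE.
slowdown = head_ms / base_ms

print(f"change: {change_pct:.2f}%")
print(f"slowdown: {slowdown:.2f}x")
```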
