Skip to content

[LLM] Refactor LLMWithFeedbackCycle and RequestManager #492

Open
DanielRendox wants to merge 22 commits into development from danielrendox/refactor/request_manager

Conversation

@DanielRendox
Collaborator

Description of changes made

  • Removed the generic error type from Result. Since we agreed to use TestSparkError as the parent class for all errors in the plugin, a generic parameter for the error field in Result is redundant; passing it everywhere Result is used just adds unnecessary boilerplate.
  • Migrated from HttpRequests (from the IntelliJ Platform SDK) to the Ktor client, which introduced coroutines to the Request-to-LLM module, returns streamed API data as a Kotlin Flow, enables detailed logging of each HTTP request, and generally makes the API logic much more concise and readable.
  • Enabled streaming for all API providers (previously, HuggingFace and Gemini did not support it). For now this just improves the UX a bit by updating the progress gutter with the text "Generating test n", but with this capability we can later show tests in the UI immediately, even while they are still being generated.
  • Enabled support for LLM properties such as the system message, temperature, etc.
  • Resolved "HuggingFaceRequestManager has Llama-specific prompt instruction" #289 and "Add language-agnostic parsing of the backtick block with code snippet for HuggingFaceRequestManager" #290 by redirecting to the specific Hugging Face model implementation (for now, Llama only).
  • Moved conversation-history management to ChatSessionManager, making RequestManager easily testable and mockable.
  • Removed the boilerplate that repeatedly checked for process cancellation in different components. We can now rely on the coroutine cancellation mechanism: runBlockingWithIndicatorLifecycle monitors the indicator and stops the coroutine immediately when cancellation is detected.
  • Refactored LLMWithFeedbackCycle, making the code much more readable.
  • Added the .kotlin folder to .gitignore.
  • Changed JUnitTestSuiteParserStrategy to remove all triple backticks in the LLM response. This handles the edge case where the response contains more than two triple-backtick runs, which, for some reason, is the case for Llama.
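
The Result change in the first bullet can be sketched roughly as follows. The class shapes here are assumptions for illustration, not the plugin's actual declarations; only the names TestSparkError and Result come from the description above:

```kotlin
// Hypothetical sketch: all plugin errors share one parent class,
// so Result no longer needs a generic error parameter.
open class TestSparkError(val reason: String)

class SomePromptError : TestSparkError("prompt was rejected")

// Before: Result<T, E> forced every call site to spell out E.
// After: the error slot is always a TestSparkError.
sealed class Result<out T> {
    data class Success<T>(val data: T) : Result<T>()
    data class Failure(val error: TestSparkError) : Result<Nothing>()
}

// Convenience accessor used in the usage example below.
fun <T> Result<T>.dataOrNull(): T? = (this as? Result.Success<T>)?.data
```

Call sites then declare only the success type, e.g. `Result<TestSuite>` instead of `Result<TestSuite, SomeError>`.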
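
The cancellation mechanism described above can be sketched like this. To stay self-contained, this stand-in uses an ExecutorService and a fake indicator instead of the real coroutine-based runBlockingWithIndicatorLifecycle and IntelliJ's ProgressIndicator; the function and class names here are hypothetical:

```kotlin
import java.util.concurrent.*

// Hypothetical stand-in for IntelliJ's ProgressIndicator.
class FakeIndicator { @Volatile var isCanceled = false }

// Sketch of the idea: run the body on a worker while polling the
// indicator, and cancel the work as soon as the user cancels the
// progress dialog. The real implementation cancels a coroutine
// instead of interrupting a thread.
fun <T> runWithIndicatorLifecycle(indicator: FakeIndicator, body: () -> T): T? {
    val executor = Executors.newSingleThreadExecutor()
    try {
        val future = executor.submit(Callable { body() })
        while (true) {
            if (indicator.isCanceled) {
                future.cancel(true) // interrupts the worker thread
                return null
            }
            try {
                return future.get(10, TimeUnit.MILLISECONDS)
            } catch (e: TimeoutException) {
                // work not finished yet; keep watching the indicator
            }
        }
    } finally {
        executor.shutdownNow()
    }
}
```

The point is that components no longer need their own `indicator.checkCanceled()` calls sprinkled through the code; cancellation propagates from one place.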
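
The triple-backtick cleanup in the last bullet could look roughly like this (a sketch; the actual parser logic in JUnitTestSuiteParserStrategy may differ):

```kotlin
// Hypothetical sketch: Llama sometimes emits runs of more than three
// backticks before a language tag, so strip every backtick fence from
// the raw response instead of only a leading and a trailing one.
fun stripCodeFences(rawText: String): String =
    rawText.replace(Regex("`{3,}[a-zA-Z]*"), "").trim()
```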

What is missing?

Even though everything seems to work after some manual testing, I think it is very important to cover LLMWithFeedbackCycle with unit tests now, because I may have broken something.

  • I have checked that I am merging into the correct branch

@arksap2002
Collaborator

Additional changes:

  • Reimplement "project" variables in tests

Contributor

@vsantele left a comment


I had two errors when testing this PR, which I've commented on in the code.

I didn't review all the changes, just the part where I had issues.

@vsantele
Contributor

Hi, I don't know if it's in the scope of this PR, but the testsAssembler is not cleared between two cycles with the LLM.

To check that, you can add a breakpoint here

val testSuite = junitTestSuiteParser.parseTestSuite(super.getContent())

Generate tests on a piece of code that requires at least two cycles and check the content of rawText when the breakpoint is triggered the second time. You'll see a line like this in the middle:

... code
``````java
...code

@DanielRendox
Collaborator Author

@vsantele If you have time, it would be nice if you could fix it :) Please push directly to this branch.

Added a call to `testsAssembler.clear()` to ensure it is empty at the start of each iteration in the feedback cycle. This prevents residual data from previous iterations from affecting later ones.
@vsantele
Contributor

@DanielRendox I made a fix to clear the content before each iteration. This was much easier and cleaner than trying to find the right spot at the end of the cycle (there are several early terminations).

It also fixed a bug that caused the number of generated tests to keep increasing across iterations without being reset to zero. Now the counter increases up to the maximum of the first iteration, and only grows further if a later iteration produces more tests. But it will still show a wrong number if a later iteration produces fewer tests.

I already have an easy solution for resetting the counter between cycles: override the clear method in JUnitTestsAssembler. This way, the counter is rebuilt from zero at the beginning of each cycle.
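
The proposed override could look roughly like this. The class shapes are assumptions made for the sketch; only the names TestsAssembler, JUnitTestsAssembler, and clear come from the discussion:

```kotlin
// Hypothetical sketch of the proposed fix.
open class TestsAssembler {
    private val builder = StringBuilder()
    fun consume(chunk: String) { builder.append(chunk) }
    fun getContent(): String = builder.toString()
    open fun clear() { builder.setLength(0) }
}

class JUnitTestsAssembler : TestsAssembler() {
    var testCount = 0
        private set
    fun registerGeneratedTest() { testCount++ }

    // Resetting the counter together with the content means each
    // feedback-cycle iteration reports its own number of generated
    // tests, starting from zero.
    override fun clear() {
        super.clear()
        testCount = 0
    }
}
```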

continue
}
val testSuite = testSuiteResult.data
generatedTestSuites.add(testSuite)
Contributor


generatedTestSuites is not cleared between two cycles. At the end of the process, all generated tests from every iteration are displayed.
In my opinion, we only need to keep the compilable ones, or only those from the latest cycle.
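
The first of those two options (mirroring the old logic of a set of compilable tests) could be sketched like this; TestSuite and its fields are stand-ins, not the plugin's real types:

```kotlin
// Hypothetical sketch: instead of accumulating every parsed suite
// across iterations, keep only the suites that actually compiled,
// deduplicated in a set as the old logic did.
data class TestSuite(val name: String, val compiles: Boolean)

fun collectCompilable(iterations: List<List<TestSuite>>): Set<TestSuite> =
    iterations.flatten().filterTo(mutableSetOf()) { it.compiles }
```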

Contributor


The old logic used a set with all compilable tests.

Collaborator

@stephanlukasczyk left a comment


I have at least a couple of comments; some are nitpicks or questions, so feel free to ignore them. I hope I got the main changes. At least it worked when I tried it out, which gives me some confidence.

* The current attempt does not count as a failure since it was rejected due to the prompt size
* exceeding the threshold
*/
if (testSuiteResult.error is LlmError.PromptTooLong) iteration--
Contributor


The new logic keeps the same kind of bug as #483.
If the prompt is too long, the chatHistory is not cleared. So the next request will send two user messages, with the original and the "reduced" prompts, and the error will be the same.
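
A possible fix for this could be sketched as follows. ChatHistory, ChatMessage, and dropLastUserMessage are hypothetical names invented for the sketch; the real history lives in ChatSessionManager:

```kotlin
// Hypothetical sketch: when the prompt is rejected for being too
// long, drop the oversized user message from the chat history before
// retrying with a reduced prompt, so the next request does not send
// both the original and the reduced version.
data class ChatMessage(val role: String, val content: String)

class ChatHistory {
    private val messages = mutableListOf<ChatMessage>()
    fun add(message: ChatMessage) { messages.add(message) }
    fun size(): Int = messages.size

    // Called when the last request failed with a prompt-too-long error.
    fun dropLastUserMessage() {
        val index = messages.indexOfLast { it.role == "user" }
        if (index >= 0) messages.removeAt(index)
    }
}
```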



Development

Successfully merging this pull request may close these issues.

HuggingFaceRequestManager has Llama-specific prompt instruction

4 participants