Skip to content

[LLM] Refactor LLMWithFeedbackCycle and RequestManager #492

Open
DanielRendox wants to merge 22 commits into development from danielrendox/refactor/request_manager

Conversation

@DanielRendox
Collaborator

Description of changes made

  • Removed the generic error type from Result. Since we agreed to use TestSparkError as the parent class for all errors in the plugin, a generic parameter for the error field in Result is redundant; passing it everywhere Result is used just adds unnecessary boilerplate.
  • Migrated from HttpRequests (from the IntelliJ Platform SDK) to the Ktor client, which introduced coroutines to the Request-to-LLM module, returns streamed API data as a Kotlin Flow, enables detailed logging of each HTTP request, and generally makes the API logic much more concise and readable.
  • Enabled streaming for all API providers (previously, HuggingFace and Gemini did not support it). For now this just improves the UX a bit by updating the progress gutter with the text "Generating test n", but with this capability we can later show tests in the UI immediately, even while they are still being generated.
  • Enabled support for LLM properties such as the system message, temperature, etc.
  • Resolved "HuggingFaceRequestManager has Llama-specific prompt instruction" #289 and "Add language-agnostic parsing of the backtick block with code snippet for HuggingFaceRequestManager" #290 by redirecting to the specific Hugging Face model implementation (for now, Llama only).
  • Moved conversation-history management to ChatSessionManager, making RequestManager easily testable and mockable.
  • Removed the boilerplate that repeatedly checked for process cancellation in different components. We can now rely on the coroutine cancellation mechanism: runBlockingWithIndicatorLifecycle monitors the indicator and stops the coroutine immediately when cancellation is detected.
  • Refactored LLMWithFeedbackCycle, making the code much more readable.
  • Added the .kotlin folder to .gitignore.
  • Changed JUnitTestSuiteParserStrategy to remove all triple backticks in the LLM response. This handles the edge case where the response contains more than two triple-backtick runs, which, for some reason, is the case for Llama.
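
The Result change in the first bullet can be sketched roughly as follows. The class shapes here are assumptions for illustration, not the plugin's actual declarations; only the names TestSparkError and Result come from the description above:

```kotlin
// Hypothetical sketch: all plugin errors share one parent class,
// so Result no longer needs a generic error parameter.
open class TestSparkError(val reason: String)

class SomePromptError : TestSparkError("prompt was rejected")

// Before: Result<T, E> forced every call site to spell out E.
// After: the error slot is always a TestSparkError.
sealed class Result<out T> {
    data class Success<T>(val data: T) : Result<T>()
    data class Failure(val error: TestSparkError) : Result<Nothing>()
}

// Convenience accessor used in the usage example below.
fun <T> Result<T>.dataOrNull(): T? = (this as? Result.Success<T>)?.data
```

Call sites then declare only the success type, e.g. `Result<TestSuite>` instead of `Result<TestSuite, SomeError>`.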
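
The cancellation mechanism described above can be sketched like this. To stay self-contained, this stand-in uses an ExecutorService and a fake indicator instead of the real coroutine-based runBlockingWithIndicatorLifecycle and IntelliJ's ProgressIndicator; the function and class names here are hypothetical:

```kotlin
import java.util.concurrent.*

// Hypothetical stand-in for IntelliJ's ProgressIndicator.
class FakeIndicator { @Volatile var isCanceled = false }

// Sketch of the idea: run the body on a worker while polling the
// indicator, and cancel the work as soon as the user cancels the
// progress dialog. The real implementation cancels a coroutine
// instead of interrupting a thread.
fun <T> runWithIndicatorLifecycle(indicator: FakeIndicator, body: () -> T): T? {
    val executor = Executors.newSingleThreadExecutor()
    try {
        val future = executor.submit(Callable { body() })
        while (true) {
            if (indicator.isCanceled) {
                future.cancel(true) // interrupts the worker thread
                return null
            }
            try {
                return future.get(10, TimeUnit.MILLISECONDS)
            } catch (e: TimeoutException) {
                // work not finished yet; keep watching the indicator
            }
        }
    } finally {
        executor.shutdownNow()
    }
}
```

The point is that components no longer need their own `indicator.checkCanceled()` calls sprinkled through the code; cancellation propagates from one place.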
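
The triple-backtick cleanup in the last bullet could look roughly like this (a sketch; the actual parser logic in JUnitTestSuiteParserStrategy may differ):

```kotlin
// Hypothetical sketch: Llama sometimes emits runs of more than three
// backticks before a language tag, so strip every backtick fence from
// the raw response instead of only a leading and a trailing one.
fun stripCodeFences(rawText: String): String =
    rawText.replace(Regex("`{3,}[a-zA-Z]*"), "").trim()
```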

What is missing?

Even though everything seems to work after some manual testing, I think it is very important to cover LLMWithFeedbackCycle with unit tests now, because I may have broken something.

  • I have checked that I am merging into the correct branch

@arksap2002
Collaborator

Additional changes:

  • Reimplement "project" variables in tests

Contributor

@vsantele left a comment


I had two errors when testing this PR, which I've commented on in the code.

I didn't review all the changes, just the part where I had issues.

@vsantele
Contributor

Hi, I don't know if it's in the scope of this PR, but the testsAssembler is not cleared between two cycles with the LLM.

To check that, you can add a breakpoint here

val testSuite = junitTestSuiteParser.parseTestSuite(super.getContent())

Generate tests on a piece of code that requires at least two cycles and check the content of rawText when the breakpoint is triggered the second time. You'll see a line like this in the middle:

... code
``````java
...code

@DanielRendox
Collaborator Author

@vsantele If you have time, it would be nice if you could fix it :) Please push directly to this branch.

Added a call to `testsAssembler.clear()` to ensure it is empty at the start of each iteration in the feedback cycle. This prevents residual data from previous iterations from affecting later ones.
@vsantele
Contributor

@DanielRendox I made a fix to clear the content before each iteration. This was much easier and cleaner than trying to find the right spot at the end of the cycle (there are several early terminations).

It also fixed a bug that caused the number of generated tests to keep increasing across iterations without being reset to zero. Now the counter increases up to the maximum of the first iteration, and only grows further if a later iteration produces more tests. But it will still show a wrong number if a later iteration produces fewer tests.

I already have an easy solution for resetting the counter between cycles: override the clear method in JUnitTestsAssembler. This way, the counter is rebuilt from zero at the beginning of each cycle.
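
The proposed override could look roughly like this. The class shapes are assumptions made for the sketch; only the names TestsAssembler, JUnitTestsAssembler, and clear come from the discussion:

```kotlin
// Hypothetical sketch of the proposed fix.
open class TestsAssembler {
    private val builder = StringBuilder()
    fun consume(chunk: String) { builder.append(chunk) }
    fun getContent(): String = builder.toString()
    open fun clear() { builder.setLength(0) }
}

class JUnitTestsAssembler : TestsAssembler() {
    var testCount = 0
        private set
    fun registerGeneratedTest() { testCount++ }

    // Resetting the counter together with the content means each
    // feedback-cycle iteration reports its own number of generated
    // tests, starting from zero.
    override fun clear() {
        super.clear()
        testCount = 0
    }
}
```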

continue
}
val testSuite = testSuiteResult.data
generatedTestSuites.add(testSuite)
Contributor


generatedTestSuites is not cleared between two cycles. At the end of the process, all generated tests from every iteration are displayed.
In my opinion, we only need to keep the compilable ones, or only those from the latest cycle.
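
The first of those two options (mirroring the old logic of a set of compilable tests) could be sketched like this; TestSuite and its fields are stand-ins, not the plugin's real types:

```kotlin
// Hypothetical sketch: instead of accumulating every parsed suite
// across iterations, keep only the suites that actually compiled,
// deduplicated in a set as the old logic did.
data class TestSuite(val name: String, val compiles: Boolean)

fun collectCompilable(iterations: List<List<TestSuite>>): Set<TestSuite> =
    iterations.flatten().filterTo(mutableSetOf()) { it.compiles }
```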

Contributor


The old logic used a set with all compilable tests.

Collaborator

@stephanlukasczyk left a comment


I have at least a couple of comments; some are nitpicks or questions, so feel free to ignore them. I hope I got the main changes. At least it worked when I tried it out, which gives me some confidence.

* The current attempt does not count as a failure since it was rejected due to the prompt size
* exceeding the threshold
*/
if (testSuiteResult.error is LlmError.PromptTooLong) iteration--
Contributor


The new logic keeps the same kind of bug as #483.
If the prompt is too long, the chatHistory is not cleared. So the next request will send two user messages, with the original and the "reduced" prompts, and the error will be the same.
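
A possible fix for this could be sketched as follows. ChatHistory, ChatMessage, and dropLastUserMessage are hypothetical names invented for the sketch; the real history lives in ChatSessionManager:

```kotlin
// Hypothetical sketch: when the prompt is rejected for being too
// long, drop the oversized user message from the chat history before
// retrying with a reduced prompt, so the next request does not send
// both the original and the reduced version.
data class ChatMessage(val role: String, val content: String)

class ChatHistory {
    private val messages = mutableListOf<ChatMessage>()
    fun add(message: ChatMessage) { messages.add(message) }
    fun size(): Int = messages.size

    // Called when the last request failed with a prompt-too-long error.
    fun dropLastUserMessage() {
        val index = messages.indexOfLast { it.role == "user" }
        if (index >= 0) messages.removeAt(index)
    }
}
```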



Development

Successfully merging this pull request may close these issues.

HuggingFaceRequestManager has Llama-specific prompt instruction

4 participants