-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flaky TestAgentIdentification/ws #216
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #216 +/- ##
==========================================
- Coverage 76.11% 72.43% -3.69%
==========================================
Files 24 25 +1
Lines 1834 2017 +183
==========================================
+ Hits 1396 1461 +65
- Misses 326 446 +120
+ Partials 112 110 -2 ☔ View full report in Codecov by Sentry. |
I am trying to follow the case 1 timeline you mentioned.
2.b is triggered by the client sending the scheduled message (from 2.a step) so why does the first assertion fail? |
I try to draw a sequence diagram of case 1: sequenceDiagram
participant Client
participant Client.Receiver
participant Client.Sender
participant Server
Client->>Client.Sender: (1) c.SetAgentDescription in `prepareClient` call `sender.ScheduleSend()`
Client->>Server: (2) AgentToServer with original UID (via s.sendNextMessage in sender.Start)
alt
Server->>Server: store the original instance ID to rcvAgentInstanceUid
end
Client-->>Client.Sender: start the loop
Client-->>Client.Receiver: start the loop
Server->>Client.Receiver: (2.b)ServerToAgent with new UID
Client.Receiver->>Client.Sender: Update the pendding Message with new UID
Client.Sender->>Server: (2.a) send the pending scheduled message with new UID
alt Oops!
Server->>Server: store the new instance ID to rcvAgentInstanceUid
end
Server ->> Client.Receiver: reply ...
Client->>Client: run the first eventually assertion and fail
2.b will update the message on processing the ServerToAgent message. If this happends before sender sending the pending message, rcvAgentInstanceUid will be updated with the new UID. |
There are two times we can see when the ScheduleSend is invoked which sends the message to the Server. The first one is from the |
|
I am not sure I follow. What does this PR attempt to fix then? The first assertion is expected to always pass. If you look at the last failed runs in the issue, they point out Condition never satisfied at the first assertion. I think any fix should involve figuring out why it failed. |
The first assertion is used to check whether the server has received the original uid. In the timeline case 1, the server has received the original uid, the first assertion falied becase This PR replaces the atomic.Value with a buffered channel to record uids received at server side to avoid this case. |
How did you arrive at this conclusion? It could be because no value was set to |
If no value was set to |
https://github.com/haoqixu/opamp-go/tree/debug-pr-216 |
This is the Case 1. |
This is the Case 2. |
Were you able to easily reproduce the original issue? Can you share how did you reproduce? |
I can reproduce it by running |
Thanks, I will take a look. |
Sorry for the lack of response here for some time. I have been sick and swamped with a lot of work at $job. I will find time to get back to review PRs soon. |
Coming back to this point, this says the first assertion failed because id was overwritten by second |
Thanks, I went through the ws sender code again. I think the additional opamp-go/client/internal/wssender.go Lines 34 to 42 in ca0e1e6
ScheduleSend and we expect the SetAgentDescription must be called before Start which schedules the message. I believe this is why we don't see the same flaky behaviour in the http client.
|
Yes. But I don't think test code should treat the above cases as a failure. Even with this extra message, the client uses the original ID first and then sends subsequent messages with the new ID after receiving the reply as expected. I think we should update the test code so that the above two cases pass. I would like to submit anohter PR to avoid the additional message. |
The test is written with certain assumptions and it failed because of the additional message, if we fix that test would pass and we don't need to fix any tests. |
I don't think the extra message is incorrect (although it could definitely be improved). The observable behavior of the client is as expected, but our detection method cannot handle it and fail to produce the same outcome with each individual test run. |
Case 2 is not caused by the additional message and we still need to need the test. |
This test is flaky, it usually succeeds but occasionally fails. If we want to avoid extra messages, it is not enough to prove it, we still need a separate test. So I prefer that this flaky test code should be fixed first to make its results consistent. |
It has been a while and the test has been changed since it failed and does not appear to be failing any more, see here. Closing this PR, but we can reopen if we see failures again. |
Resolve #191
TestAgentIdentification executes the following steps to test the client:
prepareClient()
: setup settings and callSetAgentDescription
of client which schedules an message sending.client.Start()
:send the first message
(a) launch an goroutine to handle scheduled message sending.
(b) start the receiver loop to handle messages from server.
eventually
assert.client.SetAgentDescription()
to schedule an message sending.eventually
assert.scheduled message sending in (2.a) and messages handling (2.b) are running concurrently.
Case 1:
the timeline is as follows:
c.SetAgentDescription
to schedule an message sending.eventually
assert fails to get the original instance IDCase 2:
the timeline is as follows:
c.SetAgentDescription
to schedule an message sending.eventually
assert trys to get the original instance ID and will return successfully after server has stored the instance ID in the first message to rcvAgentInstanceUid.client.SetAgentDescription()
to schedule an message sendingeventually
assert failed to get the new instance uid.The solution here is replacing atomic.Value with a buffered channel to avoid case 1 and waiting for the first response from server to avoid case 2.