Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DEVCON-2524] output deploy queue to checkrun summary #772

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

tlin4194
Copy link

@tlin4194 tlin4194 commented Oct 28, 2024

Issue:

Previously, when a revision is queued, we mark the github status as queued but have no other indication in the check run itself that this check run is waiting on something.
Users have to figure out which check run is holding up the queue which is difficult because deploys for a root can happen off any branch. We can figure this out using the temporal UI however it would be a better CX if we could just surface the queue itself in the check run.

Part 1

Add queue info to checkrun summary on creating new check run or unlock signal. Queue info is shown as [commit SHA](link to specific check run)

Screenshot 2024-10-31 at 1 23 43 PM

Part 2

Check run summaries do not get updated in cases where a a Terraform workflow is pending Confirm/Reject checkrun action between Plan and Apply steps, since the deploy queue lock only tracks the "Unlock" checkrun action.

  • Surfaces deploy queue from queue package to deploy/terraform package. Some complication using Queue structs here due to cyclic dependencies between the two packages.
  • To avoid restructuring the two packages, instead we will pass the GithubCheckRunCache (github package) over from queue worker to deploy/terraform workflow.
  • StateReceiver will use GithubCheckRunCache to update checkrun summaries for each revision on the queue when a Confirm/Reject action is pending.
Screenshot 2024-11-04 at 12 46 37 PM

Test Plan

  1. Recreating Unlock action: open 2 PRs, run atlantis apply -f on PR 1, and try to merge PR 2. PR 2 will be locked by confirm reject action on PR 1. Any subsequent queued deploys will show up as queued commits (with corresponding links) in checkrun summary.
  2. Recreating diverged confirm/reject action: run atlantis apply -f on unmerged PR 1 and confirm. Create PR 3 and run atlantis apply -f on unmerged PR 3.
  3. Merged PR 1 deploy will update to to show pending action on PR 3. Links will go directly to the specific run instead of just the commit in case there were multiple forced applies on PR 1.

Improve Test Cases

Fix queue_test and updater_test where assertions in CreateOrUpdate were not getting caught since the function is called within a temporal worker.

@tlin4194
Copy link
Author

/ptal #dev-console

smonero
smonero previously approved these changes Oct 30, 2024
@tlin4194 tlin4194 force-pushed the DEVCON-2524-output-deploy-queue-to-checkrun-summary branch from eb843ba to cbf4fa5 Compare October 31, 2024 19:31
smonero
smonero previously approved these changes Nov 6, 2024
}

workflow.GetLogger(ctx).Debug(fmt.Sprintf("Updating action pending summary for deployment id: %s", i.ID.String()))
_, err := n.CheckRunCache.CreateOrUpdate(ctx, i.ID.String(), request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might pose some non-determinism errors, see https://community.temporal.io/t/workflow-determinism/4027, but I could be wrong. If it does, let's add some workflow versioning here

@tlin4194 tlin4194 changed the title Devcon 2524 output deploy queue to checkrun summary [DEVCON-2524] output deploy queue to checkrun summary Nov 8, 2024
@tlin4194 tlin4194 force-pushed the DEVCON-2524-output-deploy-queue-to-checkrun-summary branch from 5fbd36d to d5ef226 Compare November 9, 2024 01:27
@tlin4194
Copy link
Author

tlin4194 commented Nov 9, 2024

Current PR is tested on staging, with behavior as expected.
The only issue is that after a deploy pending on confirm/reject action gets unblocked, if the queue is currently locked the state receiver cannot restore the unlock button on the checkruns for queued items. The workaround in the UI is that users will just click on the linked previous checkrun unlock from there.

The actual solution requires exposing queue lock state to state receiver. This would require a significant code restructure since LockState cannot be imported to Terraform package without a cyclic dependency. The following steps are moved to a separate PR.
Steps:

  1. Add lockstate logic to state receiver
  2. Move lockstate to independent package so that lockstate can be imported in state receiver
  3. Split up queue interface into 2 interfaces due to interface bloat, one for deploy worker, one for terraform runner.
  4. Fix test cases

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants