Agent Not Connecting in UI #481

Open
sshleifer opened this issue Jan 23, 2025 · 3 comments


sshleifer commented Jan 23, 2025

I followed the README instructions with some modifications.

My pod logs


2025-01-23T19:22:55.677Z        INFO    controller/controller.go:279    configuration loaded    {"config": {"agent-token-secret": "buildkite-cw1-secrets", "debug": false, "image": "ghcr.io/buildkite/agent:3.90.0", "job-ttl": "10m0s", "poll-interval": "1s", "stale-job-data-timeout": "10s", "job-creation-concurrency": 5, "max-in-flight": 25, "namespace": "buildkite", "org": "iterationlab", "tags": ["queue=kubernetes"], "profiler-address": "", "prometheus-port": 0, "cluster-uuid": "REDACTED", "prohibit-kubernetes-plugin": false, "additional-redacted-vars": [], "pod-spec-patch": null, "image-pull-backoff-grace-period": "30s", "job-cancel-checker-poll-interval": "5s", "agent-config": null, "default-checkout-params": null, "default-command-params": null, "default-sidecar-params": null, "default-metadata": {"Annotations":null,"Labels":null}, "default-image-pull-policy": "IfNotPresent", "default-image-check-pull-policy": ""}}
2025-01-23T19:22:55.783Z        INFO    monitor monitor/monitor.go:151  started {"org": "iterationlab"}

This looks fine, but the UI doesn't show any connected agents, and when I trigger a build it says it is waiting for an agent forever.

README modifications

We use Argo CD, so I made a new app like this:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: buildkite
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - git:
        repoURL: "https://github.com/iterationlab/argocd.git"
        revision: HEAD
        files:
          - path: "clusters/*.json"
      selector:
        matchExpressions:
          - { key: buildkite_enabled, operator: In, values: ["true"] }
  template:
    metadata:
      name: "buildkite-{{.name}}"
    spec:
      destination:
        name: "{{.name}}"
        namespace: buildkite
      project: default
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
      source:
        repoURL: https://github.com/buildkite/agent-stack-k8s.git
        targetRevision: HEAD
        path: charts/agent-stack-k8s
        helm:
          values: |
            config:
              org: iterationlab
              cluster-uuid: REDACTED
            image: "ghcr.io/buildkite/agent-stack-k8s/controller:latest"
            agentToken: "REDACTED"
            graphqlToken: "REDACTED"

Any ideas about what's going wrong, or suggested debugging steps? When I change the tokens I get louder errors, so I don't think that's the problem.

sshleifer changed the title from "Agent Not Registering in UI" to "Agent Not Connecting in UI" on Jan 23, 2025
@sshleifer
Author

I also tried

helm upgrade --install agent-stack-k8s oci://ghcr.io/buildkite/helm/agent-stack-k8s \
    --create-namespace \
    --namespace buildkite \
    --set config.org=REDACTED \
    --set agentToken=REDACTED \
    --set graphqlToken=REDACTED

and I get very similar pod logs and behavior.

@artem-zinnatullin
Contributor

Hey @sshleifer, I'm just a fellow user, not a maintainer, but I want to help:

  1. Enable debug logs on the controller (note "debug": false in your log) by setting
config:
  debug: true
  2. Agents won't show up in Buildkite until the controller actually runs a job. The controller bakes an agent container into a Kubernetes Job, and only then does the agent show up.

  3. I think the issue might be that by default the controller sets config.tags to queue=kubernetes, as shown in your log. If your Buildkite pipeline doesn't produce jobs with this tag, the controller simply skips them, the same as if you ran a Buildkite agent that only looks for the queue=kubernetes tag.

You can try overriding the config:

config:
  # I'm not sure whether null is valid for taking all jobs; in production you'd want Kubernetes jobs to be picked up only by a certain tag
  tags: null

Or make a Buildkite pipeline that sets the matching tag, as the README suggests: https://github.com/buildkite/agent-stack-k8s?tab=readme-ov-file#sample-buildkite-pipelines
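
For illustration, a minimal pipeline along those lines might look like the sketch below. It assumes the controller keeps its default tags value of queue=kubernetes (as shown in the log above); the step label and command are placeholders, not taken from this issue.

```yaml
# Sketch only: a single step that targets the queue the controller watches.
# Assumes the controller's tags are left at the default ["queue=kubernetes"].
steps:
  - label: "Hello from Kubernetes"   # placeholder label
    command: echo "Hello World!"     # placeholder command
    agents:
      queue: kubernetes              # must match the controller's queue tag
```

If the pipeline's steps carry no agents block, or target a different queue, the controller would not pick them up, and the build would keep waiting for an agent.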
