Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Container instances never end up running #18

Open
atrauzzi opened this issue Dec 6, 2017 · 13 comments
Open

Container instances never end up running #18

atrauzzi opened this issue Dec 6, 2017 · 13 comments
Assignees

Comments

@atrauzzi
Copy link

atrauzzi commented Dec 6, 2017

I'm trying out the ACI based cloud but am encountering an issue with both Windows and Linux based containers. While I do end up seeing that the plugin attempts to create instances, my tasks never seem to run. Instead, the instances linger for a while and then disappear, having done nothing.

Feedback on the containers both in the portal and via az seems sparse:

image

Log of container 'linux-w4gcn' in container group 'linux-w4gcn' is not available yet. Please check container 'InstanceView' for more info
rmation or retry later.

Looking at the Jenkins logs, I do see one exception per type if I leave things to run long enough:

WARNING: AciCloud: Provision agent linux-kbt38 failed: ACI container terminated
[pool-266-thread-1] INFO com.microsoft.aad.adal4j.AuthenticationAuthority - [Correlation ID: REDACTED] Instance discovery was successful
Dec 06, 2017 2:15:57 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Dec 06, 2017 2:15:57 PM hudson.slaves.NodeProvisioner$2 run
WARNING: Unexpected exception encountered while provisioning agent Linux
java.lang.Exception: java.lang.IllegalStateException: ACI container terminated
        at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:131)
        at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:80)
        at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: ACI container terminated
        at com.microsoft.jenkins.containeragents.aci.AciCloud.waitToOnline(AciCloud.java:201)
        at com.microsoft.jenkins.containeragents.aci.AciCloud.access$200(AciCloud.java:42)
        at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:107)
        ... 6 more
WARNING: Unexpected exception encountered while provisioning agent NET Framework
java.lang.Exception: java.lang.Exception: Deployment timeout
        at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:131)
        at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:80)
        at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Deployment timeout
        at com.microsoft.jenkins.containeragents.aci.AciService.createDeployment(AciService.java:122)
        at com.microsoft.jenkins.containeragents.aci.AciContainerTemplate.provisionAgents(AciContainerTemplate.java:107)
        at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:104)
        ... 6 more

Nothing fancy about my templates, just trying the defaults for now to make sure everything works:

image

image

@zackliu
Copy link
Contributor

zackliu commented Dec 7, 2017

you run az container show combined with az container logs to get more details.
The initialization failed will produce this problem, but Azure will have two different behaviors to initialization failed. One is deployment shows successful and container shows terminated, the other is endless deployment.

@zackliu zackliu self-assigned this Dec 8, 2017
@raphaelquati
Copy link

raphaelquati commented Jan 3, 2018

I'm tryiing to use my docker image as a JNLP agent (windows). Same problem.
But the problem is not the plugin. Is Azure itself

In Linux it is working.

@zackliu
Copy link
Contributor

zackliu commented Jan 4, 2018

Seems it's a connection issue, but I can't tell why connection timed out.
In the new version(0.4.0), we support SSH connection. You can have a try.

@jamesabbottsmith
Copy link

I've tried this with an SSH connection. I see similar behaviour. The container is started and appears to run, but exits before doing anything useful.

$ az container logs --name ...
Linux caas-82e05980240a429795153b976151f58a-2347572569-kqlfb 4.4.0-101-generic #124-Ubuntu SMP Fri Nov 10 18:29:59 UTC 2017 x86_64 GNU/Linux

I also see the ALPN callback dropped message in the logs

ALPN callback dropped: SPDY and HTTP/2 are disabled. Is alpn-boot on the boot class path?

@zackliu
Copy link
Contributor

zackliu commented Jan 16, 2018

@jamesabbottsmith Can you provide more logs about SSHLauncher in Jenkins logs.

Such as:

Jan 16, 2018 6:16:32 AM INFO com.microsoft.jenkins.containeragents.remote.SSHLauncher launch
SSHLauncher: Start to connect node acissh-g0w3t : 52.224.69.226 via SSH
Jan 16, 2018 6:16:44 AM INFO com.microsoft.jenkins.containeragents.remote.SSHLauncher launch
SSHLauncher: Copy slave.jar to remote host successfully

@atrauzzi
Copy link
Author

I've noticed a lot of issues with ACI even in just manually creating containers. It seems like the service has a lot of performance and consistency issues. Definitely to the extent that a plugin like this wouldn't make sense until the start time of any Windows or Linux container can be brought to under ~15 seconds.

@arjanschaaf
Copy link

I'm trying to setup an Jenkins deployment using the Solution Template together with an ACI based build environment. I'm running into very similar problems with the this setup and it is impossible to debug. az container show and az container logs are impossible to use because the containers are gone before you know it.

jenkins.log shows many entries like these:

Jan 26, 2018 1:01:10 PM com.microsoft.jenkins.containeragents.aci.AciCloud provision
INFO: Start ACI container for label ACI-container workLoad 1
Jan 26, 2018 1:01:10 PM com.microsoft.jenkins.containeragents.aci.AciCloud provision
INFO: Using ACI Container template: ACI-container
Jan 26, 2018 1:01:10 PM hudson.slaves.NodeProvisioner$StandardStrategyImpl apply
INFO: Started provisioning ACI-container from AciAgents with 1 executors. Remaining excess workload: 0
Jan 26, 2018 1:01:10 PM com.microsoft.jenkins.containeragents.aci.AciCloud$1 call
INFO: Add ACI node: aci-container-8q2gl
Jan 26, 2018 1:01:10 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Jan 26, 2018 1:01:12 PM com.microsoft.jenkins.containeragents.aci.AciService createDeployment
INFO: Waiting for deployment aci-container-q2m0kn9b
Jan 26, 2018 1:01:22 PM com.microsoft.jenkins.containeragents.aci.AciCloud$1 call
WARNING: AciCloud: Provision agent aci-container-8q2gl failed: Deployment aci-container-q2m0kn9b status: Failed
Jan 26, 2018 1:01:22 PM okhttp3.internal.platform.Platform log
INFO: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
Jan 26, 2018 1:01:22 PM com.microsoft.jenkins.containeragents.aci.AciService deleteAciContainerGroup
INFO: Delete ACI Container Group: aci-container-8q2gl successfully
Jan 26, 2018 1:01:30 PM hudson.slaves.NodeProvisioner$2 run
WARNING: Unexpected exception encountered while provisioning agent ACI-container
java.lang.Exception: java.lang.Exception: Deployment aci-container-q2m0kn9b status: Failed
	at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:136)
	at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:80)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Deployment aci-container-q2m0kn9b status: Failed
	at com.microsoft.jenkins.containeragents.aci.AciService.createDeployment(AciService.java:137)
	at com.microsoft.jenkins.containeragents.aci.AciContainerTemplate.provisionAgents(AciContainerTemplate.java:128)
	at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:104)
	... 6 more

@arjanschaaf
Copy link

So I'm finding my way around the Azure Portal and I finally found some relevant error feedback in the Deployments view of my ResourceGroup. Turned out I needed to issue this command and then it started working!!
az provider register --namespace Microsoft.ContainerInstance

@chenkennt
Copy link
Contributor

@atrauzzi We recommend to use SSH to launch container as it's much more stable than JNLP. Please have a try and let us know if there is any issue.

@atrauzzi
Copy link
Author

I don't have SSH access to launch nodes on certain services. So this is still an issue.

Why did you close this issue before I could reply?

@zackliu
Copy link
Contributor

zackliu commented Mar 19, 2018

@atrauzzi Please provide more logs and configurations about using SSH to launch ACI as you haven't succeeded. From our test, SSH is more stable than JNLP. Please keep in contact or we don't know what going on now.

@zackliu zackliu reopened this Mar 19, 2018
@ravee-rachakonda-by
Copy link

I am facing similar issue with jenkins/ssh-slave setup with this plugin. I see the agent gets deployed in ACI but gets deleted and another instance keeps coming up.

2020-03-29T17:18:15.716749905Z: [ERROR] 2020-03-29 17:18:15.716+0000 [id=112036] INFO c.m.j.c.aci.AciService#createDeployment: Deployment ztest-r3tkt376 succeed
2020-03-29T17:18:15.971375792Z: [ERROR] 2020-03-29 17:18:15.971+0000 [id=112036] INFO o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-03-29T17:18:16.119031025Z: [ERROR] 2020-03-29 17:18:16.114+0000 [id=112209] INFO c.m.j.c.remote.SSHLauncher#launch: SSHLauncher: Start to connect node ztest-kctsj : 52.152.198.206 via SSH

2020-03-29T17:18:16.422975930Z: [ERROR] 2020-03-29 17:18:16.422+0000 [id=112209] INFO c.m.j.c.remote.SSHLauncher#launch: SSHLauncher: Copy slave.jar to remote host successfully
2020-03-29T17:18:16.605629524Z: [ERROR] 2020-03-29 17:18:16.605+0000 [id=112209] INFO c.m.j.c.remote.SSHLauncher#launch: SSHLauncher: launching agent failed
2020-03-29T17:18:16.606118343Z: [ERROR] 2020-03-29 17:18:16.605+0000 [id=112036] WARNING c.m.j.c.aci.AciCloud$1#call: AciCloud: Provision agent ztest-kctsj failed: java.lang.InterruptedException: java.io.EOFException: unexpected stream termination
2020-03-29T17:18:17.050554205Z: [ERROR] 2020-03-29 17:18:17.050+0000 [id=112036] INFO o.j.p.cloudstats.CloudStatistics#getIdFor: No support for cloud-stats-plugin by class com.microsoft.jenkins.containeragents.aci.AciAgent

2020-03-29T17:18:17.241691129Z: [ERROR] 2020-03-29 17:18:17.241+0000 [id=112215] INFO o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?
2020-03-29T17:18:18.162858608Z: [ERROR] 2020-03-29 17:18:18.161+0000 [id=112209] INFO c.m.j.c.aci.AciService#deleteAciContainerGroup: Delete ACI Container Group: ztest-kctsj successfully

2020-03-29T17:18:24.527343702Z: [ERROR] 2020-03-29 17:18:24.526+0000 [id=88] WARNING hudson.slaves.NodeProvisioner#lambda$update$6: Unexpected exception encountered while provisioning agent ztest
2020-03-29T17:18:24.527390804Z: [ERROR] java.lang.InterruptedException: java.io.EOFException: unexpected stream termination
2020-03-29T17:18:24.527398804Z: [ERROR] at com.microsoft.jenkins.containeragents.remote.SSHLauncher.launch(SSHLauncher.java:121)
2020-03-29T17:18:24.527403904Z: [ERROR] at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:292)
2020-03-29T17:18:24.527409204Z: [ERROR] Caused: java.util.concurrent.ExecutionException
2020-03-29T17:18:24.527414405Z: [ERROR] at java.util.concurrent.FutureTask.report(FutureTask.java:122)
2020-03-29T17:18:24.527419405Z: [ERROR] at java.util.concurrent.FutureTask.get(FutureTask.java:192)
2020-03-29T17:18:24.527424205Z: [ERROR] at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:115)
2020-03-29T17:18:24.527429005Z: [ERROR] Caused: java.lang.Exception
2020-03-29T17:18:24.527433705Z: [ERROR] at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:140)
2020-03-29T17:18:24.527438606Z: [ERROR] at com.microsoft.jenkins.containeragents.aci.AciCloud$1.call(AciCloud.java:80)
2020-03-29T17:18:24.527443306Z: [ERROR] at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
2020-03-29T17:18:24.527448206Z: [ERROR] at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
2020-03-29T17:18:24.527452906Z: [ERROR] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2020-03-29T17:18:24.527457606Z: [ERROR] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2020-03-29T17:18:24.527462407Z: [ERROR] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2020-03-29T17:18:24.527467207Z: [ERROR] at java.lang.Thread.run(Thread.java:748)
2020-03-29T17:18:24.527471907Z: [ERROR] 2020-03-29 17:18:24.527+0000 [id=88] WARNING c.m.j.c.aci.AciCloud#canProvision: Cannot provision: template for label ztest is not available now, because it failed to provision last time.

2020-03-29T17:18:28.893307318Z: [ERROR] 2020-03-29 17:18:28.893+0000 [id=112245] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$0: Started DockerContainerWatchdog Asynchronous Periodic Work
2020-03-29T17:18:28.893558628Z: [ERROR] 2020-03-29 17:18:28.893+0000 [id=112245] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog has been triggered
2020-03-29T17:18:28.893581829Z: [ERROR] 2020-03-29 17:18:28.893+0000 [id=112245] INFO c.n.j.p.d.DockerContainerWatchdog$Statistics#writeStatisticsToLog: Watchdog Statistics: Number of overall executions: 571, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 0, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 0 ms, Average runtime of container retrieval: 0 ms
2020-03-29T17:18:28.893631631Z: [ERROR] 2020-03-29 17:18:28.893+0000 [id=112245] INFO c.n.j.p.d.DockerContainerWatchdog#loadNodeMap: We currently have 2 nodes assigned to this Jenkins instance, which we will check
2020-03-29T17:18:28.893644931Z: [ERROR] 2020-03-29 17:18:28.893+0000 [id=112245] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog check has been completed
2020-03-29T17:18:28.893813838Z: [ERROR] 2020-03-29 17:18:28.893+0000 [id=112245] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$0: Finished DockerContainerWatchdog Asynchronous Periodic Work. 0 ms

2020-03-29T17:18:34.526268101Z: [ERROR] 2020-03-29 17:18:34.525+0000 [id=96] INFO c.m.j.c.aci.AciCloud#provision: Start ACI container for label ztest workLoad 1
2020-03-29T17:18:34.526304302Z: [ERROR] 2020-03-29 17:18:34.526+0000 [id=96] INFO c.m.j.c.aci.AciCloud#provision: Using ACI Container template: ztest
2020-03-29T17:18:34.526324603Z: [ERROR] 2020-03-29 17:18:34.526+0000 [id=96] INFO h.s.NodeProvisioner$StandardStrategyImpl#apply: Started provisioning ztest from SaaSBld with 1 executors. Remaining excess workload: 0
2020-03-29T17:18:34.526693417Z: [ERROR] 2020-03-29 17:18:34.526+0000 [id=112036] INFO c.m.j.c.aci.AciCloud$1#call: Add ACI node: ztest-3qs59
2020-03-29T17:18:34.538040658Z: [ERROR] 2020-03-29 17:18:34.537+0000 [id=112036] INFO c.m.j.c.a.AciCleanTask$DeploymentRegistrar#registerDeployment: AzureAciCleanUpTask: registerDeployment: Registering deployment ztest-6rmxcrfl in saas-*****-01
2020-03-29T17:18:34.853962320Z: [ERROR] 2020-03-29 17:18:34.853+0000 [id=112036] INFO o.internal.platform.Platform#log: ALPN callback dropped: HTTP/2 is disabled. Is alpn-boot on the boot class path?

2020-03-29T17:18:35.578729453Z: [ERROR] 2020-03-29 17:18:35.578+0000 [id=112036] INFO c.m.j.c.aci.AciService#createDeployment: Waiting for deployment ztest-6rmxcrfl

@ravee-rachakonda-by
Copy link

ravee-rachakonda-by commented Mar 29, 2020

Logs from the container before it dies out

TX1-XX:aci$ az container logs --name ztest-23n25 --resource-group XX-01 --subscription XX

  • [[ ssh-rsa XXX ]]
  • write_key 'ssh-rsa XXX'
  • local ID_GROUP
    ++ stat -c %U:%G /home/jenkins
  • ID_GROUP=jenkins:jenkins
  • mkdir -p /home/jenkins/.ssh
  • echo 'ssh-rsa XXXXX'
  • chown -Rf jenkins:jenkins /home/jenkins/.ssh
  • chmod 0700 -R /home/jenkins/.ssh
  • [[ 0 -gt 0 ]]
  • grep _
  • env
  • ssh-keygen -A
    ssh-keygen: generating new host keys: DSA
  • exec /usr/sbin/sshd -D -e
    Server listening on 0.0.0.0 port 22.
    Server listening on :: port 22.
    Did not receive identification string from 10.240.255.55 port 53589
    Did not receive identification string from 10.240.255.56 port 61040
    Did not receive identification string from 10.240.255.55 port 53924
    Did not receive identification string from 10.240.255.56 port 61365
    Did not receive identification string from 10.240.255.55 port 54129
    Did not receive identification string from 10.240.255.56 port 61557

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants