Skip to content

Commit ec49265

Browse files
Debug logs for port bindings when Docker deploy image fails (#761)
This is spawning from `bind: address already in use` failures we were seeing in CI and will hopefully give a better picture of what's happening. Example `bind: address already in use` error: ``` Deploy: Deploy returned error Deploy: Failed to deploy image {Containers:-1 Created:1738100303 ID:sha256:4d01dfad2d62669c5784270111944528031170173ddc0b66d96f58ae700812d5 Labels:map[complement_blueprint:5_servers complement_context:sidecar_tests.5_servers.hs4 complement_hs_name:hs4 complement_pkg:sidecar_tests gitsha1:68e27e962c946074d6f9e18e98e55b3fe4f4c128 org.opencontainers.image.documentation:https://github.com/element-hq/synapse/blob/master/docker/README.md org.opencontainers.image.licenses:AGPL-3.0-or-later org.opencontainers.image.source:https://github.com/element-hq/synapse.git org.opencontainers.image.url:https://matrix.org/docs/projects/server/synapse] ParentID:sha256:2ae79e697d44ba986e3369e5a5197328e7b9638fa20102c9ca259e885ae71eea RepoDigests:[] RepoTags:[localhost/complement:sidecar_tests.5_servers.hs4] SharedSize:-1 Size:1400625929 VirtualSize:0} : Error response from daemon: driver failed programming external connectivity on endpoint complement_sidecar_tests_3_sideca r_tests.5_servers.hs4_2 (fc1bbb210acbcb5c13e64dc1336b36e914f4c8a5069db8d0ded1e12f0f26f10f): Error starting userland proxy: listen tcp4 127.0.0.1:33090: bind: address already in use ``` ### Example logs that this PR will output ```sh $ COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestFederationRoomsInvite ... ============== While deploying complement_fed_1_fed.2_servers.hs2_2 : START ALL COMPLEMENT DOCKER PORT BINDINGS ============== Container: 6c4665d6a12f063f0a0c0707fe826ac7bb1ef7642d9cc2b8189206c28aa30c12: [/complement_fed_1_fed.2_servers.hs1_1] (host) -> (container) 0.0.0.0:33494 -> 8009/tcp 0.0.0.0:33495 -> 8080/tcp 127.0.0.1:33107 -> 8448/tcp 127.0.0.1:33106 -> 8008/tcp Container: 8cb4e7a3725d291a3bb6762d7ac2af4cf6f66be64b8947542f8d299e4a01b645: [/complement_fed_1_fed.2_servers.hs2_2] (host) -> (container) 127.0.0.1:33105 -> 8448/tcp 127.0.0.1:33104 -> 8008/tcp 0.0.0.0:33492 -> 8009/tcp 0.0.0.0:33493 -> 8080/tcp Container: f3f21372d83bc31bd17ccd1636c1084d9187fad9bcb6ce987348d01576ee0406: [/complement_fed.2_servers.hs2] (host) -> (container) Container: aaa75bed21d55804aead3ab5bb10695803bb5c0a1f72d9b892f99d02d2548659: [/complement_fed.2_servers.hs1] (host) -> (container) =============== While deploying complement_fed_1_fed.2_servers.hs2_2 : END ALL COMPLEMENT DOCKER PORT BINDINGS =============== ``` ### What's the root cause? I suspect that we might be running into bugs with the Docker port allocation code since [Complement is creating the images in parallel](https://github.com/matrix-org/complement/blob/8f0c7639a37255e81f88773128b88fb75a2d4fe0/internal/docker/deployer.go#L186-L195) (I think). Compounding the problem might be that we're exposing many more ports than usual in the out-of-repo Complement image that we're using and Complement is using [`PublishAllPorts: true`](https://github.com/matrix-org/complement/blob/8f0c7639a37255e81f88773128b88fb75a2d4fe0/internal/docker/deployer.go#L372) when deploying the image which will take all of our exposed ports and make port bindings to the host for each. Bugs in this part of Docker is not uncommon, just [search for `"address already in use"` in the Docker codebase](https://github.com/moby/moby/issues?q=is%3Aissue%20state%3Aopen%20%22address%20already%20in%20use%22). For example, moby/moby#48274 has some smoke. --- The first layer of parallelism that we can control is multiple images within a single deploy which I think we can add a mutex around things to control. Multiple test packages running in parallel (controlled by `-p n`) is a good second factor that is probably harder to figure out. ```sh $ go help build ... -p n the number of programs, such as build commands or test binaries, that can be run in parallel. The default is GOMAXPROCS, normally the number of CPUs available. ```
1 parent 1544ac7 commit ec49265

File tree

2 files changed

+56
-2
lines changed

2 files changed

+56
-2
lines changed

internal/docker/builder.go

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -494,6 +494,48 @@ func printLogs(docker *client.Client, containerID, contextStr string) {
494494
log.Printf("============== %s : END LOGS ==============\n\n\n", contextStr)
495495
}
496496

497+
func printPortBindingsOfAllComplementContainers(docker *client.Client, contextStr string) {
498+
ctx := context.Background()
499+
500+
containers, err := docker.ContainerList(ctx, container.ListOptions{
501+
All: true,
502+
Filters: label(
503+
complementLabel,
504+
),
505+
})
506+
if err != nil {
507+
log.Printf("%s : Failed to list containers while trying to `printPortBindingsOfAllComplementContainers`: %s\n", contextStr, err)
508+
return
509+
}
510+
511+
log.Printf("============== %s : START ALL COMPLEMENT DOCKER PORT BINDINGS ==============\n", contextStr)
512+
513+
for _, container := range containers {
514+
log.Printf("Container: %s: %s", container.ID, container.Names)
515+
516+
inspectRes, err := docker.ContainerInspect(ctx, container.ID)
517+
if err != nil {
518+
log.Printf("%s : Failed to inspect container (%s) while trying to `printPortBindingsOfAllComplementContainers`: %s\n", contextStr, container.ID, err)
519+
return
520+
}
521+
522+
// Print an example so it's easier to understand the output
523+
log.Printf(" (host) -> (container)\n")
524+
// Then print the actual port bindings
525+
for containerPort, portBindings := range inspectRes.NetworkSettings.Ports {
526+
hostPortBindingStrings := make([]string, len(portBindings))
527+
for portBindingIndex, portBinding := range portBindings {
528+
hostPortBindingStrings[portBindingIndex] = fmt.Sprintf("%s:%s", portBinding.HostIP, portBinding.HostPort)
529+
}
530+
531+
log.Printf(" %s -> %s\n", strings.Join(hostPortBindingStrings, ", "), containerPort)
532+
}
533+
534+
}
535+
536+
log.Printf("=============== %s : END ALL COMPLEMENT DOCKER PORT BINDINGS ===============\n\n\n", contextStr)
537+
}
538+
497539
func endpoints(p nat.PortMap, csPort, ssPort int) (baseURL, fedBaseURL string, err error) {
498540
csapiPort := fmt.Sprintf("%d/tcp", csPort)
499541
csapiPortInfo, ok := p[nat.Port(csapiPort)]

internal/docker/deployer.go

Lines changed: 14 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -91,8 +91,9 @@ func (d *Deployer) CreateDirtyServer(hsName string) (*HomeserverDeployment, erro
9191
baseImageURI = uri
9292
}
9393

94+
containerName := fmt.Sprintf("complement_%s_dirty_%s", d.config.PackageNamespace, hsName)
9495
hsDeployment, err := deployImage(
95-
d.Docker, baseImageURI, fmt.Sprintf("complement_%s_dirty_%s", d.config.PackageNamespace, hsName),
96+
d.Docker, baseImageURI, containerName,
9697
d.config.PackageNamespace, "", hsName, nil, "dirty",
9798
networkName, d.config,
9899
)
@@ -101,6 +102,11 @@ func (d *Deployer) CreateDirtyServer(hsName string) (*HomeserverDeployment, erro
101102
// print logs to help debug
102103
printLogs(d.Docker, hsDeployment.ContainerID, "dirty")
103104
}
105+
106+
// Give some context for what the port bindings look like the time of the failure.
107+
// This gives better context for when `bind: address already in use` errors happen.
108+
printPortBindingsOfAllComplementContainers(d.Docker, "While dirty deploying "+containerName)
109+
104110
return nil, fmt.Errorf("CreateDirtyServer: Failed to deploy image %v : %w", baseImageURI, err)
105111
}
106112
return hsDeployment, nil
@@ -164,15 +170,21 @@ func (d *Deployer) Deploy(ctx context.Context, blueprintName string) (*Deploymen
164170
asIDToRegistrationMap := asIDToRegistrationFromLabels(img.Labels)
165171

166172
// TODO: Make CSAPI port configurable
173+
containerName := fmt.Sprintf("complement_%s_%s_%s_%d", d.config.PackageNamespace, d.DeployNamespace, contextStr, counter)
167174
deployment, err := deployImage(
168-
d.Docker, img.ID, fmt.Sprintf("complement_%s_%s_%s_%d", d.config.PackageNamespace, d.DeployNamespace, contextStr, counter),
175+
d.Docker, img.ID, containerName,
169176
d.config.PackageNamespace, blueprintName, hsName, asIDToRegistrationMap, contextStr, networkName, d.config,
170177
)
171178
if err != nil {
172179
if deployment != nil && deployment.ContainerID != "" {
173180
// print logs to help debug
174181
printLogs(d.Docker, deployment.ContainerID, contextStr)
175182
}
183+
184+
// Give some context for what the port bindings look like the time of the failure.
185+
// This gives better context for when `bind: address already in use` errors happen.
186+
printPortBindingsOfAllComplementContainers(d.Docker, "While deploying "+containerName)
187+
176188
return fmt.Errorf("Deploy: Failed to deploy image %+v : %w", img, err)
177189
}
178190
mu.Lock()

0 commit comments

Comments
 (0)