Skip to content

Debug logs for port bindings when Docker deploy image fails #761

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

MadLittleMods
Copy link
Collaborator

@MadLittleMods MadLittleMods commented Jan 29, 2025

Debug logs for port bindings when Docker deploy image fails.

This is spawning from bind: address already in use failures we were seeing in CI and will hopefully give a better picture of what's happening.

Example bind: address already in use error:

Deploy: Deploy returned error Deploy: Failed to deploy image {Containers:-1 Created:1738100303 ID:sha256:4d01dfad2d62669c5784270111944528031170173ddc0b66d96f58ae700812d5 Labels:map[complement_blueprint:5_servers complement_context:sidecar_tests.5_servers.hs4 complement_hs_name:hs4 complement_pkg:sidecar_tests gitsha1:68e27e962c946074d6f9e18e98e55b3fe4f4c128 org.opencontainers.image.documentation:https://github.com/element-hq/synapse/blob/master/docker/README.md org.opencontainers.image.licenses:AGPL-3.0-or-later org.opencontainers.image.source:https://github.com/element-hq/synapse.git org.opencontainers.image.url:https://matrix.org/docs/projects/server/synapse] ParentID:sha256:2ae79e697d44ba986e3369e5a5197328e7b9638fa20102c9ca259e885ae71eea RepoDigests:[] RepoTags:[localhost/complement:sidecar_tests.5_servers.hs4] SharedSize:-1 Size:1400625929 VirtualSize:0} : Error response from daemon: driver failed programming external connectivity on endpoint complement_sidecar_tests_3_sideca
  r_tests.5_servers.hs4_2 (fc1bbb210acbcb5c13e64dc1336b36e914f4c8a5069db8d0ded1e12f0f26f10f): Error starting userland proxy: listen tcp4 127.0.0.1:33090: bind: address already in use

Example logs that this PR will output

$ COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestFederationRoomsInvite
...

============== While deploying complement_fed_1_fed.2_servers.hs2_2 : START ALL COMPLEMENT DOCKER PORT BINDINGS ==============
Container: 6c4665d6a12f063f0a0c0707fe826ac7bb1ef7642d9cc2b8189206c28aa30c12: [/complement_fed_1_fed.2_servers.hs1_1]
    (host) -> (container)
    0.0.0.0:33494 -> 8009/tcp
    0.0.0.0:33495 -> 8080/tcp
    127.0.0.1:33107 -> 8448/tcp
    127.0.0.1:33106 -> 8008/tcp
Container: 8cb4e7a3725d291a3bb6762d7ac2af4cf6f66be64b8947542f8d299e4a01b645: [/complement_fed_1_fed.2_servers.hs2_2]
    (host) -> (container)
    127.0.0.1:33105 -> 8448/tcp
    127.0.0.1:33104 -> 8008/tcp
    0.0.0.0:33492 -> 8009/tcp
    0.0.0.0:33493 -> 8080/tcp
Container: f3f21372d83bc31bd17ccd1636c1084d9187fad9bcb6ce987348d01576ee0406: [/complement_fed.2_servers.hs2]
    (host) -> (container)
Container: aaa75bed21d55804aead3ab5bb10695803bb5c0a1f72d9b892f99d02d2548659: [/complement_fed.2_servers.hs1]
    (host) -> (container)
=============== While deploying complement_fed_1_fed.2_servers.hs2_2 : END ALL COMPLEMENT DOCKER PORT BINDINGS ===============

What's the root cause?

I suspect that we might be running into bugs with the Docker port allocation code since Complement is creating the images in parallel (I think).

Compounding the problem might be that we're exposing many more ports than usual in the out-of-repo Complement image that we're using and Complement is using PublishAllPorts: true when deploying the image which will take all of our exposed ports and make port bindings to the host for each.

Bugs in this part of Docker is not uncommon, just search for "address already in use" in the Docker codebase. For example, moby/moby#48274 has some smoke.


The first layer of parallelism that we can control is multiple images within a single deploy which I think we can add a mutex around things to control.

Multiple test packages running in parallel (controlled by -p n) is a good second factor that is probably harder to figure out.

$ go help build
...

-p n
    the number of programs, such as build commands or
    test binaries, that can be run in parallel.
    The default is GOMAXPROCS, normally the number of CPUs available.

Pull Request Checklist

Signed-off-by: Eric Eastwood [email protected]

@MadLittleMods MadLittleMods marked this pull request as ready for review January 29, 2025 17:56
@MadLittleMods MadLittleMods requested review from kegsay and a team as code owners January 29, 2025 17:56
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kegsay Any thoughts on the possible root cause described in the issue description?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. This isn't something we see in vanilla Complement runs, so it's likely something unique to your configuration.

@MadLittleMods MadLittleMods merged commit ec49265 into main Feb 11, 2025
4 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/debug-port-bindings-when-deploy-fails branch February 11, 2025 15:27
@MadLittleMods
Copy link
Collaborator Author

Thanks for the review @kegsay 🐿️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants