Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Docker Python SDK with Docker CLI #10

Open
dionhaefner opened this issue Jan 16, 2025 · 11 comments · May be fixed by #33
Open

Replace Docker Python SDK with Docker CLI #10

dionhaefner opened this issue Jan 16, 2025 · 11 comments · May be fixed by #33
Assignees
Labels
enhancement New feature or request

Comments

@dionhaefner
Copy link
Contributor

Assumed to fix issues like pasteurlabs/tesseract#433 and DOCKER_HOST related inconsistencies. Basically, if you can use the Docker CLI you're gtg.

@xalelax
Copy link
Contributor

xalelax commented Jan 20, 2025

I'm not 100% sure this is a good strategy -- seems like going backwards. @heitorPB any opinion?

@dionhaefner
Copy link
Contributor Author

Going backwards how?

@heitorPB
Copy link
Contributor

The downside of using the docker binary directly is that we lose support for Podman. Docker-py interacts with the Docker REST API, which Podman also implements and should be bug-for-bug compatible. The killer feature of Podman IMHO is that it is rootless by default, in contrast of Docker rootfull by default.

So, I'm with @xalelax here, not sure if using the docker binary directly is a good strategy.

@dionhaefner
Copy link
Contributor Author

dionhaefner commented Jan 21, 2025

I agree that in a perfect universe we'd use the Docker Python client everywhere. However, that doesn't seem to be the reality we live in.

The core issue is that Docker-py is lagging behind the Docker CLI in terms of feature support, so we cannot use BuildKit from the SDK (which we need for SSH mounts for example). So we're already in a situation where we're using both: subprocess.run['docker', ...] for BuildKit interactions, and Docker-py for everything else.

What I propose is to move entirely to using the CLI for interactions with containers. Reasons are:

  • The CLI is the official interface and orders of magnitude more stable and better documented than Docker-py.
  • End users only need to verify they can use the Docker CLI to build Tesseracts. In the DOCKER_HOST issues we've seen, only SDK access was broken, everyone could use the CLI just fine.
  • Again, we're already using both, so moving to a single interfaces just decreases the surface of things to go wrong.

For podman, all we need to do is use subprocess.run['podman', *docker_args], right? (Since the CLI is a drop-in replacement.) So we'd just need to supply a config --docker-cmd=podman to the Tesseract CLI and we're good?

@heitorPB
Copy link
Contributor

heitorPB commented Jan 21, 2025

So we're already in a situation where we're using both

I didn't realize that until recently...

For podman, all we need to do is use subprocess.run['podman', *docker_args], right?

Should do the job. I liked the idea of a --docker-cmd=podman :)

@angela-ko angela-ko self-assigned this Jan 24, 2025
@angela-ko
Copy link
Contributor

Just going to note down that another issue with swapping is error handling... it may be more ugly / less precise when we swap.

https://github.com/pasteurlabs/tesseract/blob/main/tesseract/cli.py#L240-L243

@heitorPB
Copy link
Contributor

Another issue with swapping is dealing with images. We rely on docker.models.images.Image for metadata all over the codebase.

@angela-ko
Copy link
Contributor

Another issue with swapping is dealing with images. We rely on docker.models.images.Image for metadata all over the codebase.

On this point, I have started the surgery on our cli/engine, and currently it is in a state where tesseract list and tesseract ps are functional.

However, I have noticed that the docker cli version of these commands is significantly slower, so I set a timer to check:

Docker CLI:
tesseract list: 24.967s
tesseract ps: 2.484s

Docker Py:
tesseract list: 0.376s
tesseract ps: 1.866s

While ps has a slowdown too, the list slow down is extremely significant as the only way, afaik, of retrieving docker image metadata is running docker inspect image_id individually on each image id, whereas docker_py has the docker.models.image.Image that Heitor mentioned.

(Don't look too hard on my draft PR...I haven't cleaned it up since i just wanted to post some initial results and concerns I have regarding the speed of the commands...)

@dionhaefner

@dionhaefner
Copy link
Contributor Author

You can do docker inspect img_1 img_2 img_3 ... to retrieve information on multiple images at once. Could you try that and report back please whether that improves things?

@angela-ko
Copy link
Contributor

You can do docker inspect img_1 img_2 img_3 ... to retrieve information on multiple images at once. Could you try that and report back please whether that improves things?

Sorry for the late reply but doing that makes no difference in timing. It is still 24.6s for tesseract list

@dionhaefner
Copy link
Contributor Author

Looks like you missed a spot :) Let's discuss on your PR instead.

@andrinr andrinr transferred this issue from another repository Feb 25, 2025
@andrinr andrinr added the enhancement New feature or request label Feb 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants