Description
This ticket aims to give some feedback on the choices made for the implementation of the self-hosted runner (Chapters 3.7 and 3.8). Indeed, unlike the previous chapters, which are quite straightforward, there are various possible solutions, each with its own advantages and drawbacks. The solution to adopt depends heavily on the user's context.
The resulting ticket should also be useful to document not only the "how" but also the "why", and to give pointers to other implementations that might be more adequate in specific contexts (especially enterprise contexts with data privacy in mind). It could be used as a basis for a future "What's next?" or "What if...?" page.
For the guide, the solution adopted is Option 4, which requires two self-hosted pods but allows the firewall restrictions to be bypassed easily.
Goal
Create a self-hosted runner to have access to a GPU on a machine/cluster for training. Update the dvc.lock file remotely, and push the data to S3 from the k8s cluster (with an autocommit in the git repo).
Note: Kubernetes does not support running Docker directly inside a container due to security and architectural reasons (container in container).
GPU access from a container running inside another container might also be problematic (though this might not apply here).
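For illustration, a minimal sketch of how a self-hosted runner pod could request a GPU on the cluster, assuming the NVIDIA device plugin is installed; the pod name and image are placeholders:

```yaml
# Hypothetical runner pod requesting one GPU (assumes the NVIDIA device plugin is deployed on the cluster)
apiVersion: v1
kind: Pod
metadata:
  name: gh-runner-gpu                                      # illustrative name
spec:
  containers:
    - name: runner
      image: my-registry.example.com/github-runner:latest  # placeholder runner image
      resources:
        limits:
          nvidia.com/gpu: 1                                # request one GPU from the device plugin
```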
Option 1 (Starting point): 1 GH runner on GitHub
This runs on a simple GitHub-hosted VM. It (see the workflow sketch at the end of this option):
- gets data from S3 (dvc pull)
- checks the experiment is reproducible / "trains" the ML model with an up-to-date dvc.lock and dvc push (done locally)
- containerizes the model with bentoml
- logs in with docker and pushes the image to the artifact registry
- deploys the image on k8s
Cons:
- training the model remotely is not possible (the GitHub-hosted runner is slow, long jobs time out, Action minutes are limited, and the data would be on GitHub)
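For reference, a minimal sketch of what the Option 1 workflow could look like; secret names, image/registry names and paths are placeholders, and credentials handling for kubectl is omitted:

```yaml
# Hedged sketch of the Option 1 workflow on a GitHub-hosted runner (placeholders throughout).
name: train-and-deploy
on: push
jobs:
  all-in-one:
    runs-on: ubuntu-latest                  # GitHub-hosted VM
    steps:
      - uses: actions/checkout@v4
      - run: pip install "dvc[s3]" bentoml
      - run: dvc pull                       # get the data from S3
      - run: dvc repro                      # check the experiment is reproducible
      - run: bentoml containerize my_model:latest          # containerize the model (assumes the bento was built)
      - run: |                              # log in and push the image to the artifact registry
          echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login my-registry.example.com -u "${{ secrets.REGISTRY_USER }}" --password-stdin
          docker tag my_model:latest my-registry.example.com/my_model:latest
          docker push my-registry.example.com/my_model:latest
      - run: kubectl apply -f kubernetes/deployment.yaml   # deploy on k8s (kubeconfig setup omitted)
```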
Option 2: 1 GH runner on k8s (GCP)
Pro: One self-hosted runner, no runner on GH. Data/model is not on GitHub.
Cons:
- docker login and push requires a nested container, which breaks: Kubernetes does not support running Docker directly inside a container due to security and architectural reasons. Docker containerization on k8s is therefore not directly possible and requires a workaround (see Options 2a-2c below).
- connectivity/firewall issue (see the Connection / Firewall section at the end)
Option 2a: Use GCP custom solution
Solution specifically developed to solve this issue.
Cons: vendor lock-in, not applicable to on-premise deployments, data privacy concerns.
Option 2b: mount docker socket
- mount the Docker socket (to access the host's Docker daemon; the docker CLI must be installed on the runner): works for now, but will stop working in future versions of k8s (sketched below).
Cons: Docker runs outside the pod, so containers can keep running indefinitely even after the pod is destroyed; bad practice unless one has total ownership of the machine; bad in terms of security; deprecated approach that will stop working soon (nodes that no longer use the Docker runtime do not expose /var/run/docker.sock).
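A minimal sketch of the socket-mount approach, kept here for documentation purposes only given the cons above; the pod name and image are placeholders:

```yaml
# Hedged sketch of Option 2b: mount the host's Docker socket into the runner pod (deprecated approach).
apiVersion: v1
kind: Pod
metadata:
  name: gh-runner-docker-socket                            # illustrative name
spec:
  containers:
    - name: runner
      image: my-registry.example.com/github-runner:latest  # placeholder image with docker-cli installed
      volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock                  # the CLI talks to the host's Docker daemon
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock
        type: Socket
```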
Option 2c: using docker in docker
Mounting the socket is obsolete, but Docker-in-Docker (dind) should now work.
See https://hub.docker.com/_/docker
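A minimal sketch of a Docker-in-Docker setup, based on the official docker image linked above; the runner image is a placeholder and TLS is disabled to keep the sketch short (a real setup should enable it):

```yaml
# Hedged sketch of Option 2c: run a privileged dind sidecar next to the runner container.
apiVersion: v1
kind: Pod
metadata:
  name: gh-runner-dind                                     # illustrative name
spec:
  containers:
    - name: runner
      image: my-registry.example.com/github-runner:latest  # placeholder runner image with docker-cli
      env:
        - name: DOCKER_HOST
          value: tcp://localhost:2375                      # point the docker CLI to the sidecar daemon
    - name: dind
      image: docker:dind                                   # daemon image from https://hub.docker.com/_/docker
      securityContext:
        privileged: true                                   # required for Docker-in-Docker
      env:
        - name: DOCKER_TLS_CERTDIR
          value: ""                                        # disable TLS so the daemon listens on 2375 (sketch only)
```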
Option 3: set up GH runner on (GCP) VM + k8s
- the runner runs directly on the VM, where Docker is available, so containerization is possible
Cons:
- needs a VM in addition to the k8s cluster
Option 4: 2 GH runners (1 on GitHub, 1 on k8s)
- split the steps between the two runners depending on whether they need Docker (see the sketch after this option)
- use gcloud/kubectl on the k8s cluster (GCP)
Cons: data privacy issue: the model is containerized on GitHub's servers, so the data/model transits through GitHub.
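A minimal sketch of the split adopted in Option 4; job names, runner labels, image/registry names and paths are illustrative, and credentials/artifact handling between jobs is omitted:

```yaml
# Hedged sketch of the Option 4 split: Docker-dependent steps on GitHub, everything else on the k8s runner.
name: train-containerize-deploy
on: push
jobs:
  train:
    runs-on: self-hosted            # runner pod on the k8s cluster (GPU access, data stays on the cluster)
    steps:
      - uses: actions/checkout@v4
      - run: dvc pull               # get the data from S3
      - run: dvc repro              # (re)train / reproduce the experiment
      - run: dvc push               # push the updated data/model back to S3
      - run: git commit -am "update dvc.lock" && git push  # autocommit of dvc.lock (credentials omitted)
  containerize:
    needs: train
    runs-on: ubuntu-latest          # GitHub-hosted runner: Docker is available here
    steps:
      - uses: actions/checkout@v4
      - run: bentoml containerize my_model:latest                 # the model transits through GitHub (the con above)
      - run: docker push my-registry.example.com/my_model:latest  # docker login/tag omitted
  deploy:
    needs: containerize
    runs-on: self-hosted            # back on the cluster to run gcloud/kubectl
    steps:
      - run: kubectl apply -f kubernetes/deployment.yaml
```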
Option 5 (as 2e): k8s + KubeVirt
- run VMs on k8s, managed like pods (VM abstraction on top of k8s; see the sketch after this option)
Pros:
- best of both worlds (k8s orchestration with full VM isolation, so Docker can run inside the VM)
Cons:
- additional abstraction layers, complexity
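For illustration, a heavily hedged sketch of what the KubeVirt abstraction looks like; it follows the upstream example and uses the KubeVirt demo container disk, whereas a real runner would need a proper OS image with the GitHub runner installed and GPU passthrough configured:

```yaml
# Hedged sketch of a KubeVirt VirtualMachine (Option 5 / 2e); fields follow the upstream example.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: gh-runner-vm                          # illustrative name
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 2Gi
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/kubevirt/cirros-container-disk-demo   # demo image, placeholder only
```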
Option 6 (as 2f): k8s + kaniko
kaniko runs as a pod on k8s and builds Docker images without requiring a Docker daemon (see the sketch after this option).
See https://devopscube.com/build-docker-image-kubernetes-pod/
Cons:
- additional abstraction layers, complexity
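A minimal sketch of a kaniko build pod, roughly following the article linked above; the git context, destination and secret name are placeholders:

```yaml
# Hedged sketch of Option 6 / 2f: build and push an image with kaniko, no Docker daemon required.
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build                          # illustrative name
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args:
        - "--dockerfile=Dockerfile"
        - "--context=git://github.com/my-org/my-repo.git"          # placeholder build context
        - "--destination=my-registry.example.com/my_model:latest"  # where the image is pushed
      volumeMounts:
        - name: docker-config
          mountPath: /kaniko/.docker          # registry credentials for the push
  volumes:
    - name: docker-config
      secret:
        secretName: registry-credentials      # placeholder secret containing a docker config.json
        items:
          - key: .dockerconfigjson
            path: config.json
```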
Connection / Firewall issue
Reaching k8s from a GH Action:
- direct access requires a public IP
- can use polling, by running an instance on the self-hosted runner which "listens" to GitHub for jobs. This is the approach currently selected (see the sketch after this list).
- could instead make use of a VPN layer (WireGuard, see the GH docs) to avoid the listening pod.
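One possible way to run such a listening/polling runner on the cluster is the actions-runner-controller project; a heavily hedged sketch is below, noting that the exact CRDs and API group depend on the ARC version installed, and the repository name is a placeholder:

```yaml
# Hedged sketch: a self-hosted runner deployed with actions-runner-controller (legacy CRDs).
# The runner polls GitHub over an outbound connection, so no inbound firewall rule or public IP is needed.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: gh-runner
spec:
  replicas: 1
  template:
    spec:
      repository: my-org/my-repo              # placeholder repository the runner registers with
```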