---
title: Self-hosted installation
---

This documentation describes how to host your own instance of the Wave service, rather than using the standard Seqera-hosted Wave service to generate containers.

### Requirements

* Java 21 or later
* Linux or macOS
* Redis 6.2 (or later)
* Docker engine (for development)
* Kubernetes cluster (for production)

**Notes**

* Suggested instance type for the Wave backend: `m5a.2xlarge`
* This deployment does not support building containers for the ARM (Graviton) CPU architecture.

### Get started

1. Clone the Wave repository from GitHub:

`git clone https://github.com/seqeralabs/wave && cd wave`

2. Define one or more of the following environment variable pairs, depending on the target registries you need to access:

```
export DOCKER_USER="<Docker registry user name>"
export DOCKER_PAT="<Docker registry access token or password>"
export QUAY_USER="<Quay.io registry user name>"
export QUAY_PAT="<Quay.io registry access token>"
export AWS_ACCESS_KEY_ID="<AWS ECR registry access key>"
export AWS_SECRET_ACCESS_KEY="<AWS ECR registry secret key>"
export AZURECR_USER="<Azure registry user name>"
export AZURECR_PAT="<Azure registry access token or password>"
```

3. Set up a [local tunnel](https://github.com/localtunnel/localtunnel) to make the Wave service accessible to the Docker client (only needed if you are running on macOS):

`npx localtunnel --port 9090`

4. Then configure the following environment variable using the domain name returned by localtunnel, e.g.:

`export WAVE_SERVER_URL="https://sweet-nights-report.loca.lt"`

5. Run the service on your computer:

`bash run.sh`

6. Submit a container request to the Wave service using the `curl` tool:

```
curl \
-H "Content-Type: application/json" \
-X POST $WAVE_SERVER_URL/container-token \
-d '{"containerImage":"ubuntu:latest"}' \
| jq -r .targetImage
```

7. Pull the container image using the name returned in the previous command, for example:

`docker pull sweet-nights-report.loca.lt/wt/617e7da1b37a/library/ubuntu:latest`
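Steps 6 and 7 can be combined in a small script. A minimal sketch, assuming only that the response JSON carries the `targetImage` field used by the `jq` filter above (the response body shown is illustrative):

```shell
# Illustrative response body; in practice it comes from the
# POST $WAVE_SERVER_URL/container-token request in step 6
RESPONSE='{"targetImage":"sweet-nights-report.loca.lt/wt/617e7da1b37a/library/ubuntu:latest"}'

# Extract the image name to pull (requires jq)
TARGET_IMAGE=$(echo "$RESPONSE" | jq -r .targetImage)
echo "$TARGET_IMAGE"

# docker pull "$TARGET_IMAGE"
```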

**Note** You can use the [Wave CLI](https://github.com/seqeralabs/wave-cli) instead of `curl` to interact with the Wave service and submit more complex requests.

## Debugging

To debug HTTP requests made by the proxy client, add the following JVM setting:

`'-Djdk.httpclient.HttpClient.log=requests,headers'`

## Comprehensive installation

Build and augment container images for Nextflow workflows.

### Requirements

* AWS EKS cluster
* AWS OpenID Connect (OIDC) provider for EKS
* AWS S3 bucket for logs
* AWS EFS for shared files
* AWS EFS CSI driver for EKS
* AWS Application Load Balancer
* AWS Certificate Manager
* AWS SES (simple email service)
* AWS ECR service
* AWS Elasticache
* AWS Route53

### AWS EKS preparation

* Create an EKS cluster following the [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html). When the cluster is ready, create a new Kubernetes namespace where the Wave service will be deployed, e.g. `wave-production`.
* Create an AWS S3 bucket in the same region where the EKS cluster is running. The bucket will host the Wave logs, e.g. `wave-logs-prod`.
* Create an EFS file system instance as described in the [AWS documentation](https://docs.aws.amazon.com/efs/latest/ug/gs-step-two-create-efs-resources.html). Make sure to use the same VPC as the EKS cluster and to install the [EFS CSI driver](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html) for EKS. Your EFS file system's security group must have inbound and outbound rules that allow NFS traffic (port 2049) from the CIDR of your cluster's VPC.
* Create an AWS certificate to allow HTTPS traffic to your Wave service using AWS Certificate Manager. The certificate must be in the same region where the EKS cluster is running. See the [AWS documentation](https://docs.aws.amazon.com/acm/latest/userguide/gs-acm-request-public.html) for further details.
* Create two container repositories in the same region where the cluster is deployed. The first repository hosts the container images built by Wave, and the second is used for caching. Make sure the two repositories share the same name prefix, e.g. `wave/build` and `wave/cache`.
* Create an AWS ElastiCache instance used by Wave for caching. A single-node cluster is required. For production deployments, the instance type `cache.t3.medium` and Redis engine version 6.2.x or later are advised (serverless is not supported). Make sure to use the same VPC as the EKS cluster.

* The AWS SES service is required by Wave to send email notifications. Make sure you have configured an AWS SES service for production usage. See the [AWS documentation](https://docs.aws.amazon.com/ses/latest/dg/request-production-access.html) for further details.
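The two ECR repositories end up addressed by URIs of the standard ECR form, which the deployment settings later reference. A quick sketch of how those URIs are assembled (account ID, region, and prefix values are illustrative):

```shell
AWS_ACCOUNT="123456789012"   # illustrative account ID
AWS_REGION="eu-central-1"    # illustrative region
PREFIX="wave"                # shared repository name prefix

# Standard ECR repository URI form
BUILD_REPO="${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com/${PREFIX}/build"
CACHE_REPO="${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com/${PREFIX}/cache"
echo "$BUILD_REPO"
echo "$CACHE_REPO"
```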

### AWS policy and role creation

Create an AWS IAM policy that grants the Wave application access to the AWS infrastructure. This requires your EKS cluster to have an existing AWS Identity and Access Management (IAM) OpenID Connect (OIDC) provider. To determine whether you already have one, or to create one, see [Creating an IAM OIDC provider for your cluster](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html).

1. Make sure the `settings.sh` file has valid values for the following settings:
* `AWS_REGION`: The AWS region where your cluster is deployed e.g. `eu-central-1`.
* `AWS_ACCOUNT`: The ID of the AWS account where the cluster is deployed.
* `WAVE_CONFIG_NAME`: The name for this Wave deployment e.g. `seqera-wave`.
* `WAVE_LOGS_BUCKET`: The S3 bucket for storing Wave logs, created in the previous step.
* `WAVE_CONTAINER_NAME_PREFIX`: The name prefix given to the build and cache ECR repositories, e.g. `wave`.
* `AWS_EKS_CLUSTER_NAME`: The name of the EKS cluster where the service will be deployed.
* `WAVE_NAMESPACE`: The Kubernetes namespace where the Wave service is going to be deployed e.g. `wave-test`.
* `WAVE_BUILD_NAMESPACE`: The Kubernetes namespace where container build jobs will be executed e.g. `wave-build`.
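Taken together, a filled-in `settings.sh` for these settings might look like the following (all values illustrative; variables are exported so that later `envsubst` calls can see them):

```shell
# All values illustrative; replace with your own infrastructure details
export AWS_REGION="eu-central-1"
export AWS_ACCOUNT="123456789012"
export WAVE_CONFIG_NAME="seqera-wave"
export WAVE_LOGS_BUCKET="wave-logs-prod"
export WAVE_CONTAINER_NAME_PREFIX="wave"
export AWS_EKS_CLUSTER_NAME="wave-eks-prod"
export WAVE_NAMESPACE="wave-test"
export WAVE_BUILD_NAMESPACE="wave-build"
```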
2. Create the IAM policy using the `seqera-wave-policy.json` template included in this repository and the command below:

```
source settings.sh

aws \
  --region $AWS_REGION \
  iam create-policy \
  --policy-name $WAVE_CONFIG_NAME \
  --policy-document file://<( cat policies/seqera-wave-policy.json | envsubst )
```

Take note of the policy ARN shown in the command output. Then find your cluster's OIDC provider URL using the command below:

```
aws \
  --region $AWS_REGION \
  eks describe-cluster \
  --name $AWS_EKS_CLUSTER_NAME \
  --query "cluster.identity.oidc.issuer" \
  --output text
```

An example output is as follows:

`https://oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE`
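The ID needed for `AWS_EKS_OIDC_ID` is the last path segment of that issuer URL; a minimal shell sketch to extract it:

```shell
# Illustrative issuer URL, as returned by `aws eks describe-cluster`
OIDC_ISSUER="https://oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"

# Keep only the part after the last '/'
AWS_EKS_OIDC_ID="${OIDC_ISSUER##*/}"
echo "$AWS_EKS_OIDC_ID"
```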


Set the `AWS_EKS_OIDC_ID` variable in `settings.sh` using the ID value from your result. Then run the command below:

```
source settings.sh
aws \
--region $AWS_REGION \
iam create-role \
--role-name $WAVE_CONFIG_NAME \
--assume-role-policy-document file://<( cat policies/seqera-wave-role.json | envsubst )
```

Take note of the ARN of the IAM role created and use it as the value of the `AWS_IAM_ROLE` variable in the `settings.sh` file.

Finally, attach the policy created in the previous step to the role, using the command below:

```
source settings.sh

aws \
  --region $AWS_REGION \
  iam attach-role-policy \
  --role-name $WAVE_CONFIG_NAME \
  --policy-arn arn:aws:iam::$AWS_ACCOUNT:policy/$WAVE_CONFIG_NAME
```
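The `--policy-arn` value is simply assembled from the account ID and the policy name; a quick sketch of the expected shape (values illustrative):

```shell
AWS_ACCOUNT="123456789012"       # illustrative; normally from settings.sh
WAVE_CONFIG_NAME="seqera-wave"   # as set in settings.sh

POLICY_ARN="arn:aws:iam::${AWS_ACCOUNT}:policy/${WAVE_CONFIG_NAME}"
echo "$POLICY_ARN"
```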

## Deployment

### Kubernetes manifests preparation

Update the variables in the `settings.sh` file with the values corresponding to the AWS infrastructure created in the previous steps. The following settings are required:

* `WAVE_HOSTNAME`: The host name to use to access the Wave service e.g. `wave.your-company.com`. This should match the host name used when creating the HTTPS certificate by using AWS Certificate manager.
* `WAVE_CONTAINER_BUILD_REPO`: The ECR repository used to host the containers built by Wave, e.g. `<YOUR ACCOUNT>.dkr.ecr.<YOUR REGION>.amazonaws.com/wave/build`.
* `WAVE_CONTAINER_CACHE_REPO`: The ECR repository used to cache the containers built by Wave, e.g. `<YOUR ACCOUNT>.dkr.ecr.<YOUR REGION>.amazonaws.com/wave/cache`.
* `WAVE_LOGS_BUCKET`: The AWS S3 bucket used to store the Wave logs, e.g. `wave-logs-prod`.
* `WAVE_REDIS_HOSTNAME`: The AWS ElastiCache instance hostname and port, e.g. `<YOUR ELASTICACHE INSTANCE>.cache.amazonaws.com:6379`.
* `WAVE_SENDER_EMAIL`: The email address used by Wave to send emails, e.g. `[email protected]`. Note: it must be an email address validated in your AWS SES setup.
* `TOWER_API_URL`: The API URL of your Seqera Platform installation, e.g. `https://<YOUR PLATFORM HOSTNAME>/api`.
* `AWS_EFS_VOLUME_HANDLE`: The AWS EFS shared file system instance ID, e.g. `fs-12345667890`.
* `AWS_CERTIFICATE_ARN`: The ARN of the AWS certificate created during the environment preparation, e.g. `arn:aws:acm:<YOUR REGION>:<YOUR ACCOUNT>:certificate/<YOUR CERTIFICATE ID>`.
* `AWS_IAM_ROLE`: The ARN of the AWS IAM role granting the Wave service permissions to AWS resources.
* `SURREAL_DB_PASSWORD`: A user-defined password for the embedded SurrealDB instance deployed by Wave.
* `SEQERA_CR_USER`: The username used to access the Seqera container registry providing the images for the Wave service installation.
* `SEQERA_CR_PASSWORD`: The password used to access the Seqera container registry providing the images for the Wave service installation.
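Because the manifests are rendered with `envsubst`, any unset variable silently becomes an empty string in the output. A small sketch (POSIX shell; the variable list is abbreviated and the values illustrative) to confirm the required settings are present before deploying:

```shell
# Subset of the required settings; extend with the full list above
REQUIRED="WAVE_HOSTNAME WAVE_LOGS_BUCKET WAVE_REDIS_HOSTNAME WAVE_SENDER_EMAIL"

# Illustrative values; normally exported via `source settings.sh`
WAVE_HOSTNAME="wave.example.com"
WAVE_LOGS_BUCKET="wave-logs-prod"
WAVE_REDIS_HOSTNAME="redis.example.com:6379"
WAVE_SENDER_EMAIL="wave-app@example.com"

MISSING=0
for v in $REQUIRED; do
  # Indirect lookup of the variable named in $v
  eval "val=\${$v}"
  if [ -z "$val" ]; then
    echo "missing: $v"
    MISSING=1
  fi
done
[ "$MISSING" -eq 0 ] && echo "all settings present"
```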

### Application deployment

Once the application manifest files have been updated by replacing the above variables with the corresponding values, proceed with the application deployment following these steps:

1. Export the content of the `settings.sh` file in your environment using this command:

`source settings.sh`

2. Create storage, app namespace and roles:

`kubectl apply -f <(cat src/create.yml | envsubst)`
`kubectl config set-context --current --namespace=${WAVE_NAMESPACE}`

3. Set up the Container registry credentials to access the Wave container image:

```
kubectl create secret \
  docker-registry seqera-reg-creds \
  --namespace "${WAVE_NAMESPACE}" \
  --docker-server=cr.seqera.io \
  --docker-username="${SEQERA_CR_USER}" \
  --docker-password="${SEQERA_CR_PASSWORD}"
```

4. Create build storage and namespace:

`kubectl apply -f <(cat src/build.yml | envsubst)`

5. Deploy Surreal DB:

`kubectl apply -f <(cat src/surrealdb.yml | envsubst)`

6. Deploy the main application resources:

`kubectl apply -f <(cat src/app.yml | envsubst)`

7. Deploy the Ingress controller:

`kubectl apply -f <(cat src/ingress.yml | envsubst)`

The ingress controller will automatically create an AWS application load balancer to serve the Wave service traffic. The load balancer address can be retrieved using the following command:

`kubectl get ingress wave-ingress -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'`

Use the load balancer hostname to create an *alias* record in your Route53 DNS zone so that the Wave service hostname maps to the load balancer hostname created by the ingress. See the [AWS documentation](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-to-elb-load-balancer.html) for details.

Once the DNS is configured, verify the Wave API is accessible using this command:

`curl https://${WAVE_HOSTNAME}/service-info | jq`

8. Pair Seqera Platform with Wave

Once the Wave service is ready, configure Seqera Platform to pair with the Wave service in your infrastructure. Follow the documentation available at [this link](https://docs.seqera.io/platform/latest/enterprise/configuration/wave), replacing the Wave endpoint `https://wave.seqera.io` with the one defined in your installation.

9. Verify the service is operating correctly by checking the Wave pod logs. There should be no errors, and the following line should be reported:

`Opening pairing session - endpoint: <YOUR SEQERA PLATFORM URL>`

10. Sign in to the Seqera Platform and create a Personal access token. Then export the token value as shown below:

`export TOWER_ACCESS_TOKEN=<TOKEN VALUE>`

11. Download the [Wave CLI](https://github.com/seqeralabs/wave-cli) tool and use it to request a Wave container using the command below:

```
wave \
  --wave-endpoint https://$WAVE_HOSTNAME \
  --tower-endpoint $TOWER_API_URL \
  --image alpine:latest
```

The command shows the Wave container name for the requested `alpine` image. You should be able to pull the container using a simple `docker pull <wave container>` command. To verify that Wave builds are working as expected, run:

```
wave \
  --wave-endpoint https://$WAVE_HOSTNAME \
  --tower-endpoint $TOWER_API_URL \
  --conda-package cowpy
```

You should receive an email notification when the Wave build process completes and the container is ready to be pulled.