This is a Terraform module for provisioning a Weights & Biases Cluster on Google Cloud. Weights & Biases Local is our self-hosted distribution of wandb.ai. It offers enterprises a private instance of the Weights & Biases application, with no resource limits and with additional enterprise-grade architectural features like audit logging and single sign-on.
This module is intended to run in a Google Cloud account with minimal preparation; however, it does have the following pre-requisites:
Google Services Used
- Google Cloud SQL (MySQL)
- Google Kubernetes Engine
- Google Storage Bucket
- Google PubSub
- Google Managed Certificates
- Google Cloud DNS
- Ensure account meets module pre-requisites from above.
- Create a Terraform configuration that pulls in this module and specifies values of the required variables:
provider "google" {
  project = "<desired google project>"
  region  = "<desired google region>"
  zone    = "<desired google zone>"
}
module "wandb" {
  source    = "<filepath to cloned module directory>"
  namespace = "<prefix for naming google resources>"
  # license is listed as a required input in the inputs table
  license   = "<your wandb/local license>"
}
- Run terraform init and terraform apply
By default, the Kubernetes node type, number of nodes, Redis cluster size, and database instance size are standardized via configurations in ./deployment-size.tf and selected via the size input variable.
Available sizes are small, medium, large, xlarge, and xxlarge. The default is small.
All the values set via deployment-size.tf can be overridden by setting the appropriate input variables:
- gke_machine_type - The instance type for the GKE nodes
- gke_min_node_count - The minimum number of nodes in the GKE cluster
- gke_max_node_count - The maximum number of nodes in the GKE cluster
- redis_memory_size_gb - The memory size of the redis cluster
- database_machine_type - The instance type for the database
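As a minimal sketch of a partial override, the configuration below picks a size preset and then overrides only the node-count bounds. The variable names come from this README; the chosen size and the node counts are illustrative assumptions, not recommendations.

```hcl
module "wandb" {
  source    = "<filepath to cloned module directory>"
  namespace = "<prefix for naming google resources>"
  license   = "<your wandb/local license>"

  # Start from the "medium" preset defined in deployment-size.tf.
  size = "medium"

  # Override only the node-count bounds (illustrative values).
  # Inputs left unset keep the values that the preset provides.
  gke_min_node_count = 2
  gke_max_node_count = 6
}
```

Inputs such as gke_machine_type, redis_memory_size_gb, and database_machine_type can be set the same way when the preset value does not fit.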
We have included documentation and reference examples for common installation scenarios, as well as examples for supporting resources that lack official modules.
Name | Version |
---|---|
terraform | ~> 1.0 |
google | ~> 5.30 |
helm | ~> 2.10 |
kubernetes | ~> 2.23 |
time | 0.11.2 |
Name | Version |
---|---|
google | ~> 5.30 |
Name | Source | Version |
---|---|---|
app_gke | ./modules/app_gke | n/a |
app_lb | ./modules/app_lb | n/a |
clickhouse | ./modules/clickhouse | n/a |
database | ./modules/database | n/a |
gke_app | wandb/wandb/kubernetes | 1.14.1 |
kms | ./modules/kms | n/a |
kms_default_bucket | ./modules/kms | n/a |
kms_default_sql | ./modules/kms | n/a |
networking | ./modules/networking | n/a |
private_link | ./modules/private_link | n/a |
project_factory_project_services | terraform-google-modules/project-factory/google//modules/project_services | ~> 14.0 |
redis | ./modules/redis | n/a |
service_accounts | ./modules/service_accounts | n/a |
sleep | matti/resource/shell | 1.5.0 |
storage | ./modules/storage | n/a |
wandb | wandb/wandb/helm | 1.2.0 |
Name | Type |
---|---|
google_client_config.current | data source |
google_compute_forwarding_rules.all | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
allowed_inbound_cidrs | Which IPv4 addresses/ranges to allow access. This must be explicitly provided, and by default is set to ["*"] | list(string) | ["*"] | no |
allowed_project_names | A map of allowed projects where each key is a project number and the value is the connection limit. | map(number) | {} | no |
app_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
bucket_default_encryption | Boolean to determine if a default bucket encryption key should be used. If true, a default key will be created. Takes precedence over bucket_kms_key_id. | bool | false | no |
bucket_kms_key_id | ID of the customer-provided bucket KMS key. | string | null | no |
bucket_location | Location of the bucket (US, EU, ASIA) | string | "US" | no |
bucket_name | Use an existing bucket. | string | "" | no |
bucket_path | path of where to store data for the instance-level bucket | string | "" | no |
clickhouse_private_endpoint_service_name | ClickHouse private endpoint 'Service name' (ends in -clickhouse-cloud). | string | "" | no |
clickhouse_region | ClickHouse region (us-east1, us-central1, etc). | string | "" | no |
clickhouse_subnetwork_cidr | ClickHouse private service connect subnetwork | string | "10.50.0.0/24" | no |
controller_image_tag | Tag of the controller image to deploy | string | "1.14.0" | no |
create_private_link | Whether to create a private link service. | bool | false | no |
create_redis | Boolean indicating whether to provision a Redis instance (true) or not (false). | bool | false | no |
create_workload_identity | Flag to indicate whether to create a workload identity for the service account. | bool | false | no |
database_machine_type | Specifies the machine type to be allocated for the database. Defaults to null and value from deployment-size.tf is used | string | null | no |
database_sort_buffer_size | Specifies the sort_buffer_size value to set for the database | number | 67108864 | no |
database_version | Version for MySQL | string | "MYSQL_8_0_31" | no |
db_kms_key_id | ID of the customer-provided SQL KMS key. | string | null | no |
deletion_protection | If the instance should have deletion protection enabled. The database / Bucket can't be deleted when this value is set to true. | bool | true | no |
disable_code_saving | Boolean indicating if code saving is disabled | bool | false | no |
domain_name | Domain for accessing the Weights & Biases UI. | string | null | no |
enable_stackdriver | n/a | bool | false | no |
force_ssl | Enforce SSL through the usage of the Cloud SQL Proxy (cloudsql://) in the DB connection string | bool | false | no |
gke_machine_type | Specifies the machine type for nodes in the GKE cluster. Defaults to null and value from deployment-size.tf is used | string | null | no |
gke_max_node_count | Maximum number of nodes for the GKE cluster. Defaults to null and value from deployment-size.tf is used | number | null | no |
gke_min_node_count | Initial number of nodes for the GKE cluster; if gke_max_node_count is set, this is the minimum number of nodes. Defaults to null and value from deployment-size.tf is used | number | null | no |
ilb_proxynetwork_cidr | Internal load balancer proxy subnetwork | string | "10.127.0.0/24" | no |
labels | Labels to apply to resources | map(string) | {} | no |
license | Your wandb/local license | string | n/a | yes |
local_restore | Restores W&B to a stable state if needed | bool | false | no |
namespace | String used for prefix resources. | string | n/a | yes |
network | Pre-existing network self link | string | null | no |
oidc_auth_method | OIDC auth method | string | "implicit" | no |
oidc_client_id | The Client ID of application in your identity provider | string | "" | no |
oidc_issuer | A URL to your OpenID Connect identity provider, e.g. https://cognito-idp.us-east-1.amazonaws.com/us-east-1_uiIFNdacd | string | "" | no |
oidc_secret | The Client secret of application in your identity provider | string | "" | no |
operator_chart_version | Version of the operator chart to deploy | string | "1.3.4" | no |
other_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
parquet_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
psc_subnetwork_cidr | Private link service reserved subnetwork | string | "192.168.0.0/24" | no |
public_access | Whether to create a public endpoint for wandb access. | bool | true | no |
redis_memory_size_gb | Specifies the memory size in GB for the Redis instance. Defaults to null and value from deployment-size.tf is used | number | null | no |
redis_reserved_ip_range | Reserved IP range for REDIS peering connection | string | "10.30.0.0/16" | no |
redis_tier | Specifies the tier for this Redis instance | string | "STANDARD_HA" | no |
resource_limits | Specifies the resource limits for the wandb deployment | map(string) | { | no |
resource_requests | Specifies the resource requests for the wandb deployment | map(string) | { | no |
size | Deployment size for the instance | string | "small" | no |
skip_bucket_admin_role | Flag to indicate whether to skip the bucket policy creation. | bool | false | no |
sql_default_encryption | Boolean to determine if a default SQL encryption key should be used. If true, a default key will be created. Takes precedence over db_kms_key_id. | bool | false | no |
ssl | Enable SSL certificate | bool | true | no |
stackdriver_sa_name | n/a | string | "wandb-stackdriver" | no |
subdomain | Subdomain for accessing the Weights & Biases UI. Default creates record at Route53 Route. | string | null | no |
subnetwork | Pre-existing subnetwork self link | string | null | no |
use_internal_queue | Uses an internal redis queue instead of using google pubsub. | bool | false | no |
wandb_image | Docker repository to pull the wandb image from. | string | "wandb/local" | no |
wandb_version | The version of Weights & Biases local to deploy. | string | "latest" | no |
weave_wandb_env | Extra environment variables for W&B | map(string) | {} | no |
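To show how a few of the optional inputs above combine, here is a hedged sketch that restricts inbound access and turns on the module-managed default encryption keys. All input names are taken from the table; the CIDR range is a documentation-only placeholder and is an assumption, not a suggested value.

```hcl
module "wandb" {
  source    = "<filepath to cloned module directory>"
  namespace = "<prefix for naming google resources>"
  license   = "<your wandb/local license>"

  # Limit access to a single IPv4 range instead of the default ["*"].
  # 203.0.113.0/24 is a placeholder range; substitute your own.
  allowed_inbound_cidrs = ["203.0.113.0/24"]

  # Ask the module to create default KMS keys. Per the table, these
  # flags take precedence over bucket_kms_key_id and db_kms_key_id.
  bucket_default_encryption = true
  sql_default_encryption    = true

  # Shown explicitly for clarity; true is already the default.
  deletion_protection = true
}
```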
Name | Description |
---|---|
address | n/a |
bucket_name | Name of google bucket. |
bucket_path | path of where to store data for the instance-level bucket |
bucket_queue_name | Pubsub queue created for google bucket file upload events. |
clickhouse_private_endpoint_id | ClickHouse Private endpoint Endpoint ID to secure access inside VPC |
cluster_ca_certificate | Certificate of the kubernetes (GKE) cluster. |
cluster_client_certificate | n/a |
cluster_client_key | n/a |
cluster_endpoint | Endpoint of the kubernetes (GKE) cluster. |
cluster_id | ID of the kubernetes (GKE) cluster. |
cluster_name | n/a |
cluster_node_pool | Default node pool where Weights & Biases should be deployed into. |
cluster_self_link | Self link of the kubernetes (GKE) cluster. |
database_connection_string | Full database connection string. You must be in the VPC to access the database. |
database_instance_type | n/a |
fqdn | The FQDN to the W&B application |
gke_max_node_count | n/a |
gke_node_count | n/a |
gke_node_instance_type | n/a |
private_attachement_id | n/a |
sa_account_email | This output provides the email address of the service account created for workload identity, if workload identity is enabled. Otherwise, it returns null |
service_account | Weights & Biases service account used to manage resources. |
standardized_size | n/a |
url | The URL to the W&B application |
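A calling configuration can surface these outputs for other tooling. The sketch below re-exports two of them from the root module; it assumes the module block is named "wandb" as in the usage example, and the output names wandb_url and wandb_bucket_name are hypothetical.

```hcl
# Re-export the application URL reported by the module.
output "wandb_url" {
  description = "The URL to the W&B application"
  value       = module.wandb.url
}

# Re-export the name of the bucket the module uses.
output "wandb_bucket_name" {
  description = "Name of google bucket"
  value       = module.wandb.bucket_name
}
```

After terraform apply, terraform output wandb_url prints the value.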
6.0.0 introduced autoscaling to the GKE cluster and made the size variable the preferred way to set the cluster size.
Previously, unless the size variable was set explicitly, there were default values for the following variables:
gke_machine_type
gke_node_count
redis_memory_size_gb
db_machine_type
The size variable now defaults to small, and the following variables can be used to partially override the values set by the size variable:
gke_machine_type
gke_min_node_count
gke_max_node_count
redis_memory_size_gb
database_machine_type
For more information on the available sizes, see the Cluster Sizing section.
If scaling nodes in and out is not desired, set gke_min_node_count and gke_max_node_count to the same value to prevent the cluster from scaling.
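A minimal sketch of pinning the node count so the cluster does not scale, assuming three nodes is the desired fixed size (the number is illustrative only):

```hcl
module "wandb" {
  source    = "<filepath to cloned module directory>"
  namespace = "<prefix for naming google resources>"
  license   = "<your wandb/local license>"

  # Equal minimum and maximum leave the autoscaler no room to
  # add or remove nodes.
  gke_min_node_count = 3
  gke_max_node_count = 3
}
```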
3.6.0 introduced a change in the Google Provider that isn't backwards compatible with prior versions. Nothing needs to be done to upgrade.