Create dataflow flex template module #168

Merged
70 changes: 70 additions & 0 deletions modules/dataflow-flex-job/README.md
@@ -0,0 +1,70 @@
# Dataflow Flex Template Job Module

This module handles opinionated Dataflow flex template job configuration and deployments.

## Usage

Before using this module, you should be familiar with the `google_dataflow_flex_template_job` resource's [Note on "destroy"/"apply"](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/dataflow_flex_template_job#note-on-destroy--apply), as its behavior is atypical compared to other resources.

### Assumptions

This module assumes that you already have a working Dataflow flex job template in a GCS location.
If you are not using public IPs, you must also [Configure Private Google Access](https://cloud.google.com/vpc/docs/configure-private-google-access)
on the VPC used by Dataflow.
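For reference, Private Google Access is controlled by the `private_ip_google_access` attribute on a subnetwork. A minimal sketch follows; the name, CIDR range, and VPC reference are placeholders, and the VPC resource is assumed to exist elsewhere in your configuration:

```hcl
# Hypothetical subnetwork with Private Google Access enabled, so Dataflow
# workers without public IPs can still reach Google APIs and services.
resource "google_compute_subnetwork" "dataflow" {
  name                     = "dataflow-subnetwork" # placeholder
  ip_cidr_range            = "10.2.0.0/16"         # placeholder
  region                   = "us-east4"
  network                  = google_compute_network.vpc.id # assumed existing VPC
  private_ip_google_access = true
}
```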

A simple usage example:

```hcl
module "dataflow-flex-job" {
  source  = "terraform-google-modules/secured-data-warehouse/google//modules/dataflow-flex-job"
  version = "~> 0.1"

  project_id              = "<project_id>"
  region                  = "us-east4"
  name                    = "dataflow-flex-job-00001"
  container_spec_gcs_path = "gs://<path-to-template>"
  job_language            = "JAVA"
  on_delete               = "cancel"
  staging_location        = "gs://<gcs_path_staging_data_bucket>"
  temp_location           = "gs://<gcs_path_temp_data_bucket>"
  subnetwork_self_link    = "<subnetwork-self-link>"
  kms_key_name            = "<fully-qualified-kms-key-id>"
  service_account_email   = "<dataflow-controller-service-account-email>"
  use_public_ips          = false

  parameters = {
    firstParameter  = "ONE"
    secondParameter = "TWO"
  }
}
```
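Since the module switches to snake_case pipeline options when `job_language = "PYTHON"`, a Python flex template is launched the same way; only the language and template path change. All bracketed values below are placeholders:

```hcl
module "dataflow-flex-job-python" {
  source  = "terraform-google-modules/secured-data-warehouse/google//modules/dataflow-flex-job"
  version = "~> 0.1"

  project_id              = "<project_id>"
  region                  = "us-east4"
  name                    = "dataflow-flex-job-python-00001"
  container_spec_gcs_path = "gs://<path-to-python-template>"
  job_language            = "PYTHON"
  on_delete               = "cancel"
  staging_location        = "gs://<gcs_path_staging_data_bucket>"
  temp_location           = "gs://<gcs_path_temp_data_bucket>"
  subnetwork_self_link    = "<subnetwork-self-link>"
  kms_key_name            = "<fully-qualified-kms-key-id>"
  service_account_email   = "<dataflow-controller-service-account-email>"
  use_public_ips          = false
}
```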

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| container\_spec\_gcs\_path | The GCS path to the Dataflow job flex template. | `string` | n/a | yes |
| enable\_streaming\_engine | Enable/disable the use of Streaming Engine for the job. Note that Streaming Engine is enabled by default for pipelines developed against the Beam SDK for Python v2.21.0 or later when using Python 3. | `bool` | `true` | no |
| job\_language | Language of the flex template code. Options are 'JAVA' or 'PYTHON'. | `string` | `"JAVA"` | no |
| kms\_key\_name | The name for the Cloud KMS key for the job. Key format is: `projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY`. | `string` | n/a | yes |
| max\_workers | The number of workers permitted to work on the job. More workers may improve processing speed at additional cost. | `number` | `1` | no |
| name | The name of the dataflow flex job. | `string` | n/a | yes |
| on\_delete | One of `drain` or `cancel`. Specifies the behavior of deletion during `terraform destroy`. The default is cancel. | `string` | `"cancel"` | no |
| parameters | Key/Value pairs to be passed to the Dataflow job (as used in the template). | `map(any)` | `{}` | no |
| project\_id | The project in which the resource belongs. If it is not provided, the provider project is used. | `string` | n/a | yes |
| region | The region in which the created job should run. | `string` | n/a | yes |
| service\_account\_email | The Service Account email that will be used to identify the VMs in which the jobs are running. | `string` | n/a | yes |
| staging\_location | GCS path for staging code packages needed by workers. | `string` | n/a | yes |
| subnetwork\_self\_link | The subnetwork self link to which VMs will be assigned. | `string` | n/a | yes |
| temp\_location | GCS path for saving temporary workflow jobs. | `string` | n/a | yes |
| use\_public\_ips | Whether the VMs should use public IPs. | `string` | `false` | no |

## Outputs

| Name | Description |
|------|-------------|
| job\_id | The unique ID of this job. |
| state | The current state of the resource, selected from the JobState enum. See https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs#Job.JobState. |

<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
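A calling configuration can surface this module's outputs for downstream use. A sketch, assuming the `dataflow-flex-job` module block from the usage example above:

```hcl
# Re-export the job ID and state from the root module that calls this module.
output "dataflow_job_id" {
  value = module.dataflow-flex-job.job_id
}

output "dataflow_job_state" {
  value = module.dataflow-flex-job.state
}
```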
59 changes: 59 additions & 0 deletions modules/dataflow-flex-job/main.tf
@@ -0,0 +1,59 @@
/**
* Copyright 2021 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

locals {
  java_pipeline_options = {
    serviceAccount        = var.service_account_email
    subnetwork            = var.subnetwork_self_link
    dataflowKmsKey        = var.kms_key_name
    tempLocation          = var.temp_location
    stagingLocation       = var.staging_location
    maxNumWorkers         = var.max_workers
    usePublicIps          = var.use_public_ips
    enableStreamingEngine = var.enable_streaming_engine
  }

  python_pipeline_options = {
    service_account_email   = var.service_account_email
    subnetwork              = var.subnetwork_self_link
    dataflow_kms_key        = var.kms_key_name
    temp_location           = var.temp_location
    staging_location        = var.staging_location
    max_num_workers         = var.max_workers
    use_public_ips          = var.use_public_ips
    enable_streaming_engine = var.enable_streaming_engine
  }

  pipeline_options = var.job_language == "JAVA" ? local.java_pipeline_options : local.python_pipeline_options

  experiment_options = var.kms_key_name != null && var.enable_streaming_engine ? { experiments = "enable_kms_on_streaming_engine" } : {}
}

resource "google_dataflow_flex_template_job" "dataflow_flex_template_job" {
  provider = google-beta

  project                 = var.project_id
  name                    = var.name
  container_spec_gcs_path = var.container_spec_gcs_path
  region                  = var.region
  on_delete               = var.on_delete

  parameters = merge(var.parameters, local.pipeline_options, local.experiment_options)
}
28 changes: 28 additions & 0 deletions modules/dataflow-flex-job/outputs.tf
@@ -0,0 +1,28 @@
/**
* Copyright 2021 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

output "job_id" {
  description = "The unique ID of this job."
  value       = google_dataflow_flex_template_job.dataflow_flex_template_job.job_id
}

output "state" {
  description = "The current state of the resource, selected from the JobState enum. See https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs#Job.JobState."
  value       = google_dataflow_flex_template_job.dataflow_flex_template_job.state
}
96 changes: 96 additions & 0 deletions modules/dataflow-flex-job/variables.tf
@@ -0,0 +1,96 @@
/**
* Copyright 2021 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

variable "project_id" {
  description = "The project in which the resource belongs. If it is not provided, the provider project is used."
  type        = string
}

variable "name" {
  description = "The name of the dataflow flex job."
  type        = string
}

variable "container_spec_gcs_path" {
  description = "The GCS path to the Dataflow job flex template."
  type        = string
}

variable "temp_location" {
  description = "GCS path for saving temporary workflow jobs."
  type        = string
}

variable "staging_location" {
  description = "GCS path for staging code packages needed by workers."
  type        = string
}

variable "job_language" {
  description = "Language of the flex template code. Options are 'JAVA' or 'PYTHON'."
  type        = string
  default     = "JAVA"
}

variable "enable_streaming_engine" {
  description = "Enable/disable the use of Streaming Engine for the job. Note that Streaming Engine is enabled by default for pipelines developed against the Beam SDK for Python v2.21.0 or later when using Python 3."
  type        = bool
  default     = true
}

variable "parameters" {
  description = "Key/Value pairs to be passed to the Dataflow job (as used in the template)."
  type        = map(any)
  default     = {}
}

variable "max_workers" {
  description = "The number of workers permitted to work on the job. More workers may improve processing speed at additional cost."
  type        = number
  default     = 1
}

variable "on_delete" {
  description = "One of 'drain' or 'cancel'. Specifies the behavior of deletion during terraform destroy. The default is cancel."
  type        = string
  default     = "cancel"
}

variable "region" {
  description = "The region in which the created job should run."
  type        = string
}

variable "service_account_email" {
  description = "The Service Account email that will be used to identify the VMs in which the jobs are running."
  type        = string
}

variable "subnetwork_self_link" {
  description = "The subnetwork self link to which VMs will be assigned."
  type        = string
}

variable "use_public_ips" {
  description = "Whether the VMs should use public IPs."
  type        = string
  default     = false
}

variable "kms_key_name" {
  description = "The name for the Cloud KMS key for the job. Key format is: `projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY`."
  type        = string
}
52 changes: 52 additions & 0 deletions modules/dataflow-flex-job/versions.tf
@@ -0,0 +1,52 @@
/**
* Copyright 2021 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

terraform {
  required_version = ">= 0.13"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 3.67"
    }

    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 3.67"
    }

    null = {
      source  = "hashicorp/null"
      version = "~> 2.1"
    }

    random = {
      source  = "hashicorp/random"
      version = "~> 2.3"
    }
  }

  provider_meta "google" {
    module_name = "blueprints/terraform/terraform-google-secured-data-warehouse:dataflow-flex-job/v1.0.0"
  }

  provider_meta "google-beta" {
    module_name = "blueprints/terraform/terraform-google-secured-data-warehouse:dataflow-flex-job/v1.0.0"
  }
}