All tools use Apache Beam pipelines. By default, pipelines run locally using the DirectRunner. You can optionally choose to run the pipelines on Google Cloud Dataflow by selecting the DataflowRunner.
When working with GCP, it's recommended you set the project ID up front with the command:
gcloud config set project <your-id>
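To confirm which project later commands will pick up, a minimal sketch such as the following may help. The `resolve_project` helper name is hypothetical, and the sketch assumes the `PROJECT` environment variable used in the example runs in this document; when that variable is unset, it falls back to the default stored by `gcloud config`.

```shell
# Hypothetical helper: print the project ID that later commands should use.
# Prefers an explicit $PROJECT; otherwise falls back to the gcloud default.
resolve_project() {
  if [ -n "${PROJECT:-}" ]; then
    printf '%s\n' "$PROJECT"
  else
    # Prints the project previously stored with `gcloud config set project`.
    gcloud config get-value project 2>/dev/null
  fi
}
```

It could then be used as `PROJECT="$(resolve_project)"` before launching a pipeline.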
--direct_num_workers: The number of workers to use. We recommend 2 for local development.
Example run:
weather-mv -i gs://netcdf_file.nc \
-o $PROJECT.$DATASET_ID.$TABLE_ID \
-t gs://$BUCKET/tmp \
--direct_num_workers 2
For a full list of how to configure the direct runner, please review this page.
--runner: The PipelineRunner to use. This field can be either DirectRunner or DataflowRunner. Default: DirectRunner (local mode).
--project: The project ID for your Google Cloud Project. This is required if you want to run your pipeline using the Dataflow managed service (i.e. DataflowRunner).
--temp_location: Cloud Storage path for temporary files. Must be a valid Cloud Storage URL, beginning with gs://.
--region: Specifies a regional endpoint for deploying your Dataflow jobs. Default: us-central1.
--job_name: The name of the Dataflow job being executed as it appears in Dataflow's jobs list and job details.
Example run:
weather-dl configs/seasonal_forecast_example_config.cfg \
--runner DataflowRunner \
--project $PROJECT \
--region $REGION \
--temp_location gs://$BUCKET/tmp/
For a full list of how to configure the Dataflow pipeline, please review this table.
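The runner options above can be gathered in one place. The following is a minimal sketch, not part of the tools themselves: the `runner_flags` helper and the `TEMP_LOCATION` variable are hypothetical, and `PROJECT` and `REGION` are assumed to be set as in the example run. It prints the flags for the chosen runner and rejects a Dataflow run that lacks a project or a `gs://` temp location.

```shell
# Hypothetical helper: print the runner-related flags for a pipeline run.
# Usage: runner_flags direct   or   runner_flags dataflow
runner_flags() {
  case "$1" in
    direct)
      # Local mode: DirectRunner is the default, so only set the worker count.
      printf '%s\n' "--direct_num_workers 2"
      ;;
    dataflow)
      # --project is required when running on the Dataflow managed service.
      if [ -z "${PROJECT:-}" ]; then
        echo "PROJECT must be set for DataflowRunner" >&2
        return 1
      fi
      # --temp_location must be a Cloud Storage URL beginning with gs://.
      temp_location="${TEMP_LOCATION:-}"
      case "$temp_location" in
        gs://?*) ;;
        *)
          echo "TEMP_LOCATION must begin with gs://" >&2
          return 1
          ;;
      esac
      # --region falls back to the documented default, us-central1.
      printf '%s\n' "--runner DataflowRunner --project $PROJECT --region ${REGION:-us-central1} --temp_location $temp_location"
      ;;
    *)
      echo "unknown runner: $1" >&2
      return 1
      ;;
  esac
}
```

One possible use is `weather-dl configs/seasonal_forecast_example_config.cfg $(runner_flags dataflow)`, where the command substitution is left unquoted on purpose so the shell splits the flags into separate arguments.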
When running on Dataflow, you can monitor jobs through the UI or via Dataflow's CLI commands.
For example, to see all outstanding Dataflow jobs, simply run:
gcloud dataflow jobs list
To describe stats about a particular Dataflow job, run:
gcloud dataflow jobs describe $JOBID
In addition, Dataflow provides a series of Beta CLI commands.
These can be used to keep track of job metrics, like so:
JOBID=<enter job id here>
gcloud beta dataflow metrics list $JOBID --source=user
You can even view logs via the beta commands:
gcloud beta dataflow logs list $JOBID
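The describe command can also be scripted. As a rough sketch, a shell loop can wait for a job to finish. The `is_terminal_state` and `wait_for_job` helpers are hypothetical; the sketch assumes the job's `currentState` field can be selected through gcloud's `--format` flag, and the 30-second polling interval is an arbitrary choice.

```shell
# Hypothetical helper: succeed if the given Dataflow job state is final.
is_terminal_state() {
  case "$1" in
    JOB_STATE_DONE|JOB_STATE_FAILED|JOB_STATE_CANCELLED|JOB_STATE_DRAINED|JOB_STATE_UPDATED)
      return 0
      ;;
    *)
      return 1
      ;;
  esac
}

# Hypothetical helper: poll a job until it reaches a terminal state.
# Usage: wait_for_job "$JOBID" "$REGION"
# Returns success only if the job ended in JOB_STATE_DONE.
wait_for_job() {
  while :; do
    # Assumption: `value(currentState)` selects the job's state field.
    state="$(gcloud dataflow jobs describe "$1" --region "$2" --format='value(currentState)')" || return 1
    echo "Job $1: $state"
    if is_terminal_state "$state"; then
      break
    fi
    sleep 30
  done
  [ "$state" = "JOB_STATE_DONE" ]
}
```

Calling `wait_for_job "$JOBID" "$REGION"` prints the state on each poll, so it can be chained with a follow-up step that should only run after a successful job.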