diff --git a/README.md b/README.md index 17ecfaea21..b98a101df6 100644 --- a/README.md +++ b/README.md @@ -6,6 +6,7 @@ Common solutions and tools developed by Google Cloud's Professional Services tea ## Examples The examples folder contains example solutions across a variety of Google Cloud Platform products. Use these solutions as a reference for your own or extend them to fit your particular use case. + * [Audio Content Profiling](examples/ml-audio-content-profiling) - A tool that builds a pipeline to scale the process of moderating audio files for inappropriate content using machine learning APIs. * [BigQuery Audit Log Dashboard](examples/bigquery-audit-log) - Solution to help audit BigQuery usage using Data Studio for visualization and a sample SQL script to query the back-end data source consisting of audit logs. * [BigQuery Billing Dashboard](examples/bigquery-billing-dashboard) - Solution to help displaying billing info using Data Studio for visualization and a sample SQL script to query the back-end billing export table in BigQuery. diff --git a/examples/ml-audio-content-profiling/app/angular/README.md b/examples/ml-audio-content-profiling/app/angular/README.md deleted file mode 100644 index fd8d952b2d..0000000000 --- a/examples/ml-audio-content-profiling/app/angular/README.md +++ /dev/null @@ -1,19 +0,0 @@ -## Code scaffolding - -Run `ng generate component component-name` to generate a new component. You can also use `ng generate directive|pipe|service|class|guard|interface|enum|module`. - -## Build - -Run `ng build` to build the project. The build artifacts will be stored in the `dist/` directory. Use the `--prod` flag for a production build. - -## Running unit tests - -Run `ng test` to execute the unit tests via [Karma](https://karma-runner.github.io). - -## Running end-to-end tests - -Run `ng e2e` to execute the end-to-end tests via [Protractor](http://www.protractortest.org/). 
- -## Further help - -To get more help on the Angular CLI use `ng help` or go check out the [Angular CLI README](https://github.com/angular/angular-cli/blob/master/README.md). diff --git a/tools/gce-usage-log/README.md b/tools/gce-usage-log/README.md index 5be9b55093..f959561ece 100644 --- a/tools/gce-usage-log/README.md +++ b/tools/gce-usage-log/README.md @@ -4,6 +4,10 @@ This project is designed to provide you with tools to capture an ongoing record As your GCP organization grows, you may want to understand the business context of your overall GCE instance footprint. An accounting of your GCE resource usage can be analyzed to optimize autoscaling strategies, to aid in capacity forecasting, and to assist in your internal resource accounting. Further insights can be drawn by segmenting your fleet based on labels or network tags (to represent entities such as production environment or team). + +Pre-requisites: The schema from audit logs requires your VMs to include both the labels and tags fields when creating the individual resource. + + ## 1. Overview This project will capture events relevant to your GCE instance usage and log them in BigQuery in way that surfaces your GCE vCPUs (cores), RAM, and attached persistent (standard and SSD) or scratch (local SSD) disks, sliceable by zone, project, network tags, labels, and whether the instance was preemptible. @@ -20,13 +24,24 @@ The audit logs will have separate entries for the creation and deletion of an in A process is run to capture your existing footprint into a table (`_initial_vm_inventory`) in the same BigQuery dataset. This is required to capture the state of running instances for which a `create` instance event has not already been logged. -### 1.3 BigQuery View +### 1.3.1 BigQuery Base View A view is created which joins the audit log and initial VM inventory tables to provide a more user-friendly view (`_gce_usage_log`), calculating cores and RAM from the machine type listed in the audit log events. 
The resulting schema of the view looks like this: -![view schema](images/view-schema.png) +![view schema](images/base-view-schema.png) + + +### 1.3.2 BigQuery Interval View + +An additional view can be created to also visualize point-in-time VM inventory (`_gce_usage_log_interval`). This displays +the inventory at a specified time interval, such as aggregating all VMs in hourly increments. + +The resulting schema of this interval view looks like this: + +![view interval schema](images/interval-view-schema.png) + ### 1.4 Component Architecture @@ -157,11 +172,11 @@ This process will run and create a new BigQuery table called _`initial_vm_invent If you’ve decided not to run in the cloud shell, you may need to install `maven` yourself. -### 2.5 Creating the BigQuery view +### 2.5 Creating the BigQuery Views Now we have the data required to calculate an aggregate view of our GCE usage. Let’s create a BigQuery view to make the data more friendly. -#### 2.5.1 Create the BigQuery view +#### 2.5.1 Create the Initial BigQuery View Let’s create the view from the `gce_usage_view.sql` file in the repository. @@ -182,23 +197,124 @@ bq mk \ gce_usage_log._gce_usage_log ``` +#### 2.5.2 Create a Time-Series View + +First, enable the Data Transfer API. + +```` +gcloud services enable bigquerydatatransfer.googleapis.com +```` + +Next, set the relevant variables: +```` +export DESTINATION_TABLE=_gce_usage_log_interval +export TIME_INTERVAL_UNIT=your_interval_unit +export TIME_INTERVAL_AMOUNT=your_interval_amount +```` +where: + +* `_gce_usage_log_interval` is the name of the table where the result will live; change it if you prefer a different name +* `your_interval_unit` is DAY, HOUR, MINUTE, SECOND or any [supported date format](https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions#timestamp_trunc) +* `your_interval_amount` is a positive integer giving the number of units in each interval. 
+For example, if you wanted to create a time-series dataset looking at the inventory of VMs on an +hourly cadence, you would choose `HOUR` for `TIME_INTERVAL_UNIT` and `1` for `TIME_INTERVAL_AMOUNT`. +If you wanted to see the inventory at 5-minute increments, you would choose `MINUTE` and `5`, respectively. + + +Next, load the interval view SQL into a variable, substituting your values for its placeholders: + +``` +export INTERVAL_VIEW=$(cat gce_interval_view.sql | sed -e "s/_PROJECT_/$PROJECT_ID/g" -e "s/_TIME_INTERVAL_UNIT_/$TIME_INTERVAL_UNIT/g" -e "s/_TIME_INTERVAL_AMOUNT_/$TIME_INTERVAL_AMOUNT/g") +``` + +Now, create the view as a scheduled query. +```bash +bq query \ +--project $PROJECT_ID \ +--use_legacy_sql=false \ +--destination_table=gce_usage_log.$DESTINATION_TABLE \ +--display_name="Interval usage of GCE Usage Logs" \ +--replace=true \ +--schedule='every 24 hours' "$INTERVAL_VIEW" +``` +Note: By default, the query is configured to run every 24 hours, but you can schedule it more or less frequently as needed. +There is also no default expiration set, but one can be added if you only need historical data from a certain timeframe. + ## 3. Using the dataset +### 3.2.1 How to Use the Interval View + Now that your dataset is ready, how do you query it? +The interval view allows you to see aggregated point-in-time statistics. To find resource usage for a specific timeframe, you can query the view directly. The sample query below returns the aggregate usage information for +September 2019. You can adjust the timeframe by altering the WHERE clause +or choose to only select certain fields depending on what you are trying to predict. 
+ + +```sql +SELECT + custom_interval as hour, + COUNT(instance_id) as num_instances, + SUM(cores) as total_cores, + SUM(memory_mb) as total_memory_mb, + SUM(pd_standard_size_gb) as total_pd_standard_size_gb, + SUM(pd_ssd_size_gb) as total_pd_ssd_size_gb, + SUM(local_ssd_size_gb) as total_local_ssd_size_gb + +FROM `gce_usage_log._gce_usage_log_interval` + +WHERE + custom_interval >= "2019-09-01" AND custom_interval < "2019-10-01" + +GROUP BY 1 +``` + +The results will look something like this. Note that for a real workload the aggregated statistics will most likely +vary from hour to hour as VM resources change; they happen to stay constant in this example. + +![interval_query](images/interval-query-results.png) + +### 3.2.2 How to Create a Time-Series Graph on the Interval View + +If you want to see the same data in a time-series graph rather than a data table, you can do this +in Data Studio. This allows you to create time-series graphs to monitor changes and spikes of inventory +over time. This can be done on whichever metrics your team would like to use for capacity planning, +such as the total count of instances, cores, or memory over time. + + +1. Open Data Studio and create a copy of [this data source](https://datastudio.google.com/datasources/c4ed9ce7-50a2-4045-8de1-859ed5aaac6f) +by selecting the copy button. +![copy data source](images/copy-data-source.png) +2. Rename the Data Source to the name that you'd like. Click on 'Edit Connection'. +3. If this is your first time using Data Studio, click 'Authorize'. +4. Fill in your project name. +5. Select your dataset `gce_usage_log`. +6. Select `_gce_usage_log_interval`, or the corresponding name if you named the view something different. +7. Click 'Reconnect' in the upper right-hand corner. +8. Make a copy of the [report](https://datastudio.google.com/open/1mpyXSxvkuu3PWXf1j0rzzyhiYP0qC_jR). +![copy report](images/copy-report.png) +9. When prompted to choose your data source, select your newly created data source. 
+10. Click on 'Create Report' and name the report, adding any other metrics you want to analyze. +11. View the graph. +![graph](images/usage-graph.png) + +### 3.3 How to Query the Base View + + To find resource usage at a point in time `t`, query the view for records that were inserted before `t`, and deleted after `t` (or not deleted yet). Here we also group by project. -```bash +```sql SELECT project_id, count(instance_id) as num_instances, SUM(cores) as total_cores, SUM(memory_mb) as total_memory_mb, SUM(pd_standard_size_gb) as total_pd_standard_size_gb, - SUM(pd_ssd_size_gb) as total_pd_ssd_size_gb + SUM(pd_ssd_size_gb) as total_pd_ssd_size_gb, SUM(local_ssd_size_gb) as total_local_ssd_size_gb + FROM `gce_usage_log._gce_usage_log` WHERE inserted < '2019-08-23 08:00:00.000 UTC' diff --git a/tools/gce-usage-log/gce_interval_view.sql b/tools/gce-usage-log/gce_interval_view.sql new file mode 100644 index 0000000000..86bb1cb25a --- /dev/null +++ b/tools/gce-usage-log/gce_interval_view.sql @@ -0,0 +1,32 @@ +WITH + timestamp_interval_table AS ( + SELECT + instance_id, + GENERATE_TIMESTAMP_ARRAY(TIMESTAMP_TRUNC(inserted, _TIME_INTERVAL_UNIT_), + TIMESTAMP_TRUNC(IFNULL(deleted, + CURRENT_TIMESTAMP()), _TIME_INTERVAL_UNIT_), + INTERVAL _TIME_INTERVAL_AMOUNT_ _TIME_INTERVAL_UNIT_) AS custom_interval_array + FROM + `_PROJECT_.gce_usage_log._gce_usage_log`) +SELECT + timestamp_interval_table.instance_id, + custom_interval, + preemptible, + project_id, + zone, + machine_type, + cores, + memory_mb, + pd_standard_size_gb, + pd_ssd_size_gb, + local_ssd_size_gb, + tags, + labels +FROM + timestamp_interval_table, + UNNEST(custom_interval_array) AS custom_interval +JOIN + `_PROJECT_.gce_usage_log._gce_usage_log` usage_view +ON + usage_view.instance_id = timestamp_interval_table.instance_id +ORDER BY custom_interval ASC diff --git a/tools/gce-usage-log/gce_usage_view.sql b/tools/gce-usage-log/gce_usage_view.sql index 4d5c926271..74b6fbe683 100644 --- a/tools/gce-usage-log/gce_usage_view.sql +++ 
b/tools/gce-usage-log/gce_usage_view.sql @@ -1,19 +1,3 @@ -/* - * Copyright 2019 Google LLC - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - #standardSQL SELECT inserted, diff --git a/tools/gce-usage-log/images/view-schema.png b/tools/gce-usage-log/images/base-view-schema.png similarity index 100% rename from tools/gce-usage-log/images/view-schema.png rename to tools/gce-usage-log/images/base-view-schema.png diff --git a/tools/gce-usage-log/images/copy-data-source.png b/tools/gce-usage-log/images/copy-data-source.png new file mode 100644 index 0000000000..034b841087 Binary files /dev/null and b/tools/gce-usage-log/images/copy-data-source.png differ diff --git a/tools/gce-usage-log/images/copy-report.png b/tools/gce-usage-log/images/copy-report.png new file mode 100644 index 0000000000..b730f99992 Binary files /dev/null and b/tools/gce-usage-log/images/copy-report.png differ diff --git a/tools/gce-usage-log/images/custom-interval-dimension.png b/tools/gce-usage-log/images/custom-interval-dimension.png new file mode 100644 index 0000000000..1c6855b224 Binary files /dev/null and b/tools/gce-usage-log/images/custom-interval-dimension.png differ diff --git a/tools/gce-usage-log/images/gce-usage-log.png b/tools/gce-usage-log/images/gce-usage-log.png index 078a687ca9..dd2e7614cf 100644 Binary files a/tools/gce-usage-log/images/gce-usage-log.png and b/tools/gce-usage-log/images/gce-usage-log.png differ diff --git 
a/tools/gce-usage-log/images/interval-query-results.png b/tools/gce-usage-log/images/interval-query-results.png new file mode 100644 index 0000000000..951c9a8a79 Binary files /dev/null and b/tools/gce-usage-log/images/interval-query-results.png differ diff --git a/tools/gce-usage-log/images/interval-view-schema.png b/tools/gce-usage-log/images/interval-view-schema.png new file mode 100644 index 0000000000..601d876348 Binary files /dev/null and b/tools/gce-usage-log/images/interval-view-schema.png differ diff --git a/tools/gce-usage-log/images/time-series-graph.png b/tools/gce-usage-log/images/time-series-graph.png new file mode 100644 index 0000000000..7af6a01260 Binary files /dev/null and b/tools/gce-usage-log/images/time-series-graph.png differ diff --git a/tools/gce-usage-log/images/usage-graph.png b/tools/gce-usage-log/images/usage-graph.png new file mode 100644 index 0000000000..7d4b4e13ed Binary files /dev/null and b/tools/gce-usage-log/images/usage-graph.png differ
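
Review note on section 2.5.2 of `tools/gce-usage-log/README.md`: the placeholder substitution can be checked in isolation before the query is scheduled. The following is a minimal sketch, not part of the change itself. It applies the same three `sed` expressions to a shortened stand-in for `gce_interval_view.sql`; the template string and the values `my-project`, `HOUR` and `1` are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical values; substitute your own project and interval settings.
PROJECT_ID="my-project"
TIME_INTERVAL_UNIT="HOUR"
TIME_INTERVAL_AMOUNT="1"

# Shortened stand-in for gce_interval_view.sql that uses the same placeholders.
TEMPLATE='SELECT TIMESTAMP_TRUNC(inserted, _TIME_INTERVAL_UNIT_), INTERVAL _TIME_INTERVAL_AMOUNT_ _TIME_INTERVAL_UNIT_ FROM `_PROJECT_.gce_usage_log._gce_usage_log`'

# Same sed expressions as in the README, reading from a string instead of the file.
INTERVAL_VIEW=$(printf '%s\n' "$TEMPLATE" | sed \
  -e "s/_PROJECT_/$PROJECT_ID/g" \
  -e "s/_TIME_INTERVAL_UNIT_/$TIME_INTERVAL_UNIT/g" \
  -e "s/_TIME_INTERVAL_AMOUNT_/$TIME_INTERVAL_AMOUNT/g")

# Print the rendered SQL so it can be inspected before it is passed to bq.
printf '%s\n' "$INTERVAL_VIEW"
```

If any `_TIME_INTERVAL_` or `_PROJECT_` token is still visible in the printed SQL, one of the variables was unset or misspelled when the substitution ran.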