
feat: experimental Glean event #262


Draft
wants to merge 5 commits into
base: main

Conversation

mmiermans (Contributor) commented Jan 28, 2025

Goal

To support reporting and ML (HNT-414), corpus metadata needs to be available in BigQuery. This PR emits Glean events for corpus items, corresponding to the existing reviewed_corpus_item Snowplow event.

Deployment steps

References

JIRA ticket: MC-1661

Documentation:

Slack threads:

QA

Glean event:

{
  "Timestamp": 1738105706608000000,
  "Logger": "glean-logger-glean",
  "Type": "glean-server-event",
  "Severity": 6,
  "Pid": 22921,
  "EnvVersion": "2.0",
  "Fields": {
    "document_namespace": "curated-corpus-api",
    "document_type": "events",
    "document_version": "1",
    "document_id": "a4b2b83a-ed30-44c5-9a79-b66671c8d996",
    "user_agent": "unknown",
    "ip_address": "unknown",
    "payload": "{\"metrics\":{},\"events\":[{\"category\":\"curated_corpus\",\"name\":\"reviewed_corpus_item\",\"extra\":{\"object_version\":\"new\",\"approved_corpus_item_external_id\":\"ac314af4-2284-4874-bea1-89419cdbe50f\",\"rejected_corpus_item_external_id\":\"\",\"prospect_id\":\"123-abc\",\"url\":\"https://test.com/docker\",\"loaded_from\":\"\",\"corpus_review_status\":\"corpus\",\"rejection_reasons_json\":\"\",\"action_screen\":\"\",\"title\":\"Find Out How I Cured My Docker In 2 Days\",\"excerpt\":\"A short summary of what this story is about\",\"image_url\":\"https://test.com/image.png\",\"language\":\"DE\",\"topic\":\"TECHNOLOGY\",\"is_collection\":\"false\",\"is_syndicated\":\"false\",\"is_time_sensitive\":\"true\",\"created_at\":\"1738105707\",\"created_by\":\"ad|Mozilla-LDAP|cglazer\",\"updated_at\":\"1738105707\",\"updated_by\":\"\",\"authors_json\":\"[\\\"Mary Shelley\\\"]\",\"publisher\":\"Convective Cloud\",\"experimental_json\":\"\"},\"timestamp\":1738105706608}],\"ping_info\":{\"seq\":0,\"start_time\":\"2025-01-28T23:08:26.608Z\",\"end_time\":\"2025-01-28T23:08:26.608Z\"},\"client_info\":{\"telemetry_sdk_build\":\"glean_parser v14.5.2\",\"first_run_date\":\"Unknown\",\"os\":\"Unknown\",\"os_version\":\"Unknown\",\"architecture\":\"Unknown\",\"app_build\":\"Unknown\",\"app_display_version\":\"1.0.0\",\"app_channel\":\"production\"}}"
  }
}
Payload from the above event, decoded:
  {
    "metrics": {},
    "events": [
      {
        "category": "curated_corpus",
        "name": "reviewed_corpus_item",
        "extra": {
          "object_version": "new",
          "approved_corpus_item_external_id": "ac314af4-2284-4874-bea1-89419cdbe50f",
          "rejected_corpus_item_external_id": "",
          "prospect_id": "123-abc",
          "url": "https://test.com/docker",
          "loaded_from": "",
          "corpus_review_status": "corpus",
          "rejection_reasons_json": "",
          "action_screen": "",
          "title": "Find Out How I Cured My Docker In 2 Days",
          "excerpt": "A short summary of what this story is about",
          "image_url": "https://test.com/image.png",
          "language": "DE",
          "topic": "TECHNOLOGY",
          "is_collection": "false",
          "is_syndicated": "false",
          "is_time_sensitive": "true",
          "created_at": "1738105707",
          "created_by": "ad|Mozilla-LDAP|cglazer",
          "updated_at": "1738105707",
          "updated_by": "",
          "authors_json": "[\"Mary Shelley\"]",
          "publisher": "Convective Cloud",
          "experimental_json": ""
        },
        "timestamp": 1738105706608
      }
    ],
    "ping_info": {
      "seq": 0,
      "start_time": "2025-01-28T23:08:26.608Z",
      "end_time": "2025-01-28T23:08:26.608Z"
    },
    "client_info": {
      "telemetry_sdk_build": "glean_parser v14.5.2",
      "first_run_date": "Unknown",
      "os": "Unknown",
      "os_version": "Unknown",
      "architecture": "Unknown",
      "app_build": "Unknown",
      "app_display_version": "1.0.0",
      "app_channel": "production"
    }
  }
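
For anyone consuming these log lines by hand, here is a minimal sketch of how the two layers of JSON above can be unpacked. The helper name `decodeGleanEvents` and the trimmed-down interfaces are hypothetical and not part of this PR. The sample line is an abbreviated copy of the event above. Since every extra is a string, booleans, timestamps, and lists have to be converted back explicitly.

```typescript
// Only the parts of the log line that this sketch reads.
interface GleanLogLine {
  Type: string;
  Fields: { document_namespace: string; payload: string };
}

interface GleanEvent {
  category: string;
  name: string;
  extra: Record<string, string>;
  timestamp: number;
}

// Hypothetical helper: parse the outer log line, then the JSON string in Fields.payload.
function decodeGleanEvents(logLine: string): GleanEvent[] {
  const outer = JSON.parse(logLine) as GleanLogLine;
  const payload = JSON.parse(outer.Fields.payload) as { events?: GleanEvent[] };
  return payload.events ?? [];
}

// Abbreviated copy of the event above; the payload is stringified before the outer object.
const sampleLine = JSON.stringify({
  Type: 'glean-server-event',
  Fields: {
    document_namespace: 'curated-corpus-api',
    payload: JSON.stringify({
      metrics: {},
      events: [
        {
          category: 'curated_corpus',
          name: 'reviewed_corpus_item',
          extra: {
            is_time_sensitive: 'true',
            created_at: '1738105707',
            authors_json: '["Mary Shelley"]',
          },
          timestamp: 1738105706608,
        },
      ],
    }),
  },
});

const [firstEvent] = decodeGleanEvents(sampleLine);

// Every extra is a string, so typed values are recovered explicitly.
const isTimeSensitive = firstEvent.extra.is_time_sensitive === 'true';
const createdAt = new Date(Number(firstEvent.extra.created_at) * 1000);
const authors: string[] = JSON.parse(firstEvent.extra.authors_json);

console.log(firstEvent.name, isTimeSensitive, createdAt.toISOString(), authors);
```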


github-actions bot commented Jan 28, 2025

Plan Result (prospect-translation-lambda-cdk-production)

CI link

Plan: 0 to add, 1 to change, 0 to destroy.
  • Update
    • aws_lambda_function.translation-lambda_translation-sqs-lambda_B9BDF6BA
Change Result
  # aws_lambda_function.translation-lambda_translation-sqs-lambda_B9BDF6BA will be updated in-place
  ~ resource "aws_lambda_function" "translation-lambda_translation-sqs-lambda_B9BDF6BA" {
        id                             = "ProspectAPI-Prod-Sqs-Translation-Function"
        tags                           = {
            "app_code"       = "content"
            "component_code" = "content-prospectapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "ProspectAPI-Sqs-Translation"
        }
        # (22 unchanged attributes hidden)

      ~ environment {
          ~ variables = {
              ~ "GIT_SHA"                      = (sensitive value)
                # (4 unchanged elements hidden)
            }
        }

        # (4 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

⚠️ Errors


github-actions bot commented Jan 28, 2025

Plan Result (prospect-api-cdk-production)

CI link

Plan: 0 to add, 2 to change, 0 to destroy.
  • Update
    • aws_dynamodb_table.dynamodb_prospects_dynamodb_table_9854E41E
    • aws_iam_policy.application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6
Change Result
  # data.aws_iam_policy_document.application_ecs_service_ecs-iam_data-ecs-task-role-policy_090CC3AD will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_iam_policy_document" "application_ecs_service_ecs-iam_data-ecs-task-role-policy_090CC3AD" {
      + id            = (known after apply)
      + json          = (known after apply)
      + minified_json = (known after apply)
      + version       = "2012-10-17"

      + statement {
          + actions   = [
              + "dynamodb:BatchGet*",
              + "dynamodb:DescribeTable",
              + "dynamodb:Get*",
              + "dynamodb:Query",
              + "dynamodb:Scan",
              + "dynamodb:UpdateItem",
            ]
          + effect    = "Allow"
          + resources = [
              + "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects",
              + "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects/*",
            ]
        }
      + statement {
          + actions   = [
              + "s3:*",
            ]
          + effect    = "Allow"
          + resources = [
              + "arn:aws:s3:::pocket-prospectapi-prod-images",
              + "arn:aws:s3:::pocket-prospectapi-prod-images/*",
            ]
        }
      + statement {
          + actions   = [
              + "events:PutEvents",
            ]
          + effect    = "Allow"
          + resources = [
              + "arn:aws:events:us-east-1:996905175585:event-bus/PocketEventBridge-Prod-Shared-Event-Bus",
            ]
        }
      + statement {
          + actions   = [
              + "logs:CreateLogGroup",
              + "logs:CreateLogStream",
              + "logs:DescribeLogGroups",
              + "logs:DescribeLogStreams",
              + "logs:PutLogEvents",
            ]
          + effect    = "Allow"
          + resources = [
              + "*",
            ]
        }
    }

  # aws_dynamodb_table.dynamodb_prospects_dynamodb_table_9854E41E will be updated in-place
  ~ resource "aws_dynamodb_table" "dynamodb_prospects_dynamodb_table_9854E41E" {
        id                          = "PROAPI-Prod-Prospects"
        name                        = "PROAPI-Prod-Prospects"
        tags                        = {
            "app_code"       = "content"
            "component_code" = "content-prospectapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "ProspectAPI"
        }
        # (9 unchanged attributes hidden)

      - global_secondary_index {
          - hash_key           = "scheduledSurfaceGuid" -> null
          - name               = "scheduledSurfaceGuid-prospectType" -> null
          - non_key_attributes = [] -> null
          - projection_type    = "ALL" -> null
          - range_key          = "prospectType" -> null
          - read_capacity      = 0 -> null
          - write_capacity     = 0 -> null
        }
      + global_secondary_index {
          + hash_key           = "scheduledSurfaceGuid"
          + name               = "scheduledSurfaceGuid-prospectType"
          + non_key_attributes = []
          + projection_type    = "ALL"
          + range_key          = "prospectType"
          + read_capacity      = 5
          + write_capacity     = 5
        }

        # (5 unchanged blocks hidden)
    }

  # aws_iam_policy.application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6 will be updated in-place
  ~ resource "aws_iam_policy" "application_ecs_service_ecs-iam_ecs-task-role-policy_6FC89FB6" {
        id               = "arn:aws:iam::996905175585:policy/ProspectAPI-Prod-TaskRolePolicy"
        name             = "ProspectAPI-Prod-TaskRolePolicy"
      ~ policy           = jsonencode(
            {
              - Statement = [
                  - {
                      - Action   = [
                          - "dynamodb:UpdateItem",
                          - "dynamodb:Scan",
                          - "dynamodb:Query",
                          - "dynamodb:Get*",
                          - "dynamodb:DescribeTable",
                          - "dynamodb:BatchGet*",
                        ]
                      - Effect   = "Allow"
                      - Resource = [
                          - "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects/*",
                          - "arn:aws:dynamodb:us-east-1:996905175585:table/PROAPI-Prod-Prospects",
                        ]
                    },
                  - {
                      - Action   = "s3:*"
                      - Effect   = "Allow"
                      - Resource = [
                          - "arn:aws:s3:::pocket-prospectapi-prod-images/*",
                          - "arn:aws:s3:::pocket-prospectapi-prod-images",
                        ]
                    },
                  - {
                      - Action   = "events:PutEvents"
                      - Effect   = "Allow"
                      - Resource = "arn:aws:events:us-east-1:996905175585:event-bus/PocketEventBridge-Prod-Shared-Event-Bus"
                    },
                  - {
                      - Action   = [
                          - "logs:PutLogEvents",
                          - "logs:DescribeLogStreams",
                          - "logs:DescribeLogGroups",
                          - "logs:CreateLogStream",
                          - "logs:CreateLogGroup",
                        ]
                      - Effect   = "Allow"
                      - Resource = "*"
                    },
                ]
              - Version   = "2012-10-17"
            }
        ) -> (known after apply)
        tags             = {
            "app_code"       = "content"
            "component_code" = "content-prospectapi"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "ProspectAPI"
        }
        # (5 unchanged attributes hidden)
    }

Plan: 0 to add, 2 to change, 0 to destroy.


github-actions bot commented Jan 28, 2025

Plan Result (corpus-scheduler-lambda-cdk-production)

CI link

Plan: 0 to add, 1 to change, 0 to destroy.
  • Update
    • aws_lambda_function.corpus-scheduler-sqs-lambda_F2ECDF9F
Change Result
  # aws_lambda_function.corpus-scheduler-sqs-lambda_F2ECDF9F will be updated in-place
  ~ resource "aws_lambda_function" "corpus-scheduler-sqs-lambda_F2ECDF9F" {
        id                             = "CorpusSchedulerLambda-Prod-SQS-Function"
      ~ qualified_arn                  = "arn:aws:lambda:us-east-1:996905175585:function:CorpusSchedulerLambda-Prod-SQS-Function:207" -> (known after apply)
      ~ qualified_invoke_arn           = "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:996905175585:function:CorpusSchedulerLambda-Prod-SQS-Function:207/invocations" -> (known after apply)
        tags                           = {
            "app_code"       = "content"
            "component_code" = "content-corpusschedulerlambda"
            "env_code"       = "prod"
            "environment"    = "Prod"
            "service"        = "CorpusSchedulerLambda"
        }
      ~ version                        = "207" -> (known after apply)
        # (20 unchanged attributes hidden)

      ~ environment {
          ~ variables = {
              ~ "GIT_SHA"                          = (sensitive value)
                # (7 unchanged elements hidden)
            }
        }

        # (4 unchanged blocks hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

⚠️ Errors

experimental_json: '',
};
}
}
Contributor Author

To improve readability, I'd split this up into separate functions for the approved and rejected extra data.
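
A rough sketch of what that split could look like. The input shapes (`ApprovedItem`, `RejectedItem`), the function names, and the `'PAYWALL'` reason are placeholders for illustration. The real types in this service will differ, and only a handful of the extra keys from the QA payload are shown.

```typescript
// Hypothetical input shapes; the real corpus item types will differ.
interface ApprovedItem {
  externalId: string;
  url: string;
  title: string;
  authors: string[];
}

interface RejectedItem {
  externalId: string;
  url: string;
  reasons: string[];
}

type Extra = Record<string, string>;

// Keys shared by both variants, defaulting to '' as in the sample event.
function baseExtra(url: string): Extra {
  return {
    approved_corpus_item_external_id: '',
    rejected_corpus_item_external_id: '',
    url,
    title: '',
    authors_json: '',
    rejection_reasons_json: '',
  };
}

// Fills only the keys that apply to an approved item.
function buildApprovedExtra(item: ApprovedItem): Extra {
  return {
    ...baseExtra(item.url),
    approved_corpus_item_external_id: item.externalId,
    title: item.title,
    authors_json: JSON.stringify(item.authors),
  };
}

// Fills only the keys that apply to a rejected item.
function buildRejectedExtra(item: RejectedItem): Extra {
  return {
    ...baseExtra(item.url),
    rejected_corpus_item_external_id: item.externalId,
    rejection_reasons_json: JSON.stringify(item.reasons),
  };
}

const approvedExtra = buildApprovedExtra({
  externalId: 'ac314af4-2284-4874-bea1-89419cdbe50f',
  url: 'https://test.com/docker',
  title: 'Find Out How I Cured My Docker In 2 Days',
  authors: ['Mary Shelley'],
});

const rejectedExtra = buildRejectedExtra({
  externalId: 'hypothetical-rejected-id',
  url: 'https://test.com/docker',
  reasons: ['PAYWALL'], // placeholder reason, not taken from the source
});

console.log(approvedExtra, rejectedExtra);
```

Both builders share the empty-string defaults, so each function only states what differs for its variant.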

description: "Curator who last updated the item."
authors_json:
type: string
description: "JSON-encoded list of authors."
Contributor Author

@gkatre Glean only supports string and boolean for server-side events. Is JSON-encoding a good way to store a list of authors for processing in BigQuery-ETL?

Yeah, we don't support JSON object metrics in server templates yet, but a JSON-encoded string should be possible to parse in BQ.
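
To make the string approach concrete, here is a small sketch of the round trip for a list-valued extra such as authors_json. The helper names are hypothetical. Mapping an empty list to '' is an assumption, based on the QA payload showing unset *_json extras as empty strings.

```typescript
// Encode a list as a JSON string. An empty list becomes '' (assumed convention,
// matching the empty *_json extras in the sample event).
function encodeListExtra(values: string[]): string {
  return values.length > 0 ? JSON.stringify(values) : '';
}

// Decode the string back into a list, rejecting anything that is not an array of strings.
function decodeListExtra(raw: string): string[] {
  if (raw === '') return [];
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed) || !parsed.every((v) => typeof v === 'string')) {
    throw new Error('expected a JSON array of strings');
  }
  return parsed;
}

const encodedAuthors = encodeListExtra(['Mary Shelley']);
const decodedAuthors = decodeListExtra(encodedAuthors);
console.log(encodedAuthors, decodedAuthors);
```

A consumer has to treat '' and a JSON array as the two valid shapes of the column, which is worth documenting next to the metric definition.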

description: "Curator who created the item."
updated_at:
type: string
description: "Unix timestamp (in seconds) for last item update."
Contributor Author

Glean events support 3 types:

type: The type of value this extra key can hold. One of string, boolean, quantity. Defaults to string. Recorded value is converted to string for transmission. Note: If not specified only the legacy API on record is available.

However, currently quantity doesn't seem to be supported by the Glean parser for server-side events.

@gkatre @akkomar Is string or quantity better for timestamps? If quantity is preferred, I can see if it's feasible to support it.

It's an extra key which gets converted to a string for transport anyway (see https://mozilla.github.io/glean/book/reference/metrics/event.html#events).

I think in order to support quantity we'd need to have proper type conversion in https://github.com/mozilla/glean_parser/blob/main/glean_parser/templates/javascript_server.jinja2#L227 via https://github.com/mozilla/glean_parser/blob/main/glean_parser/javascript_server.py#L63-L64, like in the Go parser. I can look into this sometime next week, or feel free to take a stab at it if you want and have time.
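
For illustration only, this is roughly the kind of conversion being discussed: typed extras (string, boolean, quantity) are turned into strings before transport. It is a standalone sketch, not the glean_parser template. The names `serializeExtras` and `toUnixSeconds` are hypothetical, and treating quantities as integers is an assumption of this sketch.

```typescript
type ExtraValue = string | boolean | number;

// Hypothetical helper: stringify typed extras before transport.
function serializeExtras(extras: Record<string, ExtraValue>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(extras)) {
    // Assumption: quantities are whole numbers, so fractional values are rejected.
    if (typeof value === 'number' && !Number.isInteger(value)) {
      throw new Error(`extra "${key}" must be an integer quantity`);
    }
    out[key] = String(value);
  }
  return out;
}

// Unix timestamp in seconds, the unit used by created_at and updated_at in this PR.
function toUnixSeconds(date: Date): number {
  return Math.floor(date.getTime() / 1000);
}

const serialized = serializeExtras({
  topic: 'TECHNOLOGY',
  is_time_sensitive: true,
  updated_at: toUnixSeconds(new Date('2025-01-28T23:08:27.000Z')),
});

console.log(serialized);
```

Either way the value on the wire is the string "1738105707". The type mainly affects what the calling code is allowed to pass in.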
