Remap shared services for edx-edxapp in DD

The problem is that, under the default DD configuration, spans from mysql, django (cache), defaultdb, requests, and potentially other libraries are each reported under a shared DD service of the same name, and that shared service is used across all of our IDAs.

Problems with this default approach:
- It is difficult to determine what span data belongs to which IDA/service.
- There is some metric data that may be impossible to separate between IDAs/services.

**Acceptance Criteria**
- [x] Decide on and build a Proof of Concept for a service naming scheme that associates other spans (e.g. mysql, django (cache), defaultdb, requests, kafka, etc.) with the edxapp service.
  - [x] Ensure spans do not have multiple service tags (may require SRE support)
  - [x] Ensure we've addressed the complete list of dependent services.
  - [x] Ensure the primary operations of the edxapp services remain as they are (e.g. `django.requests`).
- [x] Document (possibly in a wiki ADR) why we made the choices we made, and communicate to other service owners. Ideally they can participate in ADR review, so we get a consistent solution across all of our services.
- [x] Implement the decided-upon service naming scheme for edxapp
  - Approach: Inferred Services
    - [x] Feature switch: https://github.com/edx/configuration/pull/122
    - [x] [stage](https://github.com/edx/edx-internal/pull/12017), [edge](https://github.com/edx/edge-internal/pull/813), [prod](https://github.com/edx/edx-internal/pull/12058)
    - [x] Just make it default to true and remove the per-env settings
    - [x] devstack (`timmc/datadog-local-testing` is updated)
- [x] Offer to other teams
  - [x] Convert ADR into DD docs, explaining how to apply the POC approach to other services
    - https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/1008500894/Datadog+configuration+and+setup+details#Global-default-service-name
  - [x] Make available in Helm Charts
    - [x] Configure DD agent in k8s
      - [x] stage: https://github.com/edx/edx-internal/pull/12175
      - [x] edge: https://github.com/edx/edx-internal/pull/12178
      - [x] prod: https://github.com/edx/edx-internal/pull/12179
    - [x] Add config option in django-ida helm chart, defaulting false for now: https://github.com/edx/helm-charts/pull/170
    - Available now with django-ida helm chart 0.10.0
  - [x] Write up a recommendation for other teams
  - [x] Test in k8s
  - [x] Announce broadly as a recommendation to other teams
- [x] File ticket for cleanup: https://github.com/edx/edx-arch-experiments/issues/973

**Notes:**
- DD support ticket https://help.datadoghq.com/hc/en-us/requests/1781912 asks a variety of questions about this topic, specifically targeting django (cache) and requests services. Questions and answers should ultimately be copied out of this ticket once it is complete.
- DD support ticket https://help.datadoghq.com/hc/en-us/requests/1762193 asks similar questions about mysql, but has some additional information about metrics and multiple service tags (which should be avoided). Questions and answers should ultimately be copied out of this ticket once it is complete.
- DD support ticket https://help.datadoghq.com/hc/en-us/requests/1873842 summarizes some of the above as well as other options we've considered, and asks for guidance.
- There is an open (support) question about whether using `DD_SERVICE_MAPPING` is a simpler method of remapping, rather than using a variety of different settings.
  - However, remapping `django` to `cache` might be risky, because `django` _could_ be used for something else in the future, so in this particular case we might want to use `DD_DJANGO_CACHE_SERVICE_NAME` instead.
- For each service (e.g. mysql, django, etc.) we decide to remap, we need to choose between the IDA service `service:edx-edxapp-lms` (spans would still have a different operation_name), or `service:edx-edxapp-lms-cache` (a new DD service catalog service).
  - If everything went to the same service, we may need to adjust the primary operation_name for `edx-edxapp-lms` if it no longer defaults to `operation_name:django.requests`. See these [DD docs for configuring the primary operation](https://docs.datadoghq.com/tracing/guide/configuring-primary-operation/#configuration).
- If we go with separate DD service names:
  - We would probably use the following naming convention:
    - `service:edx-edxapp-lms`
    - `service:edx-edxapp-lms-cache` (was `django`)
    - `service:edx-edxapp-lms-defaultdb`
    - Etc.
  - Unfortunately, I chose `service:edx-edxapp-lms-workers` rather than `service:edx-edxapp-workers-lms`.
    - We might need to change this, because presumably we'd want `service:edx-edxapp-workers-lms-cache`, etc., and if we used a search like `service:edx-edxapp-lms*` to pick up all services related to `service:edx-edxapp-lms`, we would not want that to also pick up the worker service and all its sub-services.
    - Should we do this as a separate, pre-emptive ticket? We could use expand/contract to ensure monitors (listed in DD) and dashboards, etc. are updated before the name is changed. Communications will be required.
- ADR should state that we are rejecting the status quo of a shared DD service across all of our IDAs.
- What is the full list of services to address? DD has a dependency graph somewhere, and other possible services may include `elasticsearch`, `redis`, and `read_replicadb`. Please review in DD to get the full list of affected dependencies. Note that some general shared services (e.g. aws.s3) probably should remain shared, but this could be discussed.
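
To make the "separate DD service names" option from the notes above concrete, here is a minimal sketch that builds the environment variables such a remapping might use. It assumes `DD_SERVICE_MAPPING` accepts comma-separated `from:to` pairs, which should be confirmed against the ddtrace docs or the support tickets. The helper `build_dd_env` and the exact list of remapped services are hypothetical, and this is not the Inferred Services approach that was ultimately implemented.

```python
# Hypothetical helper (not part of ddtrace or edx-platform): builds the
# environment variables for the "separate service names" option.
def build_dd_env(base_service, remapped, cache_suffix="cache"):
    # Assumed format for DD_SERVICE_MAPPING: "from:to,from:to"
    mapping = ",".join(f"{src}:{base_service}-{src}" for src in remapped)
    return {
        "DD_SERVICE": base_service,
        "DD_SERVICE_MAPPING": mapping,
        # Dedicated setting for the cache, instead of remapping `django`,
        # per the risk noted above.
        "DD_DJANGO_CACHE_SERVICE_NAME": f"{base_service}-{cache_suffix}",
    }


# Example service list is an assumption; review the DD dependency graph
# for the real list.
env = build_dd_env("edx-edxapp-lms", ["mysql", "defaultdb", "requests"])

for name, value in env.items():
    print(f"{name}={value}")
```

Using the dedicated cache setting rather than a `django:...` entry in the mapping keeps the `django` service name free in case it is later used for something other than cache spans.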

ADRish page: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/1265598591/Datadog+Service+mapping
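
The worker-naming concern in the notes can be illustrated with a small sketch. It approximates a DD search such as `service:edx-edxapp-lms*` with Python glob matching, which is an assumption about how DD wildcards behave. The worker sub-service names ending in `-cache` are hypothetical examples.

```python
from fnmatch import fnmatchcase

# Candidate service names; the worker "-cache" sub-services are hypothetical.
services = [
    "edx-edxapp-lms",
    "edx-edxapp-lms-cache",
    "edx-edxapp-lms-defaultdb",
    "edx-edxapp-lms-workers",        # current worker naming
    "edx-edxapp-lms-workers-cache",
    "edx-edxapp-workers-lms",        # alternative worker naming
    "edx-edxapp-workers-lms-cache",
]


def matching(pattern, names):
    """Return the names that match a glob-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


lms_related = matching("edx-edxapp-lms*", services)
worker_related = matching("edx-edxapp-workers-lms*", services)

print("lms*:", lms_related)
print("workers-lms*:", worker_related)
```

With the current naming, the `edx-edxapp-lms*` pattern also picks up the worker service and its sub-services, while the alternative `edx-edxapp-workers-lms` prefix keeps the two groups separate.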
