Remap shared services for edx-edxapp in DD #737

Closed
@robrap

Description

The problem is that, with the default DD configuration, mysql, django (cache), defaultdb, requests, and potentially other libraries each report to a shared service of the same name, and that service is used across all of our IDA services.

Problems with this default approach:

  • It is difficult to determine what span data belongs to which IDA/service.
  • There is some metric data that may be impossible to separate between IDAs/services.

Acceptance Criteria

Notes:

  • DD support ticket https://help.datadoghq.com/hc/en-us/requests/1781912 asks a variety of questions about this topic, specifically targeting django (cache) and requests services. Questions and answers should ultimately be copied out of this ticket once it is complete.
  • DD support ticket https://help.datadoghq.com/hc/en-us/requests/1762193 asks similar questions about mysql, but has some additional information about metrics and multiple service tags (which should be avoided). Questions and answers should ultimately be copied out of this ticket once it is complete.
  • DD support ticket https://help.datadoghq.com/hc/en-us/requests/1873842 summarizes some of the above as well as other options we've considered, and asks for guidance.
  • There is an open (support) question about whether DD_SERVICE_MAPPING is a simpler way to remap than using a variety of integration-specific settings.
    • However, remapping django to cache might be risky, because django could be used for something else in the future, so in this particular case we might want to use DD_DJANGO_CACHE_SERVICE_NAME instead.
  • For each service (e.g. mysql, django, etc.) we decide to remap, we need to choose between mapping it to the IDA's own service, service:edx-edxapp-lms (spans would still have a different operation_name), or to a new DD service catalog service such as service:edx-edxapp-lms-cache.
    • If everything went to the same service, we may need to adjust the primary operation_name for edx-edxapp-lms if it no longer defaults to operation_name:django.requests. See these DD docs for configuring the primary operation.
  • If we go with separate DD service names:
    • We would probably use the following naming convention:
      • service:edx-edxapp-lms
      • service:edx-edxapp-lms-cache (was django)
      • service:edx-edxapp-lms-defaultdb
      • Etc.
    • Unfortunately, I chose to go with service:edx-edxapp-lms-workers rather than service:edx-edxapp-workers-lms.
      • We might need to change this: presumably we'd want service:edx-edxapp-workers-lms-cache, etc., and a search like service:edx-edxapp-lms* intended to pick up all services related to service:edx-edxapp-lms should not also pick up the worker service and all of its sub-services.
      • Should we do this as a separate, pre-emptive ticket? We could use expand/contract to ensure monitors (listed in DD) and dashboards, etc. are updated before the name is changed. Communications will be required.
  • The ADR should state that we are rejecting the status quo of a shared DD service across all of our IDAs.
  • What is the full list of services to address? DD has a dependency graph somewhere, and other possible services may include elasticsearch, redis, and read_replicadb. Please review in DD to get the full list of affected dependencies. Note that some general shared services (e.g. aws.s3) probably should remain shared, but this could be discussed.
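As an illustration of the DD_SERVICE_MAPPING option discussed in the notes, here is a minimal Python sketch that builds the tracer environment for one IDA under the proposed naming convention. It assumes ddtrace's `DD_SERVICE`, `DD_SERVICE_MAPPING` (comma-separated `from:to` pairs) and `DD_DJANGO_CACHE_SERVICE_NAME` settings; the helper names (`build_env`, `build_service_mapping`, `parse_service_mapping`) and the exact set of remapped services are hypothetical, since the full list has not been decided.

```python
from __future__ import annotations

# Hypothetical selection of DD default service names to remap via
# DD_SERVICE_MAPPING; the full list is still to be determined in DD.
REMAPPED_SUFFIXES = {
    "mysql": "mysql",
    "defaultdb": "defaultdb",
    "requests": "requests",
}


def build_service_mapping(ida_service: str, suffixes: dict[str, str]) -> str:
    """Return a DD_SERVICE_MAPPING value of comma-separated 'from:to' pairs."""
    return ",".join(
        f"{source}:{ida_service}-{suffix}" for source, suffix in suffixes.items()
    )


def parse_service_mapping(value: str) -> dict[str, str]:
    """Parse a 'from:to,from:to' string back into a dict (for checking)."""
    mapping: dict[str, str] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        source, _, target = item.partition(":")
        mapping[source] = target
    return mapping


def build_env(ida_service: str) -> dict[str, str]:
    """Hypothetical per-IDA ddtrace environment under the proposed naming."""
    return {
        "DD_SERVICE": ida_service,
        "DD_SERVICE_MAPPING": build_service_mapping(ida_service, REMAPPED_SUFFIXES),
        # The django cache spans are renamed with the cache-specific setting
        # instead of mapping "django" globally, in case "django" is later used
        # as a service name for something other than the cache.
        "DD_DJANGO_CACHE_SERVICE_NAME": f"{ida_service}-cache",
    }


if __name__ == "__main__":
    for key, value in build_env("edx-edxapp-lms").items():
        print(f"{key}={value}")
```

Keeping the django cache remap in its own setting, rather than adding a `django:...` pair to `DD_SERVICE_MAPPING`, follows the concern in the notes that the `django` service name could later be used for something other than the cache.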

ADRish page: https://2u-internal.atlassian.net/wiki/spaces/ENG/pages/1265598591/Datadog+Service+mapping
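The naming concern about service:edx-edxapp-lms-workers can be checked with a quick sketch. It uses Python's `fnmatch.fnmatchcase` as a stand-in for a DD wildcard search such as `service:edx-edxapp-lms*`; treating the two as equivalent is an assumption, and the `-cache` worker sub-service names are hypothetical examples of the proposed convention.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Current convention: "-workers" comes after the IDA name.
CURRENT_NAMES = [
    "edx-edxapp-lms",
    "edx-edxapp-lms-cache",
    "edx-edxapp-lms-workers",
    "edx-edxapp-lms-workers-cache",  # hypothetical worker sub-service
]

# Alternative convention: "workers" comes before the IDA name.
ALTERNATIVE_NAMES = [
    "edx-edxapp-lms",
    "edx-edxapp-lms-cache",
    "edx-edxapp-workers-lms",
    "edx-edxapp-workers-lms-cache",  # hypothetical worker sub-service
]


def matching(pattern: str, names: list[str]) -> list[str]:
    """Return the names a trailing-wildcard search would pick up."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    # Under the current convention, the LMS search also picks up the workers.
    print(matching("edx-edxapp-lms*", CURRENT_NAMES))
    # Under the alternative, LMS and worker services can be searched separately.
    print(matching("edx-edxapp-lms*", ALTERNATIVE_NAMES))
    print(matching("edx-edxapp-workers-lms*", ALTERNATIVE_NAMES))
```

With the current naming, `edx-edxapp-lms*` matches all four services, while with `edx-edxapp-workers-lms` the LMS and worker service families stay separable by prefix.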

Metadata


Labels

No labels

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
