Add metric ibm_mq.channel.conns to ibm mq integration, add channel and connection metric tests #20519

Open: mwdd146980 wants to merge 26 commits into master from mwdd146980/fragent-3166-channel-conns-metric

Conversation

mwdd146980
Contributor

@mwdd146980 mwdd146980 commented Jun 15, 2025

What does this PR do?

This PR enhances the IBM MQ integration by adding two new metrics to provide better visibility into channel connections: ibm_mq.channel.conn_status (tracks individual connection status with connection name tags) and ibm_mq.channel.connections_active (counts total active connections per channel). These metrics enable customers to monitor individual channel connections, track connection counts per channel, and improve troubleshooting capabilities for IBM MQ connectivity issues.

Configuration Control: To address potential tag cardinality concerns, this PR introduces a new configuration option collect_connection_metrics (default: false) that allows users to control the collection of the ibm_mq.channel.conn_status metric. When enabled, this metric creates a new connection tag for each unique connection, which can lead to high cardinality in environments with many active connections. The option is disabled by default to prevent unintended tag cardinality issues.
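
To make the relationship between the two metrics and the collect_connection_metrics flag concrete, here is a minimal sketch. It is not the code from this PR: the function name build_connection_metrics, the input record shape, the gauge value of 1 for conn_status, and the choice to report 0 for channels without connections are all assumptions made for illustration.

```python
def build_connection_metrics(channel_statuses, collect_connection_metrics=False):
    """Return (metric_name, value, tags) tuples for channel status records.

    Each record is a hypothetical dict such as
    {"channel": "DEV.ADMIN.SVRCONN", "connection": "10.0.0.5(1414)"}.
    """
    active = {}   # channel name -> number of non-empty connections seen
    metrics = []
    for status in channel_statuses:
        channel = status["channel"]
        active.setdefault(channel, 0)
        connection = (status.get("connection") or "").strip()
        if not connection:
            continue  # empty connection strings are not counted or tagged
        active[channel] += 1
        if collect_connection_metrics:
            # One tag value per unique connection: this is the cardinality risk.
            metrics.append((
                "ibm_mq.channel.conn_status",
                1,  # assumed value; the real status encoding may differ
                [f"channel:{channel}", f"connection:{connection}"],
            ))
    # The per-channel count carries no connection tag, so it stays low-cardinality.
    for channel, count in sorted(active.items()):
        metrics.append(("ibm_mq.channel.connections_active", count, [f"channel:{channel}"]))
    return metrics
```

With the flag left at its default, only the per-channel count is produced; enabling it adds one tagged conn_status point per connection.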

Additionally, this PR significantly enhances the test coverage for channel and connection metrics by:

  1. Adding comprehensive unit tests for channel connection handling in test_channel_metric_collector.py
  2. Testing various connection scenarios including:
    • Channels with active connections
    • Channels without connections
    • Channels with empty connection strings
    • Proper tagging of connection metrics
  3. Ensuring proper metric collection and tagging for the new connection metric
  4. Verifying that connection metrics are properly aggregated and reported
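
The scenarios listed above can be pictured as a small table-driven check. This is only a sketch of the idea, not the contents of test_channel_metric_collector.py: GaugeRecorder, submit_connection_status, and the sample connection names are hypothetical stand-ins for the real collector and its test fixtures.

```python
class GaugeRecorder:
    """Hypothetical stand-in that records gauge calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def gauge(self, name, value, tags):
        self.calls.append((name, value, tuple(tags)))


def submit_connection_status(recorder, channel, connection_names):
    """Submit one conn_status gauge per non-empty connection name (assumed behavior)."""
    for raw in connection_names:
        name = raw.strip()
        if not name:
            continue  # skip empty or whitespace-only connection strings
        recorder.gauge(
            "ibm_mq.channel.conn_status",
            1,  # assumed value, for illustration only
            [f"channel:{channel}", f"connection:{name}"],
        )


def run_scenarios():
    """Run each scenario and return the recorded gauge calls keyed by label."""
    scenarios = {
        "active connections": (["10.0.0.5(1414)", "10.0.0.6(1414)"], 2),
        "no connections": ([], 0),
        "empty connection strings": (["", "   "], 0),
    }
    results = {}
    for label, (names, expected_count) in scenarios.items():
        recorder = GaugeRecorder()
        submit_connection_status(recorder, "DEV.ADMIN.SVRCONN", names)
        assert len(recorder.calls) == expected_count, label
        results[label] = recorder.calls
    return results
```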

Motivation

The motivation behind this PR is to enhance the monitoring capabilities of the IBM MQ integration by providing visibility into active connections per channel. This feature allows users to track connection changes over time and identify which connections are active, which is crucial for maintaining the health and performance of the messaging system.

This was requested in escalation AGENT-13489/FRAGENT-3166 by customer Broadridge (GTO) (org ID: 345886).

Manual QA Steps

  • Spin up an EC2 VM with AMI Ubu-ddev-docker (required for the correct architecture)
  • Install IBM MQ server and client libraries
  • Run pytest tests/test_ibm_mq_unit.py -v for unit tests
  • Run ddev --no-interactive test ibm_mq for unit tests with ddev containers
  • Spin up containers with ddev env start ibm_mq py3.12-9-cluster --dev
  • Run the manual check with ddev env agent ibm_mq py3.12-9-cluster check and look for the new metrics in the output
  • Simulate connections with this Python script
    • Connections can be simulated with a command like python simulate_mq_conn.py QM1 DEV.ADMIN.SVRCONN localhost 11414 APP.QUEUE.1 "Conn 1"
  • Check in Datadog for the metrics
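
For readers without access to the linked script, the following is a rough sketch of what a connection simulator along those lines could look like. It is not the script referenced above. It assumes the pymqi client library is installed, that the six positional arguments mean queue manager, channel, host, port, queue, and a free-form label, and that the label is simply put on the queue as a message; the names SimArgs, parse_args, and hold_connection are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class SimArgs:
    queue_manager: str
    channel: str
    host: str
    port: int
    queue: str
    label: str  # assumed meaning of the sixth argument, e.g. "Conn 1"

    @property
    def conn_info(self):
        # pymqi expects the connection string in "host(port)" form.
        return f"{self.host}({self.port})"


def parse_args(argv):
    """Parse the six positional arguments used in the example command."""
    if len(argv) != 6:
        raise SystemExit("usage: QMGR CHANNEL HOST PORT QUEUE LABEL")
    queue_manager, channel, host, port, queue, label = argv
    return SimArgs(queue_manager, channel, host, int(port), queue, label)


def hold_connection(args, seconds=60, user=None, password=None):
    """Open a client connection, put one message, and keep the connection open."""
    import pymqi  # imported lazily so argument parsing works without the MQ client

    qmgr = pymqi.connect(args.queue_manager, args.channel, args.conn_info, user, password)
    try:
        queue = pymqi.Queue(qmgr, args.queue)
        queue.put(args.label.encode("utf-8"))
        queue.close()
        time.sleep(seconds)  # stay connected long enough for an Agent check run
    finally:
        qmgr.disconnect()
```

Calling hold_connection(parse_args(sys.argv[1:])) from several terminals at once would give the check multiple simultaneous connections on the same channel to report. Credentials may be required depending on how the queue manager is configured.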

Here's what it looked like in my own testing:
image

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • [not applicable] If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@mwdd146980 mwdd146980 force-pushed the mwdd146980/fragent-3166-channel-conns-metric branch from 8fe0745 to 290b3e5 Compare June 16, 2025 03:35

codecov bot commented Jun 17, 2025

Codecov Report

Attention: Patch coverage is 97.46835% with 2 lines in your changes missing coverage. Please review.

Project coverage is 91.82%. Comparing base (0df4650) to head (49445b4).
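
As a quick sanity check on the patch coverage figure, the line counts behind it can be recovered from the percentage and the number of missing lines. The sketch below assumes patch coverage is simply covered lines divided by total changed lines; the helper name lines_from_patch_coverage is made up for this example.

```python
def lines_from_patch_coverage(percent_covered, missing_lines):
    """Recover (total_lines, covered_lines) from a coverage percentage.

    Assumes percent_covered = 100 * covered / total and
    missing_lines = total - covered.
    """
    uncovered_fraction = 1 - percent_covered / 100
    total = round(missing_lines / uncovered_fraction)
    return total, total - missing_lines


total, covered = lines_from_patch_coverage(97.46835, 2)
print(total, covered)                      # 79 77
print(round(100 * covered / total, 5))     # 97.46835
```

Under that assumption, the report corresponds to 77 of 79 changed lines being covered.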

Additional details and impacted files
Flag Coverage Δ
ibm_mq 91.76% <97.46%> (+0.42%) ⬆️
All other flags (active_directory through zk) are listed with "?" and no coverage figures.

- Update channel metric collector to use channel_status_metrics() instead of channel_metrics()
  for discovered channels to properly collect buffers_rcvd metric
- Update test assertions to match actual tags being sent in gauge calls
- Fix unit tests in test_channel_metric_collector.py to pass

This change ensures that channel status metrics like buffers_rcvd are properly
collected and reported by the integration.
- Updated get_pcf_channel_metrics to submit configuration metrics instead of status metrics.
- Modified unit tests to verify that configuration metrics are collected for channels with empty or no connections.
- Added a new test test_channel_status_metrics to ensure status metrics and connection metrics are correctly submitted.
buraizu previously approved these changes Jun 17, 2025
- create new metric ibm_mq.channel.connections_active which represents total num of active conns per channel
@temporal-github-worker-1 temporal-github-worker-1 bot dismissed buraizu’s stale review June 26, 2025 23:15

Review from buraizu is dismissed. Related teams and files:

  • documentation
    • ibm_mq/metadata.csv
@mwdd146980 mwdd146980 force-pushed the mwdd146980/fragent-3166-channel-conns-metric branch from 5add087 to 3a5f791 Compare June 27, 2025 03:15
@mwdd146980 mwdd146980 requested review from buraizu and steveny91 June 30, 2025 17:29
buraizu previously approved these changes Jun 30, 2025
…annel.conn_status is only collected if this flag is enabled
@temporal-github-worker-1 temporal-github-worker-1 bot dismissed buraizu’s stale review June 30, 2025 22:05

Review from buraizu is dismissed. Related teams and files:

  • documentation
    • ibm_mq/assets/configuration/spec.yaml
    • ibm_mq/datadog_checks/ibm_mq/data/conf.yaml.example
@mwdd146980 mwdd146980 requested a review from buraizu June 30, 2025 22:13
buraizu previously approved these changes Jul 1, 2025
Contributor

@buraizu buraizu left a comment

Approving with a minor update requested

@temporal-github-worker-1 temporal-github-worker-1 bot dismissed buraizu’s stale review July 1, 2025 21:41

Review from buraizu is dismissed. Related teams and files:

  • documentation
    • ibm_mq/assets/configuration/spec.yaml
3 participants