Skip to content

DEBUG-3700 Telemetry integration test #4588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

p-datadog
Copy link
Member

What does this PR do?

Adds integration tests for agent and agentless telemetry

Motivation:
Test coverage for upcoming code changes to the core transport within telemetry

Change log entry
None

Additional Notes:
I added a wait to the termination code for async worker, without it the workers were not terminated within the test's scope and the thread leak checker was complaining

How to test the change?
PR is tests only

@p-datadog p-datadog requested a review from a team as a code owner April 16, 2025 14:52
@p-datadog p-datadog force-pushed the telemetry-integration-test branch from f4ba5a2 to c67d2b3 Compare April 16, 2025 14:52
@github-actions github-actions bot added the core Involves Datadog core libraries label Apr 16, 2025
@pr-commenter
Copy link

pr-commenter bot commented Apr 16, 2025

Benchmarks

Benchmark execution time: 2025-04-23 18:03:40

Comparing candidate commit d1e9ac4 in PR branch telemetry-integration-test with baseline commit 5de3de1 in branch master.

Found 3 performance improvements and 0 performance regressions! Performance is the same for 28 metrics, 2 unstable metrics.

scenario:tracing - 100 span trace - no writer

  • 🟩 throughput [+17.465op/s; +18.062op/s] or [+5.640%; +5.833%]

scenario:tracing - Propagation - Datadog

  • 🟩 throughput [+3075.705op/s; +3149.385op/s] or [+10.645%; +10.900%]

scenario:tracing - Tracing.log_correlation

  • 🟩 throughput [+10511.171op/s; +10803.684op/s] or [+10.519%; +10.812%]

@datadog-datadog-prod-us1
Copy link
Contributor

datadog-datadog-prod-us1 bot commented Apr 16, 2025

Datadog Report

Branch report: telemetry-integration-test
Commit report: d1e9ac4
Test service: dd-trace-rb

❌ 2 Failed (0 Known Flaky), 20477 Passed, 1363 Skipped, 3m 45.36s Total Time

❌ Failed Tests (2)

  • Telemetry integration tests agentful first run sends startup payloads - rspec - Details

    Expand for error
     expected: 2
          got: 0
     
     (compared using ==)
     
     Failure/Error: expect(sent_payloads.length).to eq 2
     
       expected: 2
            got: 0
     
     ...
    
  • Telemetry integration tests agentful not first run sends expected payload - rspec - Details

    Expand for error
     expected: 1
          got: 0
     
     (compared using ==)
     
     Failure/Error: expect(sent_payloads.length).to eq 1
     
       expected: 1
            got: 0
     
     ...
    

@p-datadog p-datadog marked this pull request as draft April 17, 2025 15:13
@p-datadog p-datadog force-pushed the telemetry-integration-test branch from 9cc84c0 to d1e9ac4 Compare April 23, 2025 17:40
Copy link
Member

@Strech Strech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have few questions about new code and few suggestions about the tests

it 'sends expected payload' do
component.error('test error')

sleep 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way for us to force flush instead of sleeping? Sleeping could be flaky


let(:sent_payloads) { [] }

shared_examples 'telemetry' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find shared examples are tedious to debug and also to read the test and personally I'm against such DRY for the sake of DRY. But not a blocker

P.S Yes, I would prefer copy-paste here, more obvious is better.

end
end

context 'agentful' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor note here, I think a description in a way of sentence would be great adjustment, like

context 'when setup is using standard agent mode' do

or so

@@ -0,0 +1,191 @@
require 'spec_helper'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
require 'spec_helper'
# frozen_string_literal: true
require 'spec_helper'

@@ -47,6 +47,14 @@ def terminate
@run_async = false
Datadog.logger.debug { "Forcibly terminating worker thread for: #{self}" }
worker.terminate
# Wait for the worker thread to end
begin
Timeout.timeout(0.5) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have few questions here.

  1. We wait some time and fail with exception, and then continue as always, but what's the difference? The debug log?
  2. Why don't we check is worker running instead of waiting till complete? Just to avoid sleep?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants