
Conversation


@sabrenner sabrenner commented Jan 15, 2026

What does this PR do?

Adds support for manually instrumenting prompts via the LLM Observability SDK. The existing OpenAI auto-instrumentation, which previously annotated prompts by hand, now goes through this new tagger logic as well.
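As a rough sketch of what "manually instrumenting prompts" involves on the tagger side, here is a hypothetical normalization helper. The field names (`id`, `version`, `template`, `variables`) and the default-id behavior are assumptions modeled on the Python SDK's prompt structure and the parametric test names below, not code from this PR:

```javascript
'use strict'

// Hypothetical sketch of prompt normalization for an LLM Observability
// tagger. Field names and defaulting behavior are assumptions, not the
// actual dd-trace-js implementation.
function normalizePrompt (prompt) {
  if (prompt === null || typeof prompt !== 'object') {
    throw new TypeError('prompt annotation must be an object')
  }

  const normalized = {
    id: prompt.id || 'default', // fall back to a default prompt id
    version: prompt.version,
    template: prompt.template,
    variables: { ...(prompt.variables || {}) }
  }

  // Coerce variable values to strings so they can be tagged safely
  for (const [key, value] of Object.entries(normalized.variables)) {
    if (typeof value !== 'string') {
      normalized.variables[key] = String(value)
    }
  }

  return normalized
}
```

In practice the SDK would attach the normalized object to an LLM span via its annotate API; the sketch only shows the validation/normalization step.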

Motivation

Feature parity with the Python SDK.

MLOB-5073

Testing

Ran the recently merged system tests against this branch locally:

dd-trace-js git:(sabrenner/llmobs-prompts-support) cd ../system-tests 
system-tests git:(sabrenner/llmobs-prompts) ./run.sh PARAMETRIC -L nodejs -vv tests/parametric/test_llm_observability.py::Test_Prompts
Build framework test container...
Build complete
==================================================================== test context =====================================================================
Scenario: PARAMETRIC
Logs folder: ./logs_parametric
Library: [email protected]
================================================================= test session starts =================================================================
[gw0] darwin Python 3.12.7 cwd: /Users/sam.brenner/dd/system-tests           
[gw1] darwin Python 3.12.7 cwd: /Users/sam.brenner/dd/system-tests           
[gw2] darwin Python 3.12.7 cwd: /Users/sam.brenner/dd/system-tests           
[gw3] darwin Python 3.12.7 cwd: /Users/sam.brenner/dd/system-tests           
[gw4] darwin Python 3.12.7 cwd: /Users/sam.brenner/dd/system-tests           
[gw5] darwin Python 3.12.7 cwd: /Users/sam.brenner/dd/system-tests           
[gw6] darwin Python 3.12.7 cwd: /Users/sam.brenner/dd/system-tests           
[gw7] darwin Python 3.12.7 cwd: /Users/sam.brenner/dd/system-tests           
[gw8] darwin Python 3.12.7 cwd: /Users/sam.brenner/dd/system-tests           
[gw9] darwin Python 3.12.7 cwd: /Users/sam.brenner/dd/system-tests           
[gw3] Python 3.12.7 (main, Oct 21 2024, 09:45:23) [Clang 15.0.0 (clang-1500.3.9.4)]
[gw2] Python 3.12.7 (main, Oct 21 2024, 09:45:23) [Clang 15.0.0 (clang-1500.3.9.4)]
[gw0] Python 3.12.7 (main, Oct 21 2024, 09:45:23) [Clang 15.0.0 (clang-1500.3.9.4)]
[gw6] Python 3.12.7 (main, Oct 21 2024, 09:45:23) [Clang 15.0.0 (clang-1500.3.9.4)]
[gw5] Python 3.12.7 (main, Oct 21 2024, 09:45:23) [Clang 15.0.0 (clang-1500.3.9.4)]
[gw8] Python 3.12.7 (main, Oct 21 2024, 09:45:23) [Clang 15.0.0 (clang-1500.3.9.4)]
[gw9] Python 3.12.7 (main, Oct 21 2024, 09:45:23) [Clang 15.0.0 (clang-1500.3.9.4)]
[gw1] Python 3.12.7 (main, Oct 21 2024, 09:45:23) [Clang 15.0.0 (clang-1500.3.9.4)] 
[gw4] Python 3.12.7 (main, Oct 21 2024, 09:45:23) [Clang 15.0.0 (clang-1500.3.9.4)]   
[gw7] Python 3.12.7 (main, Oct 21 2024, 09:45:23) [Clang 15.0.0 (clang-1500.3.9.4)]     
gw0 [8] / gw1 [8] / gw2 [8] / gw3 [8] / gw4 [8] / gw5 [8] / gw6 [8] / gw7 [8] / gw8 [8] / gw9 [8]
scheduling tests via LoadScheduling

tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_default_id 
tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_supports_hallucinations 
tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_with_string_template 
tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_supports_tags 
tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_with_non_llm_span_does_not_annotate 
tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_in_annotation_context 
tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation 
tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_updates_existing_prompt 
[gw8] [ 12%] XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_in_annotation_context 
[gw9] [ 25%] XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_default_id 
[gw1] [ 37%] XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_updates_existing_prompt 
[gw5] [ 50%] XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_supports_hallucinations 
[gw3] [ 62%] XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation 
[gw6] [ 75%] XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_supports_tags 
[gw0] [ 87%] XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_with_string_template 
[gw2] [100%] XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_with_non_llm_span_does_not_annotate 

------------------------------- generated xml file: /Users/sam.brenner/dd/system-tests/logs_parametric/reportJunit.xml --------------------------------
=============================================================== short test summary info ===============================================================
XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_in_annotation_context missing_feature
XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_default_id missing_feature
XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_updates_existing_prompt missing_feature
XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_supports_hallucinations missing_feature
XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation missing_feature
XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_supports_tags missing_feature
XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_with_string_template missing_feature
XPASS tests/parametric/test_llm_observability.py::Test_Prompts::test_prompt_annotation_with_non_llm_span_does_not_annotate missing_feature
================================================================= 8 xpassed in 19.23s =================================================================

github-actions bot commented Jan 15, 2026

Overall package size

Self size: 4.4 MB
Deduped: 5.23 MB
No deduping: 5.23 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 2.0.0 | 68.46 kB | 797.03 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

codecov bot commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 57.30337% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.08%. Comparing base (39c85a4) to head (63c3724).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| packages/dd-trace/src/llmobs/tagger.js | 55.00% | 36 Missing ⚠️ |
| packages/dd-trace/src/llmobs/sdk.js | 66.66% | 1 Missing ⚠️ |
| packages/dd-trace/src/llmobs/span_processor.js | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7257      +/-   ##
==========================================
- Coverage   85.19%   85.08%   -0.12%     
==========================================
  Files         532      532              
  Lines       22778    22863      +85     
==========================================
+ Hits        19405    19452      +47     
- Misses       3373     3411      +38     


pr-commenter bot commented Jan 15, 2026

Benchmarks

Benchmark execution time: 2026-01-16 18:49:08

Comparing candidate commit 63c3724 in PR branch sabrenner/llmobs-prompts-support with baseline commit 39c85a4 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 230 metrics, 30 unstable metrics.

@sabrenner sabrenner marked this pull request as ready for review January 16, 2026 18:54
@sabrenner sabrenner requested review from a team as code owners January 16, 2026 18:54