daniyelnnr
Contributor

What is the purpose of this pull request?

This pull request introduces OpenTelemetry-based logging and metrics instrumentation to the VTEX I/O API client, providing improved observability and standardizing telemetry collection. Key changes include the addition of OpenTelemetry dependencies, refactoring logger and metrics clients to use a centralized telemetry initializer, and the introduction of new middleware for request metrics. The telemetry initialization now registers instrumentations for Koa and host metrics, and the worker process is updated to include these new middlewares.

Telemetry and Instrumentation Integration
  • Added OpenTelemetry dependencies (@opentelemetry/api, @opentelemetry/host-metrics, @opentelemetry/instrumentation, @opentelemetry/instrumentation-koa) and updated the diagnostics-nodejs package version in package.json to support new telemetry features.
  • Implemented a centralized telemetry initializer (initializeTelemetry) in src/service/telemetry/client.ts, which creates and registers logs, metrics, and traces clients, and sets up Koa and host metrics instrumentations. [1] [2] [3]
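A minimal sketch of what a centralized initializer of this kind could look like. The `TelemetryFactory` and `Instrumentation` interfaces and their method names are hypothetical stand-ins, not the diagnostics-nodejs API; only the name `initializeTelemetry` and the idea of creating the three clients and then registering instrumentations come from the description above.

```typescript
// Hypothetical stand-ins for the clients returned by the telemetry library.
interface LogClient { kind: 'logs' }
interface MetricClient { kind: 'metrics' }
interface TraceClient { kind: 'traces' }

interface TelemetryClients {
  logs: LogClient
  metrics: MetricClient
  traces: TraceClient
}

// Hypothetical factory; the real library's API may differ.
interface TelemetryFactory {
  newLogsClient(): Promise<LogClient>
  newMetricsClient(): Promise<MetricClient>
  newTracesClient(): Promise<TraceClient>
}

// Hypothetical shape of an instrumentation (for example Koa or host metrics).
interface Instrumentation {
  name: string
  enable(): void
}

async function initializeTelemetry(
  factory: TelemetryFactory,
  instrumentations: Instrumentation[]
): Promise<TelemetryClients> {
  // The three clients are independent, so they are created concurrently.
  const [logs, metrics, traces] = await Promise.all([
    factory.newLogsClient(),
    factory.newMetricsClient(),
    factory.newTracesClient(),
  ])

  // Instrumentations are enabled only after every client exists.
  for (const instrumentation of instrumentations) {
    instrumentation.enable()
  }

  return { logs, metrics, traces }
}
```

Passing the factory and the instrumentation list as arguments keeps the sketch testable without the real telemetry backend.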
Logger and Metrics Refactoring
  • Refactored logger and metrics clients to use the new telemetry initializer, simplifying their interfaces and removing redundant configuration parameters. (src/service/logger/client.ts, src/service/logger/logger.ts, src/service/metrics/client.ts) [1] [2] [3] [4] [5] [6]
  • Added a host metrics instrumentation class and new metrics instruments for tracking HTTP requests, timings, response sizes, and aborted requests. (src/service/metrics/instruments/hostMetrics.ts, src/service/metrics/metrics.ts) [1] [2]
Middleware Enhancements
  • Introduced a new middleware (addOtelRequestMetricsMiddleware) for collecting detailed HTTP request metrics using OpenTelemetry, and integrated it into the worker process alongside context propagation middleware. (src/service/metrics/otelRequestMetricsMiddleware.ts, src/service/worker/index.ts) [1] [2] [3] [4]
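As an illustration of the middleware idea, the sketch below records a request count, a timing, a response size, and aborted requests around `next()`. The `Counter` and `Histogram` interfaces are local stand-ins shaped like the OpenTelemetry instruments of the same name; the instrument names, the reduced `RequestContext`, and the attribute keys are assumptions, not the contents of `otelRequestMetricsMiddleware.ts`.

```typescript
// Local stand-ins shaped like OpenTelemetry's Counter and Histogram.
type Attributes = Record<string, string | number>
interface Counter { add(value: number, attributes?: Attributes): void }
interface Histogram { record(value: number, attributes?: Attributes): void }

// Hypothetical instrument set; the names are illustrative only.
interface RequestInstruments {
  totalRequests: Counter
  abortedRequests: Counter
  requestTimings: Histogram
  responseSizes: Histogram
}

// Hypothetical, reduced request context (the real ServiceContext is richer).
interface RequestContext {
  routeId: string
  status: number
  responseLength: number
  aborted: boolean
}

type Next = () => Promise<void>

function createRequestMetricsMiddleware(
  instruments: RequestInstruments,
  now: () => number = Date.now
) {
  return async function recordRequestMetrics(ctx: RequestContext, next: Next) {
    const start = now()
    instruments.totalRequests.add(1, { route: ctx.routeId })
    try {
      await next()
    } finally {
      // Recording in `finally` means failed requests are measured as well.
      const attributes = { route: ctx.routeId, status: ctx.status }
      instruments.requestTimings.record(now() - start, attributes)
      if (ctx.aborted) {
        instruments.abortedRequests.add(1, attributes)
      } else {
        instruments.responseSizes.record(ctx.responseLength, attributes)
      }
    }
  }
}
```

The injectable `now` clock exists only to make the timing deterministic in tests.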
Service Initialization
  • Changed service startup to asynchronously initialize telemetry before starting master or worker processes, ensuring telemetry is ready before handling requests. (src/service/index.ts)
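The ordering described here can be sketched as follows. The `StartupDeps` loaders are hypothetical and stand in for dynamic `import()` calls of the master and worker modules; the point is only that telemetry is awaited before those modules are loaded, so instrumentation hooks are in place first.

```typescript
type ProcessStarter = () => void

// Hypothetical dependencies; in a real service the loaders would wrap
// dynamic `import()` calls for the master and worker modules.
interface StartupDeps {
  initializeTelemetry: () => Promise<void>
  loadMaster: () => Promise<ProcessStarter>
  loadWorker: () => Promise<ProcessStarter>
}

async function startApp(isMaster: boolean, deps: StartupDeps): Promise<void> {
  // 1. Telemetry first, so instrumentation hooks are registered...
  await deps.initializeTelemetry()
  // 2. ...before the modules to be instrumented are loaded lazily.
  const start = isMaster ? await deps.loadMaster() : await deps.loadWorker()
  start()
}
```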

What problem is this solving?

node-vtex-api currently does not use the newest version of diagnostics-nodejs, VTEX's standard library for observability. This PR makes telemetry generation consistent with that library and allows node-vtex-api to adopt the OTel protocol for metrics.

How should this be manually tested?

Screenshots or example usage

Types of changes

  • Bug fix (a non-breaking change which fixes an issue)
  • New feature (a non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Requires change to documentation, which has been updated accordingly.

Enables this client to support logs, metrics, and tracing signals from the diagnostics-nodejs library, making them available already initialized. Additionally, this change enables registration of built-in and community instrumentation.
Remove unused parameters and use logClient already initialized
Refactor client types to use Types.LogClient for consistency
@daniyelnnr daniyelnnr self-assigned this Aug 8, 2025
@wisneycardeal wisneycardeal changed the title from "Update metrics instrumentation" to "XTNSNS-1082 Update metrics instrumentation" Aug 8, 2025
Improve telemetry client initialization using Promise.all to optimize asynchronous calls
@daniyelnnr daniyelnnr marked this pull request as draft August 14, 2025 13:01
Adds OpenTelemetry dependencies for host metrics and Koa instrumentation
Updates the startApp function to initialize telemetry beforehand and uses dynamic imports for startMaster and startWorker. This ensures that telemetry clients and libraries are properly initialized so they can hook into application libraries that are loaded later
Adds a getMetricClient function to make the client available and creates the asynchronous initialization function to retrieve it
This commit creates a middleware for request metrics using OpenTelemetry instruments. This change is inspired by `requestMetricsMiddleware.ts` and follows the same logic, only changing the way it is instrumented to use the OTel standard
Add the use of OpenTelemetry metrics middleware for request monitoring
Add Koa instrumentation to the list of instruments that will be registered and used by the telemetry client. This enables automatic instrumentation for the Koa module, as well as automatic collection and export of telemetry data.
This commit creates a wrapper for the host-metrics module, which provides automatic collection for system metrics - such as CPU, memory, and network
@daniyelnnr daniyelnnr marked this pull request as ready for review August 14, 2025 17:18
@daniyelnnr daniyelnnr requested a review from filafb August 14, 2025 17:18
@juliobguedes

Also, once this is finished, it would be awesome if you could add tests here.

Contributor

@filafb filafb left a comment

Nice work and way easier to review with the commits organized like that. Thank you for that.

The thing that caught my attention most, and that I think deserves a second thought, is the variables in the global context. They might lead to some weird bugs that are hard to track down.

return async function addOtelRequestMetrics(ctx: ServiceContext, next: () => Promise<void>) {
  if (!instruments) {
    try {
      instruments = await Promise.race([
Contributor

Interesting use of Promise.race. I like it!
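For readers who cannot see the full diff, the pattern being praised is racing an initialization promise against a timeout. The sketch below is one way to write it; `withTimeout` and `getInstrumentsOrSkip` are hypothetical helper names, and falling back to `undefined` (skipping metrics for that request) is an assumption about the failure policy, not something the truncated snippet confirms.

```typescript
// Settles with `work` if it finishes within `ms`, otherwise rejects.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms)
  })
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout])
  } finally {
    // Clear the timer so it does not keep the process alive.
    if (timer !== undefined) {
      clearTimeout(timer)
    }
  }
}

// Assumed policy: metrics are best-effort, so a slow or failed
// initialization yields `undefined` instead of failing the request.
async function getInstrumentsOrSkip<T>(
  init: () => Promise<T>,
  ms: number
): Promise<T | undefined> {
  try {
    return await withTimeout(init(), ms)
  } catch {
    return undefined
  }
}
```

Note that `Promise.race` does not cancel the losing promise; a slow initialization keeps running in the background after the timeout wins.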

Refactors the instrument initialization logic to improve readability and code structure
Refactor MetricClient to use Singleton pattern for improved client management
Refactor OTel instruments initialization to use singleton pattern
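One common way to replace loose module-level variables with a singleton for asynchronously created objects is to cache the in-flight promise. The sketch below shows that approach; `createSingleton` is a hypothetical helper and is not taken from the commits above, whose actual implementation is not shown in this conversation.

```typescript
// Caches the in-flight promise so concurrent callers share one initialization.
function createSingleton<T>(init: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined
  return function getInstance(): Promise<T> {
    if (cached === undefined) {
      cached = init().catch((error) => {
        // Clear the cache on failure so a later call can retry.
        cached = undefined
        throw error
      })
    }
    return cached
  }
}
```

Because the state lives inside the closure, nothing outside `getInstance` can reassign it, which narrows the room for the hard-to-find bugs raised in the review.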
@daniyelnnr
Contributor Author

Also, once this is finished, it would be awesome if you could add tests here.

@juliobguedes regarding this, and to put on record what we previously aligned on: I will add the tests for this change in another PR, along with some smaller configuration changes that I still need to apply.

Contributor

@filafb filafb left a comment

🔥

Base automatically changed from chore/bump-diagnostics to master August 29, 2025 13:16