Skip to content

Update logging instrumentation #577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

daniyelnnr
Copy link

What is the purpose of this pull request?

Update logging instrumentation, adding telemetry and log clients and integrates node-vtex-api with the new version of the diagnostics library that supports OpenTelemetry protocols.
It also updates Typescript and Opentelemetry dependencies to enable support for node builder 6.x and IO apps using Node 16.x.

What problem is this solving?

This implementation of node-vtex-api is not using the newest version of diagnostics, which is VTEX's standard library for instrumentation and observability. This PR provides consistency in telemetry generation and allows it to adopt OTel protocol for logs.

How should this be manually tested?

To test it:

  • Clone this repository and checkout to update/logs-instrumentation branch;
  • Install the dependencies and generate the build:
  yarn && yarn build
  • Link the previously generated package:
  yarn link
  • Create an IO app (e.g. service-example)
  • Link this package to your app's node_modules
  cd ../<your-app>/node && yarn link @vtex/api
  • Add some logging to your app and monitor new logs being received by the collector through this dashboard in Grafana.

Types of changes

  • Bug fix (a non-breaking change which fixes an issue)
  • New feature (a non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Requires change to documentation, which has been updated accordingly.

const telemetryClient = await getTelemetryClient();

const logsConfig = Exporters.CreateLogsExporterConfig({
endpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://io-telemetry-http.vtex.systems:3333',
Copy link
Author

@daniyelnnr daniyelnnr Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't identify the proper way to handle this address, at first I thought it should be in constants.ts but I'm not sure about that. Any ideas about it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we should have a default value for the exporter endpoint. Maybe if we can't initialize it with a environment variable, this instance just doesn't send logs anywhere (or logs them to stdout as a default). I'm afraid of a behaviour where for some reason, if the endpoints are unavailable, this client crashes the application.

Copy link
Author

@daniyelnnr daniyelnnr Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CreateLogsExporterConfig function receives an object with some content on it to generate the exporter config, but all these fields are optional as we can see in the function signature - and internally it can handle the scenario where an endpoint or any other field is unavailable.

So if we have problems with the environment variable and/or don't proper inform the collector endpoint, the worst case scenario we'll get an functional exporter that sends logs to an invalid collector due to misconfiguration.

But yet, I'm not sure how I should proper inform the endpoint of the collector here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I wouldn't have a hard-coded default endpoint, since—as you showed—the CreateLogsExporterConfig can handle the undefined case. That exposes internal information in a public lib and also creates a config that is hard to modify: we need to redeploy the runtime if want to change it.

In summary, I would only get rid of the endpoint and let it be process.env.OTEL_EXPORTER_OTPL_ENDPOINT

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arturpimentel I agree 100% about removing the hard-coded endpoint and will commit a change to fix this. However, I'm not sure where I could define the process.env.OTEL_EXPORTER_OTLP_ENDPOINT, as we don't want to set this in this lib. I imagine this info need to be defined somewhere else - is service-runtime-node the right place to do it?

Comment on lines +112 to -116
let cached: void | Cached = undefined
try {
const cacheHasWithSegment = await storage.has(keyWithSegment)
cached = cacheHasWithSegment ? await storage.get(keyWithSegment) : await storage.get(key)
} catch (error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These will probably conflict with the changes I made myself in #574. I believe I like your implementation better, so I might keep the changes you made here in the future.

const telemetryClient = await getTelemetryClient();

const logsConfig = Exporters.CreateLogsExporterConfig({
endpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://io-telemetry-http.vtex.systems:3333',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we should have a default value for the exporter endpoint. Maybe if we can't initialize it with a environment variable, this instance just doesn't send logs anywhere (or logs them to stdout as a default). I'm afraid of a behaviour where for some reason, if the endpoints are unavailable, this client crashes the application.

Copy link
Contributor

@arturpimentel arturpimentel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a high-level overview of the changes? I had some trouble understanding the logging architecture here. Maybe I'll use copilot for this, but I think it would be beneficial if I could picture how this is organized

Copy link

@silvadenisaraujo silvadenisaraujo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this work materializing 🎉 added a few comments/questions.

@silvadenisaraujo silvadenisaraujo requested a review from Copilot April 2, 2025 22:06
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request updates the logging instrumentation by integrating diagnostics and telemetry clients that support OpenTelemetry protocols. The changes include introducing asynchronous initialization for telemetry and log clients, updating various catch blocks with explicit type annotations, and adjusting cache behavior by replacing undefined with a null cast fallback.

Reviewed Changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/service/worker/runtime/events/middlewares/body.ts Adjusted error catch type and import order for clarity.
src/service/worker/index.ts Updated LogLevel import to use the new loggerTypes module.
src/service/tracing/tracingMiddlewares.ts Added explicit catch type annotations.
src/service/telemetry/index.ts Added an export for telemetry client functionality.
src/service/telemetry/client.ts Introduced a TelemetryClientSingleton for managing telemetry client initialization.
src/service/logger/loggerTypes.ts Added new logger type definitions.
src/service/logger/logger.ts Refactored logger to use a new diagnostics-based log client with asynchronous initialization.
src/service/logger/index.ts Exported additional modules including loggerTypes and client.
src/service/logger/console.ts Adjusted LogLevel import to reference loggerTypes.
src/service/logger/client.ts Implemented a new log client initialization using the diagnostics library.
src/service/index.ts, src/clients/infra/, src/HttpClient/middlewares/ Updated catch clauses with explicit type annotations.
src/caches/LRUDiskCache.ts, src/caches/DiskCache.ts Replaced returning undefined with a null cast for fallback values.
Files not reviewed (1)
  • package.json: Language not supported
Comments suppressed due to low confidence (1)

src/caches/DiskCache.ts:46

  • Returning a casted null instead of undefined for a missing cache entry may cause inconsistencies if downstream code expects undefined. Consider reverting to undefined or ensuring type consistency across the cache API.
resolve(null as unknown as V)

@daniyelnnr
Copy link
Author

Would you mind adding a high-level overview of the changes? I had some trouble understanding the logging architecture here. Maybe I'll use copilot for this, but I think it would be beneficial if I could picture how this is organized

@arturpimentel I followed your idea and used copilot to get an overview of the logging architecture - if you think it makes sense, I can add a summary of all the changes as well.

Original Logging Architecture

  1. Proprietary logger with a specific VTEX format
  2. Logs treated as separate entities from metrics and traces
  3. Logs sent directly to stdout/stderr
[Application] → [VTEX Logger] → [stdout/stderr] → [Fluent bit] → [Log Collector]

Architecture after the changes

  1. Logs follow the OpenTelemetry standard format
  2. Logs automatically correlated with traces and spans
  3. Logs exported via the OTLP standard
  4. Centralization of exporter configuration
  5. Context information and metadata propagated automatically.
  6. Logs automatically enriched with trace IDs and span IDs.
[Application] → [Diagnostics Logger] → [OpenTelemetry SDK] → [OTLP Exporter] → [Telemetry Backend/Collector]

Here is a more detailed overview:

flowchart LR
    A[Application Code] --[Logger Instance]--> B["Diagnostics Logger /src/service/logger/client.ts"]
    M[Trace/Span Context] -.-> B
    Z["Diagnostics Metrics [TBD]"] -.-> X
    Y["Diagnostics Traces [TBD]"] -.-> X
    
    B --> X
    X[Telemetry Client]--> C["OpenTelemetry Logs (@opentelemetry/sdk-logs)"]
    D["Resource & Attributes (Context enrichment)"] --> C
    
    
    E --> F["o11y Collector
    (io-telemetry-http.vtex.systems)"]
    
    subgraph Unified Telemetry
        C --> E["OTLP Exporter (gRPC/HTTP)"]
        C --> P["stdout Exporter (gRPC/HTTP)"]
        G["OpenTelemetry Metrics (@opentelemetry/sdk-metrics)"] --> E
        G["OpenTelemetry Metrics (@opentelemetry/sdk-metrics)"] --> P
        H["OpenTelemetry Traces (@opentelemetry/sdk-trace)"] --> E
        H["OpenTelemetry Traces (@opentelemetry/sdk-trace)"] --> P
    end
    
Loading

Adds telemetry and log clients, integrating node-vtex-api with the new version of the diagnostics library that supports OpenTelemetry protocols.
It also updates Typescript and Opentelemetry dependencies to enable support for node builder 6.x and IO apps using Node 16.x.
This update includes a timeout to handle clients initializing indefinitely, assuming some initial logs might be lost on application startup while the log client still loading. Now we can better handle this scenario since we have more control over the startup process of the log client.
We've also updated error handling for this step, avoiding throwing errors on log client startup so the app is still available and able to use console.log fallback - ensuring the app is able to run even if the logger is not working properly.
To maintain the current standard for collecting and sending logs, we will temporarily disable the creation of this client.
@daniyelnnr daniyelnnr force-pushed the update/logs-instrumentation branch from a804047 to 470d89b Compare April 16, 2025 13:03
Copy link

@silvadenisaraujo silvadenisaraujo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants