-
Notifications
You must be signed in to change notification settings - Fork 15
Update logging instrumentation #577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
src/service/logger/client.ts
Outdated
const telemetryClient = await getTelemetryClient(); | ||
|
||
const logsConfig = Exporters.CreateLogsExporterConfig({ | ||
endpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://io-telemetry-http.vtex.systems:3333', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't identify the proper way to handle this address, at first I thought it should be in constants.ts but I'm not sure about that. Any ideas about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we should have a default value for the exporter endpoint. Maybe if we can't initialize it with a environment variable, this instance just doesn't send logs anywhere (or logs them to stdout as a default). I'm afraid of a behaviour where for some reason, if the endpoints are unavailable, this client crashes the application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CreateLogsExporterConfig function receives an object with some content on it to generate the exporter config, but all these fields are optional as we can see in the function signature - and internally it can handle the scenario where an endpoint or any other field is unavailable.
So if we have problems with the environment variable and/or don't proper inform the collector endpoint, the worst case scenario we'll get an functional exporter that sends logs to an invalid collector due to misconfiguration.
But yet, I'm not sure how I should proper inform the endpoint of the collector here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I wouldn't have a hard-coded default endpoint, since—as you showed—the CreateLogsExporterConfig
can handle the undefined
case. That exposes internal information in a public lib and also creates a config that is hard to modify: we need to redeploy the runtime if want to change it.
In summary, I would only get rid of the endpoint and let it be process.env.OTEL_EXPORTER_OTPL_ENDPOINT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arturpimentel I agree 100% about removing the hard-coded endpoint and will commit a change to fix this. However, I'm not sure where I could define the process.env.OTEL_EXPORTER_OTLP_ENDPOINT
, as we don't want to set this in this lib. I imagine this info need to be defined somewhere else - is service-runtime-node the right place to do it?
let cached: void | Cached = undefined | ||
try { | ||
const cacheHasWithSegment = await storage.has(keyWithSegment) | ||
cached = cacheHasWithSegment ? await storage.get(keyWithSegment) : await storage.get(key) | ||
} catch (error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These will probably conflict with the changes I made myself in #574. I believe I like your implementation better, so I might keep the changes you made here in the future.
src/service/logger/client.ts
Outdated
const telemetryClient = await getTelemetryClient(); | ||
|
||
const logsConfig = Exporters.CreateLogsExporterConfig({ | ||
endpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://io-telemetry-http.vtex.systems:3333', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we should have a default value for the exporter endpoint. Maybe if we can't initialize it with a environment variable, this instance just doesn't send logs anywhere (or logs them to stdout as a default). I'm afraid of a behaviour where for some reason, if the endpoints are unavailable, this client crashes the application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind adding a high-level overview of the changes? I had some trouble understanding the logging architecture here. Maybe I'll use copilot for this, but I think it would be beneficial if I could picture how this is organized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see this work materializing 🎉 added a few comments/questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request updates the logging instrumentation by integrating diagnostics and telemetry clients that support OpenTelemetry protocols. The changes include introducing asynchronous initialization for telemetry and log clients, updating various catch blocks with explicit type annotations, and adjusting cache behavior by replacing undefined with a null cast fallback.
Reviewed Changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
src/service/worker/runtime/events/middlewares/body.ts | Adjusted error catch type and import order for clarity. |
src/service/worker/index.ts | Updated LogLevel import to use the new loggerTypes module. |
src/service/tracing/tracingMiddlewares.ts | Added explicit catch type annotations. |
src/service/telemetry/index.ts | Added an export for telemetry client functionality. |
src/service/telemetry/client.ts | Introduced a TelemetryClientSingleton for managing telemetry client initialization. |
src/service/logger/loggerTypes.ts | Added new logger type definitions. |
src/service/logger/logger.ts | Refactored logger to use a new diagnostics-based log client with asynchronous initialization. |
src/service/logger/index.ts | Exported additional modules including loggerTypes and client. |
src/service/logger/console.ts | Adjusted LogLevel import to reference loggerTypes. |
src/service/logger/client.ts | Implemented a new log client initialization using the diagnostics library. |
src/service/index.ts, src/clients/infra/, src/HttpClient/middlewares/ | Updated catch clauses with explicit type annotations. |
src/caches/LRUDiskCache.ts, src/caches/DiskCache.ts | Replaced returning undefined with a null cast for fallback values. |
Files not reviewed (1)
- package.json: Language not supported
Comments suppressed due to low confidence (1)
src/caches/DiskCache.ts:46
- Returning a casted null instead of undefined for a missing cache entry may cause inconsistencies if downstream code expects undefined. Consider reverting to undefined or ensuring type consistency across the cache API.
resolve(null as unknown as V)
@arturpimentel I followed your idea and used copilot to get an overview of the logging architecture - if you think it makes sense, I can add a summary of all the changes as well. Original Logging Architecture
Architecture after the changes
Here is a more detailed overview: flowchart LR
A[Application Code] --[Logger Instance]--> B["Diagnostics Logger /src/service/logger/client.ts"]
M[Trace/Span Context] -.-> B
Z["Diagnostics Metrics [TBD]"] -.-> X
Y["Diagnostics Traces [TBD]"] -.-> X
B --> X
X[Telemetry Client]--> C["OpenTelemetry Logs (@opentelemetry/sdk-logs)"]
D["Resource & Attributes (Context enrichment)"] --> C
E --> F["o11y Collector
(io-telemetry-http.vtex.systems)"]
subgraph Unified Telemetry
C --> E["OTLP Exporter (gRPC/HTTP)"]
C --> P["stdout Exporter (gRPC/HTTP)"]
G["OpenTelemetry Metrics (@opentelemetry/sdk-metrics)"] --> E
G["OpenTelemetry Metrics (@opentelemetry/sdk-metrics)"] --> P
H["OpenTelemetry Traces (@opentelemetry/sdk-trace)"] --> E
H["OpenTelemetry Traces (@opentelemetry/sdk-trace)"] --> P
end
|
Adds telemetry and log clients, integrating node-vtex-api with the new version of the diagnostics library that supports OpenTelemetry protocols. It also updates Typescript and Opentelemetry dependencies to enable support for node builder 6.x and IO apps using Node 16.x.
This update includes a timeout to handle clients initializing indefinitely, assuming some initial logs might be lost on application startup while the log client still loading. Now we can better handle this scenario since we have more control over the startup process of the log client. We've also updated error handling for this step, avoiding throwing errors on log client startup so the app is still available and able to use console.log fallback - ensuring the app is able to run even if the logger is not working properly.
To maintain the current standard for collecting and sending logs, we will temporarily disable the creation of this client.
a804047
to
470d89b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What is the purpose of this pull request?
Update logging instrumentation, adding telemetry and log clients and integrates
node-vtex-api
with the new version of the diagnostics library that supports OpenTelemetry protocols.It also updates Typescript and Opentelemetry dependencies to enable support for node builder 6.x and IO apps using Node 16.x.
What problem is this solving?
This implementation of
node-vtex-api
is not using the newest version of diagnostics, which is VTEX's standard library for instrumentation and observability. This PR provides consistency in telemetry generation and allows it to adopt OTel protocol for logs.How should this be manually tested?
To test it:
update/logs-instrumentation
branch;Types of changes