Problem
We do not have a recommended way to make K8s logs available to the logs view in the Netdata UI.
Description
As we continue to improve our logging support, customers are asking us to support logging use cases beyond native logging to the systemd journal and the Windows Event Log. This internal PoC defines a recommended way to provide access to K8s container logs, in clearly defined phases.
Definition of Done
The definition of done for each phase is:
- Any required components (Netdata plugins, collectors, etc. and external tools) must be available for installation by a user. For Netdata Agent that means that it must be in a released (nightly) version.
- A working Helm configuration is available for deployment.
- Any configuration and installation steps are documented in Learn. If this depends on a nightly version, the document should reflect that.
Concerns
- Ease of install
- Ease of configuration
- Robustness of the setup
- Ability to retain access to logs even when a given k8s node is no longer available.
Importance
must have
Value proposition
- Recommended way for users to see k8s container logs
Proposed implementation
Phases
Phase 1: no OTel, no centralization
Constraints
- No dependency on otel.plugin or the Netdata Distribution of OpenTelemetry Collector (NDOC)
- No logs centralization
Suggestions:
- A DaemonSet that:
  - Tails the JSON container logs in `/var/log/containers/*.log`
  - Passes them through `log2journal json` to extract all JSON fields and emit a journal entry
  - Passes them through `systemd-cat-native --namespace k8s` to index into the local journal
- ???
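The DaemonSet suggestion chains three stages, and a minimal shell sketch of that chain could look like the following. The `log2journal json` and `systemd-cat-native --namespace k8s` invocations are taken from the suggestion; the `tail` flags, the function name `ship_container_logs`, and the `TAILER`, `LOG_GLOB`, `PARSER`, and `SINK` variables are assumptions introduced here for illustration. It is not a tested container entrypoint.

```shell
#!/bin/sh
# Sketch only: stage commands are kept in variables (hypothetical names) so each
# stage can be replaced, for example with `cat`, when trying the chain locally.

# -q suppresses the "==> file <==" headers tail prints for multiple files,
# -F keeps following a path across log rotation, -n 0 starts at the end of
# each file. These flags are an assumption, not part of the proposal.
TAILER="${TAILER:-tail -q -F -n 0}"
LOG_GLOB="${LOG_GLOB:-/var/log/containers/*.log}"
PARSER="${PARSER:-log2journal json}"
SINK="${SINK:-systemd-cat-native --namespace k8s}"

ship_container_logs() {
    # The variables are deliberately unquoted: the shell must split the
    # commands into words and expand the wildcard in LOG_GLOB.
    # Limitation: the wildcard is expanded once, so log files of containers
    # started later are not picked up until the function is restarted.
    # shellcheck disable=SC2086
    $TAILER $LOG_GLOB | $PARSER | $SINK
}
```

Calling `ship_container_logs` from the DaemonSet container's entrypoint would run the chain until the pod stops. The host's `/var/log/containers` directory, and the paths its symlinks point to, would need to be mounted into the pod.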
Phase 2: centralized logging server
Constraints
- Centralized logging with the Netdata parent
Suggestions
- systemd journal remote pod
  - Create a pod that's co-located with the parent Agent to run the `systemd-journal-remote` service.
  - Point the DaemonSet pod of Phase 1 to this new pod.
  - Consider whether this needs host networking.
- Local logging and then forwarding
  The idea here is to make it analogous to ingesting metrics in a child Agent and then streaming them to the Parent. The upside is that logs from nodes that get shut down are not lost, including all of the non-container logs. The downside is more complexity.
  - Requires the systemd journal remote pod from the first suggestion.
  - Also requires the local systemd journal on each k8s node to forward its logs to the centralized logging setup.
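For the forwarding variant, one possible mechanism is `systemd-journal-upload`, the stock systemd counterpart of `systemd-journal-remote`. The proposal does not name the forwarding tool, so this choice is an assumption, as are the function name `write_upload_conf` and the example URL. A minimal sketch that writes the upload configuration:

```shell
#!/bin/sh
# Sketch only: writes a journal-upload.conf pointing the node's journal at a
# central systemd-journal-remote. Choosing systemd-journal-upload is an
# assumption; the proposal only says the local journal must forward its logs.

write_upload_conf() {
    conf_dir="$1"    # normally /etc/systemd
    remote_url="$2"  # hypothetical, e.g. http://journal-remote.example:19532
    mkdir -p "$conf_dir"
    printf '[Upload]\nURL=%s\n' "$remote_url" > "$conf_dir/journal-upload.conf"
}
```

On a node, `write_upload_conf /etc/systemd <URL>` followed by `systemctl enable --now systemd-journal-upload.service` would start forwarding; 19532 is the default port of `systemd-journal-remote`.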
Phase 3: NDOC
Constraints
- Use the Netdata Distribution of OpenTelemetry Collector (NDOC) instead of the custom DaemonSet for tailing logs.
- Dynamic Configuration of NDOC is not in scope
Suggestions
- Hook up NDOC and otel.plugin
  - Use otel.plugin to ingest OpenTelemetry logs arriving at its gRPC endpoint into the systemd journal.
  - Use the Netdata Distribution of OpenTelemetry Collector (NDOC) to create a pipeline that:
    - Tails logs using the `filelog` receiver.
    - Exports them to otel.plugin using the `otlp` exporter.
    - Possibly uses the `batch` processor.
  - NDOC should be launched by the Agent, so everything is in the same pod.
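The pipeline described above maps onto a standard OpenTelemetry Collector configuration. The sketch below writes such a configuration from a shell function; it assumes NDOC accepts the upstream collector configuration format, that the container logs live under `/var/log/containers/*.log` as in Phase 1, and that otel.plugin listens for OTLP over gRPC on `localhost:4317` without TLS. The function name `write_ndoc_config` and the endpoint value are hypothetical.

```shell
#!/bin/sh
# Sketch only: emits an OpenTelemetry Collector config with a logs pipeline
# filelog -> batch -> otlp. The endpoint is an assumption about where
# otel.plugin listens; adjust it to the actual gRPC address.

write_ndoc_config() {
    out_file="$1"
    otlp_endpoint="${2:-localhost:4317}"  # hypothetical default
    cat > "$out_file" <<EOF
receivers:
  filelog:
    include:
      - /var/log/containers/*.log

processors:
  batch: {}

exporters:
  otlp:
    endpoint: ${otlp_endpoint}
    tls:
      insecure: true  # assumes plaintext gRPC inside the pod

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [otlp]
EOF
}
```

Since NDOC is meant to be launched by the Agent in the same pod, a loopback endpoint seems like a natural starting point, though the actual address depends on how otel.plugin is configured.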
Stretch goals:
- Having NDOC pipeline configuration in the UI using dynamic configuration