77 changes: 77 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -46,3 +46,4 @@ instant-acme = { version = "0.8.4", default-features = false, features = ["hyper
reqwest = { version = "0.12", default-features = false, features = ["rustls-tls-manual-roots", "json"] }
x509-parser = "0.18.0"
http = "1.4.0"
axum = "0.8.8"
10 changes: 10 additions & 0 deletions NETWORKS.md
@@ -399,6 +399,16 @@ name = "app-network"
POSTGRES_DB = "mydb"
```

### Communicating with the Host

In some cases, containers need to communicate with services running on the host machine (such as the Dispenser Telemetry Ingestion Service).

Dispenser automatically configures `host.docker.internal` to resolve to the host's gateway IP for all managed containers using the `host-gateway` mapping.

**Example:**
To reach a service running on the host at port `4318`:
`http://host.docker.internal:4318`
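
A quick way to confirm from inside a container that a host service is reachable is a plain TCP connect. The sketch below is illustrative only: `host_service_url` and `can_connect` are hypothetical helper names, not part of Dispenser, and port `4318` is taken from the example above.

```python
import socket

# Name that Dispenser maps to the host gateway inside managed containers.
HOST_ALIAS = "host.docker.internal"


def host_service_url(port: int, path: str = "") -> str:
    """Build a URL for a service running on the host machine."""
    return f"http://{HOST_ALIAS}:{port}{path}"


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_connect(HOST_ALIAS, 4318)` from a managed container should return `True` only when something on the host is listening on that port.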

### Network Isolation

Use internal networks to isolate sensitive services:
79 changes: 75 additions & 4 deletions TELEMETRY.md
@@ -1,14 +1,16 @@
# Telemetry Configuration

Dispenser includes a built-in, high-performance telemetry system powered by [Delta Lake](https://delta.io/). It allows you to automatically collect deployment events and container health status, writing them directly to data lakes (S3, GCS, Azure) or local filesystems in Parquet format.
Dispenser includes a built-in, high-performance telemetry system powered by [Delta Lake](https://delta.io/). It allows you to automatically collect deployment events, container health status, application logs/traces, and raw container output, writing them directly to data lakes (S3, GCS, Azure) or local filesystems in Parquet format.

## Overview

The telemetry system runs in a dedicated, isolated thread to ensure that heavy I/O operations never block the main orchestration loop. It provides:

1. **Deployment Tracking**: Every time a container is created, updated, or restarted, a detailed event is logged.
2. **Health Monitoring**: Periodically samples the status of all managed containers (CPU, memory, uptime, health checks).
3. **Delta Lake Integration**: Writes data using the Delta Lake protocol, enabling ACID transactions, scalable metadata handling, and direct compatibility with tools like Spark, Trino, Athena, and Databricks.
3. **Application Telemetry (OTLP)**: Ingests structured logs and traces from services using standard OpenTelemetry SDKs.
4. **Container Output**: Captures raw `stdout` and `stderr` streams from all managed containers with sequence-guaranteed ordering.
5. **Delta Lake Integration**: Writes data using the Delta Lake protocol, enabling ACID transactions, scalable metadata handling, and direct compatibility with tools like Spark, Trino, Athena, and Databricks.

## Configuration

@@ -24,6 +26,9 @@ enabled = true
# Supported schemes: file://, s3://, gs://, az://, adls://
table_uri_deployments = "s3://my-data-lake/dispenser/deployments"
table_uri_status = "s3://my-data-lake/dispenser/status"
table_uri_logs = "s3://my-data-lake/dispenser/logs"
table_uri_traces = "s3://my-data-lake/dispenser/traces"
table_uri_container_output = "s3://my-data-lake/dispenser/container-output"

# Optional: How often to sample container status (default: 60 seconds)
status_interval = 60
@@ -69,9 +74,25 @@ Dispenser supports several authentication methods via environment variables:
* Service Principal: `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_TENANT_ID`.
* Managed Identity (if running on Azure VMs/AKS).

## OpenTelemetry (OTLP) Ingestion

Dispenser acts as a sidecar host for your services. When telemetry is enabled, Dispenser starts an **Ingestion Service** that listens as follows:

* **Endpoint**: `http://0.0.0.0:4318`
* **Protocol**: OTLP/HTTP (JSON)

### Automatic Environment Variables

Dispenser automatically injects the following environment variables into all managed containers to simplify instrumentation:

* `OTEL_EXPORTER_OTLP_ENDPOINT="http://host.docker.internal:4318"`
* `OTEL_SERVICE_NAME="{service_name}"` (the service name from your `service.toml`)

Standard OTel SDKs will automatically detect these variables and begin shipping logs and traces to Dispenser without further configuration.
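
For services that cannot use an OTel SDK, the same variables can drive a hand-rolled export. The following is a minimal sketch, not Dispenser's own client: it assumes the Ingestion Service accepts the standard OTLP/HTTP path `/v1/logs`, which this document does not state explicitly, and `build_log_payload` and `send_log` are hypothetical helper names.

```python
import json
import os
import time
import urllib.request

# Both variables are injected by Dispenser; the fallbacks are only for local runs.
ENDPOINT = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://host.docker.internal:4318")
SERVICE = os.environ.get("OTEL_SERVICE_NAME", "unknown_service")


def build_log_payload(message: str, severity: str = "INFO") -> dict:
    """Build a minimal OTLP/JSON logs export request holding one log record."""
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": SERVICE}},
            ]},
            "scopeLogs": [{
                "scope": {"name": "manual-example"},
                "logRecords": [{
                    # OTLP/JSON encodes 64-bit integers as strings.
                    "timeUnixNano": str(time.time_ns()),
                    "severityText": severity,
                    "body": {"stringValue": message},
                }],
            }],
        }]
    }


def send_log(message: str, severity: str = "INFO") -> int:
    """POST one log record and return the HTTP status (assumes the /v1/logs path)."""
    request = urllib.request.Request(
        ENDPOINT.rstrip("/") + "/v1/logs",
        data=json.dumps(build_log_payload(message, severity)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Building the payload is kept separate from sending it so the structure can be inspected or unit-tested without a running Ingestion Service.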

## Data Schemas

Dispenser automatically manages two Delta tables. It will create them if they do not exist.
Dispenser automatically manages several Delta tables. It will create them if they do not exist.

### Deployments Table (`dispenser-deployments`)

@@ -116,6 +137,56 @@ Records periodic snapshots of the runtime state of containers.
| `failing_streak` | `INTEGER` | Consecutive healthcheck failures. |
| `last_health_output` | `STRING` | Output of the last failed healthcheck (truncated). |

### Logs Table (`dispenser-logs`)

Stores structured logs emitted by applications via OTel.

| Column | Type | Description |
| :--- | :--- | :--- |
| `date` | `DATE` | Partition column (UTC). |
| `timestamp` | `TIMESTAMP` | Exact time of the log entry. |
| `service` | `STRING` | Service name. |
| `severity` | `STRING` | INFO, WARN, ERROR, etc. |
| `body` | `STRING` | The log message. |
| `trace_id` | `STRING` | Associated trace ID (hex). |
| `span_id` | `STRING` | Associated span ID (hex). |
| `attributes` | `MAP<STRING, STRING>` | Flattened log attributes. |
| `resource` | `MAP<STRING, STRING>` | Resource attributes (pod, node, etc.). |
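
The `attributes` and `resource` columns hold string-to-string maps, while OTLP attributes arrive as typed key/value pairs. The sketch below shows one plausible flattening; the exact rules Dispenser applies are not documented here, so the stringification choices (and the hypothetical `flatten_attributes` helper) are assumptions.

```python
import json


def flatten_attributes(attributes: list) -> dict:
    """Convert OTLP/JSON key-value attributes into a flat str -> str map."""
    flat = {}
    for item in attributes:
        value = item.get("value", {})
        if "stringValue" in value:
            flat[item["key"]] = value["stringValue"]
        elif "boolValue" in value:
            flat[item["key"]] = "true" if value["boolValue"] else "false"
        elif "intValue" in value:
            # OTLP/JSON usually encodes int64 as a string; str() handles both forms.
            flat[item["key"]] = str(value["intValue"])
        elif "doubleValue" in value:
            flat[item["key"]] = str(value["doubleValue"])
        else:
            # Arrays, nested maps, and empty values: fall back to a JSON string.
            flat[item["key"]] = json.dumps(value, sort_keys=True)
    return flat
```

With this approach, an attribute such as `http.status_code` with `intValue` `"200"` ends up as the string `"200"` in the map.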

### Traces Table (`dispenser-traces`)

Stores distributed tracing spans.

| Column | Type | Description |
| :--- | :--- | :--- |
| `date` | `DATE` | Partition column. |
| `trace_id` | `STRING` | Trace ID (32-char hex). |
| `span_id` | `STRING` | Span ID (16-char hex). |
| `parent_span_id` | `STRING` | Parent Span ID. |
| `name` | `STRING` | Span name (e.g., "GET /api/users"). |
| `kind` | `STRING` | SERVER, CLIENT, PRODUCER, etc. |
| `start_time` | `TIMESTAMP` | Start time. |
| `end_time` | `TIMESTAMP` | End time. |
| `duration_ms` | `LONG` | Calculated duration. |
| `status_code` | `STRING` | OK, ERROR. |
| `status_message` | `STRING` | Error description. |
| `service` | `STRING` | Service name. |
| `attributes` | `MAP<STRING, STRING>` | Span attributes. |
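
The `duration_ms` column is described only as calculated. One plausible derivation is sketched below, assuming the OTLP span fields `startTimeUnixNano` and `endTimeUnixNano`, integer truncation to milliseconds, and a `date` partition taken from the span's start time in UTC. None of these choices are confirmed by this document, and `span_timing` is a hypothetical helper.

```python
from datetime import datetime, timezone


def span_timing(start_unix_nano: str, end_unix_nano: str) -> dict:
    """Derive the partition date and duration for one span from OTLP nanosecond timestamps."""
    start_ns = int(start_unix_nano)
    end_ns = int(end_unix_nano)
    start = datetime.fromtimestamp(start_ns // 1_000_000_000, tz=timezone.utc)
    return {
        "date": start.date().isoformat(),
        # Integer division drops any sub-millisecond remainder.
        "duration_ms": (end_ns - start_ns) // 1_000_000,
    }
```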

### Container Output Table (`dispenser-container-output`)

Captures raw `stdout` and `stderr` streams.

| Column | Type | Description |
| :--- | :--- | :--- |
| `date` | `DATE` | Partition column. |
| `timestamp` | `TIMESTAMP` | Exact time of the log line. |
| `service` | `STRING` | Service name. |
| `container_id` | `STRING` | Full container ID. |
| `stream` | `STRING` | `stdout` or `stderr`. |
| `message` | `STRING` | The raw log line. |
| `sequence` | `LONG` | Monotonically increasing counter that preserves line order, even when timestamps are identical. |
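
When reading this table back, `sequence` is the column to sort on, since several lines can share a timestamp. The sketch below rebuilds per-container output from rows represented as plain dictionaries. It assumes the counter is meaningful within a single `container_id`; the document does not say whether it is global. `ordered_output` is a hypothetical helper.

```python
def ordered_output(rows: list) -> dict:
    """Group rows by container and return (stream, message) pairs in sequence order."""
    ordered = {}
    for row in sorted(rows, key=lambda r: (r["container_id"], r["sequence"])):
        ordered.setdefault(row["container_id"], []).append((row["stream"], row["message"]))
    return ordered
```

Sorting on `(container_id, sequence)` keeps interleaved `stdout` and `stderr` lines in the order they were captured.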

## Performance Tuning

### Buffering & Latency
@@ -144,4 +215,4 @@ The telemetry service runs on a dedicated Tokio runtime spawned in a separate OS
To prevent indefinite storage growth, Dispenser applies the following default retention policies during table creation:

* **Log Retention**: 30 days (Deployments), 7 days (Status). Delta log history is kept for time-travel queries.
* **Deleted Files**: 7 days (Deployments), 1 day (Status). Vacuum operations can reclaim space after this period.
* **Deleted Files**: 7 days (Deployments), 1 day (Status). Vacuum operations can reclaim space after this period.
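
To make the deleted-file windows concrete, the sketch below checks whether a removed data file has aged past its table's retention period. It is a simplified model of the rule, not Dispenser's or Delta Lake's implementation; `vacuum_eligible` is a hypothetical helper, and only the two tables with documented defaults are included.

```python
from datetime import datetime, timedelta, timezone

# Deleted-file retention defaults taken from the list above.
DELETED_FILE_RETENTION = {
    "deployments": timedelta(days=7),
    "status": timedelta(days=1),
}


def vacuum_eligible(table: str, deleted_at: datetime, now: datetime) -> bool:
    """Return True once a deleted file is older than the table's retention window."""
    return now - deleted_at > DELETED_FILE_RETENTION[table]
```

For example, a file removed two days ago would be reclaimable in the status table but not yet in the deployments table.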
8 changes: 8 additions & 0 deletions example/dispenser.toml
@@ -4,6 +4,14 @@ delay = 60
[proxy]
enabled = true

[telemetry]
enabled = true
table_uri_deployments = "telem/deployments"
table_uri_status = "telem/status"
table_uri_logs = "telem/logs"
table_uri_container_output = "telem/container_out"
table_uri_traces = "telem/traces"

[[service]]
path = "service1"
