
Monitoring and Observability

This page describes the monitoring and observability setup for the Cloud-Native E-commerce Platform.

Monitoring Stack

Our platform uses a comprehensive monitoring stack to ensure visibility into system performance, errors, and behavior:

  • Elasticsearch: Log storage and indexing
  • Kibana: Log visualization and analysis
  • Prometheus: Metrics collection
  • Grafana: Metrics visualization and dashboards
  • Jaeger: Distributed tracing
  • Kiali: Service mesh visualization (when using Istio)

Accessing Monitoring Tools

After deployment, you can access the monitoring tools at:

Tool         URL                      Credentials
Prometheus   http://localhost:9090    -
Grafana      http://localhost:3000    admin/prom-operator
Kibana       http://localhost:5601    -
Jaeger       http://localhost:16686   -
Kiali        http://localhost:20001   -

Logging

Logging Infrastructure

We use the Elastic Stack (ELK) for centralized logging:

  1. Application logs: Generated by services using Serilog
  2. Log shipping: Logs are sent to Elasticsearch
  3. Log storage: Elasticsearch stores and indexes logs
  4. Log visualization: Kibana provides dashboards and search

Log Configuration

Each microservice is configured to use Serilog with structured logging:

// Program.cs
public static IHostBuilder CreateHostBuilder(string[] args) =>
    Host.CreateDefaultBuilder(args)
        .UseSerilog((context, config) =>
        {
            config
                // Read log levels and overrides from appsettings
                .ReadFrom.Configuration(context.Configuration)
                .Enrich.FromLogContext()
                .Enrich.WithMachineName()
                .WriteTo.Console()
                // Ship structured logs to Elasticsearch; one index per service, environment, and month
                .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri(context.Configuration["ElasticConfiguration:Uri"]))
                {
                    AutoRegisterTemplate = true,
                    IndexFormat = $"{context.Configuration["ApplicationName"]}-logs-{context.HostingEnvironment.EnvironmentName?.ToLower().Replace(".", "-")}-{DateTime.UtcNow:yyyy-MM}"
                });
        });

Log Levels

We use the following log levels:

  • Verbose: Highly detailed tracing output, normally enabled only while diagnosing a specific problem
  • Debug: Internal state and control flow useful during development
  • Information: Normal application events, such as a request handled or an order created
  • Warning: Unexpected but non-critical conditions
  • Error: Failures that need attention
  • Fatal: Critical errors that cause application failure
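For illustration, here is a minimal sketch of how a service might write structured events at these levels with Serilog (the class and property names are hypothetical):

// Hypothetical handler writing structured events at different levels
using System;
using Serilog;

public class OrderHandler
{
    public void Handle(string orderId, int itemCount)
    {
        Log.Debug("Validating order {OrderId}", orderId);
        Log.Information("Order {OrderId} accepted with {ItemCount} items", orderId, itemCount);

        if (itemCount == 0)
        {
            Log.Warning("Order {OrderId} contains no items", orderId);
        }

        try
        {
            // ... business logic ...
        }
        catch (Exception ex)
        {
            Log.Error(ex, "Failed to process order {OrderId}", orderId);
            throw;
        }
    }
}

Named properties such as {OrderId} are indexed as fields by the Elasticsearch sink, so they can be filtered on directly in Kibana.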

Viewing Logs in Kibana

  1. Open Kibana at http://localhost:5601
  2. Navigate to "Discover" in the left sidebar
  3. Create an index pattern matching your service logs (for example, *-logs-*, which matches the index format configured in Serilog above)
  4. Use the search bar to filter logs by service, level, or content

Metrics

Metrics Collection

We use Prometheus for metrics collection:

  1. Application metrics: Exposed by services using Prometheus .NET Client (a minimal sketch follows this list)
  2. Infrastructure metrics: Collected by Prometheus Node Exporter
  3. Kubernetes metrics: Collected by kube-state-metrics
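A minimal sketch, assuming the prometheus-net.AspNetCore package (the usual .NET Prometheus client), of how a service can expose its metrics:

// Program.cs (minimal hosting model) - sketch only
using Prometheus;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.UseHttpMetrics(); // record count and duration for every HTTP request
app.MapMetrics();     // expose the /metrics endpoint that Prometheus scrapes

app.MapGet("/", () => "OK"); // placeholder for the service's own endpoints
app.Run();

Prometheus then scrapes the /metrics endpoint using the scrape configuration described below.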

Key Metrics

We monitor the following key metrics:

  • Request Rate: Requests per second by service
  • Error Rate: Errors per second by service
  • Latency: Response time percentiles (p50, p90, p99)
  • CPU Usage: CPU usage by service and node
  • Memory Usage: Memory usage by service and node
  • Disk Usage: Disk usage by node
  • Network Traffic: Network I/O by service and node

Prometheus Configuration

Prometheus is configured to scrape metrics from:

  • Kubernetes API server
  • Kubernetes nodes
  • Microservices (via annotations)
  • Service mesh (when using Istio)

Viewing Metrics in Prometheus

  1. Open Prometheus at http://localhost:9090
  2. Use the "Expression" field to query metrics
  3. View graphs or tables of results

Grafana Dashboards

We provide pre-configured Grafana dashboards:

  1. Platform Overview: High-level system health
  2. Microservices: Detailed service metrics
  3. Node Resources: Infrastructure metrics
  4. API Gateway: Gateway-specific metrics
  5. Database Performance: Database metrics

To access dashboards:

  1. Open Grafana at http://localhost:3000
  2. Log in with admin/prom-operator
  3. Navigate to "Dashboards" in the left sidebar

Distributed Tracing

Tracing Infrastructure

We use Jaeger for distributed tracing:

  1. Trace generation: Services use OpenTelemetry to generate traces
  2. Trace collection: Jaeger Collector receives traces
  3. Trace storage: Jaeger stores traces
  4. Trace visualization: Jaeger UI provides trace analysis

Trace Configuration

Each microservice is configured to send traces to Jaeger:

// Program.cs
services.AddOpenTelemetryTracing(builder =>
{
    builder
        // Tag every span with the service name so traces can be filtered per service in Jaeger
        .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService(Configuration["ApplicationName"]))
        .AddAspNetCoreInstrumentation()  // spans for incoming HTTP requests
        .AddHttpClientInstrumentation()  // spans for outgoing HTTP calls
        .AddSource("MediatR")            // also subscribe to the "MediatR" activity source
        .AddJaegerExporter(options =>
        {
            // Jaeger agent address comes from configuration
            options.AgentHost = Configuration["Jaeger:AgentHost"];
            options.AgentPort = int.Parse(Configuration["Jaeger:AgentPort"]);
        });
});
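The AddSource call subscribes the tracer to a named ActivitySource. A service can emit its own spans the same way; in this hypothetical sketch the source name OrderingService is illustrative and would also need to be registered via .AddSource("OrderingService") in the tracing configuration above:

// Hypothetical custom span via System.Diagnostics.ActivitySource
using System.Diagnostics;

public class OrderFulfillmentService
{
    private static readonly ActivitySource Source = new("OrderingService");

    public void Fulfill(string orderId)
    {
        // StartActivity returns null when no listener (such as OpenTelemetry) is attached
        using var activity = Source.StartActivity("FulfillOrder");
        activity?.SetTag("order.id", orderId);

        // ... fulfillment logic; spans from instrumented HttpClient calls nest under this one ...
    }
}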

Viewing Traces in Jaeger

  1. Open Jaeger at http://localhost:16686
  2. Select a service from the dropdown
  3. Configure search parameters
  4. View and analyze traces

Service Mesh Monitoring

The following applies when the platform is deployed with the Istio service mesh:

Kiali for Service Mesh Visualization

  1. Open Kiali at http://localhost:20001
  2. View service mesh topology
  3. Analyze traffic flow between services
  4. Monitor service health

Istio Metrics

Istio provides additional metrics:

  • Request Volume: Requests per second by service
  • Success Rate: Percentage of successful requests
  • Latency: Response time by service
  • TCP Traffic: TCP metrics by service

Alerts and Notifications

Alert Configuration

We use Prometheus Alertmanager for alerts:

  1. Alert rules: Defined in Prometheus
  2. Alert processing: Handled by Alertmanager
  3. Notifications: Sent via configured channels (email, Slack, etc.)

Key Alerts

We have pre-configured alerts for:

  • Service Down: When a service is not responding
  • High Error Rate: When error rate exceeds threshold
  • High Latency: When response time exceeds threshold
  • Resource Saturation: When CPU/memory usage is high
  • Disk Space Low: When disk space is running out

Health Checks

Each microservice implements health checks:

  1. Liveness: Verifies service is running
  2. Readiness: Verifies service can handle requests
  3. Startup: Verifies service has started correctly

Kubernetes uses these health checks to manage container lifecycle.
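A minimal sketch, assuming the standard ASP.NET Core health checks package, of how a service might expose separate liveness and readiness endpoints (the paths and check names are illustrative):

// Program.cs - sketch of liveness/readiness endpoints for Kubernetes probes
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    // Readiness would also register checks for dependencies (database, message broker, ...)
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: new[] { "live" });

var app = builder.Build();

// Liveness: the process is up and able to respond
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("live")
});

// Readiness: every registered check (including dependency checks) must pass
app.MapHealthChecks("/health/ready");

app.Run();

The liveness, readiness, and startup probes in the Kubernetes deployment manifests would then point at these paths.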

Custom Monitoring

To add custom metrics:

  1. Add Prometheus metrics to your service:
// Define metrics
private static readonly Counter OrdersProcessed = Metrics.CreateCounter(
    "orders_processed_total", 
    "Number of processed orders");

// Use metrics
OrdersProcessed.Inc();
  2. Ensure your service exposes a metrics endpoint at /metrics
  3. Add scrape configuration to Prometheus
  4. Create dashboards in Grafana for your metrics
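Counters cover event totals; for duration-style metrics, such as the latency percentiles listed earlier, a histogram is usually a better fit. A hypothetical sketch using the same prometheus-net API (the metric and method names are illustrative):

// Hypothetical latency metric using a prometheus-net Histogram
using System;
using Prometheus;

public class OrderMetrics
{
    private static readonly Histogram OrderProcessingDuration = Metrics.CreateHistogram(
        "order_processing_duration_seconds",
        "Time taken to process an order");

    public void Measure(Action processOrder)
    {
        // NewTimer() observes the elapsed time when it is disposed
        using (OrderProcessingDuration.NewTimer())
        {
            processOrder();
        }
    }
}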