@DaMandal0rian DaMandal0rian commented Jun 5, 2025

PR Type

Enhancement


Description

  • Enable compactor retention and delete settings

  • Set overall log retention to 30 days

  • Upgrade Grafana Loki, Promtail, and Mimir images


Changes walkthrough 📝

Relevant files

Configuration changes
logging/conf/loki-config.yml (+6/-1): Configure Loki compactor retention settings

  • Enabled compactor log retention
  • Added delete delay and worker count
  • Set limits_config retention period
  • Clarified retention comment in table_manager

Dependencies
logging/grafana-loki/docker-compose.yml (+3/-3): Upgrade Loki ecosystem images

  • Bumped Loki image to 2.9.14
  • Updated Promtail image to 2.9.14
  • Updated Mimir image to 2.9.14

    github-actions bot commented Jun 5, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Retention Consistency

    Verify that limits_config.retention_period aligns with reject_old_samples_max_age to avoid unintended data rejection before retention.

    reject_old_samples_max_age: 168h
    retention_period: 720h  # Retain logs for 30 days
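
As a sketch of how these two settings relate: as far as Loki's documentation describes them, `reject_old_samples_max_age` is checked at ingestion against each entry's timestamp, while `retention_period` governs when already-stored data becomes eligible for deletion by the compactor. Under that reading the two values in this PR do not conflict. The fragment below only restates the values from the PR; the comments are an interpretation, not part of the change.

```yaml
limits_config:
  # Ingestion-time check: entries whose timestamps are older than 7 days
  # are rejected when pushed.
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  # Storage-time setting: with compactor retention enabled, stored data
  # older than 30 days becomes eligible for deletion.
  retention_period: 720h
```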
    
    Compactor Delete Workers

    Assess if setting retention_delete_worker_count to 150 is appropriate for resource usage and ensure it won’t overload the system.

    retention_delete_worker_count: 150
    delete_request_cancel_period: 5m

    Image Version Bump

    Confirm that upgrading Loki, Promtail, and Mimir to 2.9.14 is compatible with existing configurations and cluster components.

      image: grafana/loki:2.9.14
      ports:
        - "3100:3100"
      command: -config.file=/etc/loki/loki-config.yaml
      labels:
        - traefik.enable=true
        - traefik.http.services.loki.loadbalancer.server.port=3100
        - traefik.http.routers.loki.rule=Host(`logging.subspace.network`)
        - traefik.http.routers.loki.tls.certresolver=le
        - traefik.http.routers.loki.entrypoints=websecure
        - traefik.docker.network=traefik-proxy
        - traefik.http.routers.loki.middlewares=loki-auth
        - traefik.http.middlewares.loki-auth.basicauth.usersfile=/etc/traefik/loki-auth.txt
      volumes:
        - /root/loki/conf/loki/etc/loki-config.yaml:/etc/loki/loki-config.yaml:ro
      networks:
        - loki
        - traefik-proxy
    
    promtail:
      image: grafana/promtail:2.9.14
      command: -config.file=/etc/promtail/config.yml
      ports:
        - 127.0.0.1:3200:3100
      networks:
        - loki
    
    mimir:
      image: grafana/mimir:2.9.14
      volumes:

    @DaMandal0rian DaMandal0rian requested a review from vedhavyas June 5, 2025 11:38
    github-actions bot commented Jun 5, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category: Possible issue
    Suggestion: Move retention to table_manager
    Impact: Medium

    The limits_config block does not support retention_period, so this setting will be
    ignored. Remove it from limits_config and place retention_period: 720h under the
    table_manager section where retention is actually enforced.

    logging/conf/loki-config.yml [50-53]

     limits_config:
       reject_old_samples: true
       reject_old_samples_max_age: 168h
    -  retention_period: 720h  # Retain logs for 30 days
    +  # retention_period moved to table_manager

    Suggestion importance[1-10]: 8

    Why: The retention_period is not supported under limits_config and will be ignored, so moving it to the table_manager block ensures retention is actually enforced.
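
Whether this move is needed depends on which retention mechanism is active. As far as Loki's retention documentation for the 2.9 line describes it, compactor-based retention (`retention_enabled: true`) takes its period from `limits_config.retention_period`, whereas `table_manager.retention_period` applies only when `table_manager.retention_deletes_enabled` is set. The sketch below contrasts the two layouts using values from this PR. The `table_manager` block is illustrative and is not taken from the repository's config.

```yaml
# Layout A: compactor-based retention (the mechanism this PR enables).
compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: filesystem
  retention_enabled: true
limits_config:
  retention_period: 720h   # read by the compactor when retention_enabled is true

# Layout B: legacy table-manager retention, commented out for comparison.
# It deletes data only when retention_deletes_enabled is set.
# table_manager:
#   retention_deletes_enabled: true
#   retention_period: 720h
```

Under that reading, the PR's placement of `retention_period` in `limits_config` is the one the compactor uses, so the suggestion is worth checking against the Loki configuration reference before applying it.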
    Category: General
    Suggestion: Lower delete worker count
    Impact: Low

    A worker count of 150 can overwhelm resources during retention deletion. Lower this
    to a more conservative value (e.g., 50) or rely on the default to prevent spikes in
    CPU/memory usage.

    logging/conf/loki-config.yml [42-48]

     compactor:
       working_directory: /loki/boltdb-shipper-compactor
       shared_store: filesystem
       retention_enabled: true
       retention_delete_delay: 2h
    -  retention_delete_worker_count: 150
    +  retention_delete_worker_count: 50  # reduced to avoid resource exhaustion
       delete_request_cancel_period: 5m

    Suggestion importance[1-10]: 4

    Why: Reducing retention_delete_worker_count helps prevent CPU/memory spikes but is a minor operational tuning rather than a bug fix.
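
For context on this trade-off, here is an annotated sketch of the compactor block from the PR. The comments reflect an understanding of Loki 2.9's compactor options, not measurements from this deployment. In particular, 150 appears to be both the documented default for `retention_delete_worker_count` and the value used in Loki's own retention example, so "rely on the default" and "keep 150" may amount to the same thing. This should be confirmed against the configuration reference for the deployed version.

```yaml
compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: filesystem
  retention_enabled: true
  # Chunks marked for deletion are removed only after this delay.
  retention_delete_delay: 2h
  # Number of workers deleting chunks concurrently
  # (150 is believed to be the default; treat this as an assumption).
  retention_delete_worker_count: 150
  # Window during which a submitted delete request can still be cancelled.
  delete_request_cancel_period: 5m
```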

    @DaMandal0rian DaMandal0rian merged commit 38abf96 into main Jun 5, 2025
    1 check passed
    @DaMandal0rian DaMandal0rian deleted the hotfix/loki branch June 5, 2025 12:48