
'Development': stream docker container logs to NATS subject #221


Open
wants to merge 28 commits into
base: main

Conversation

paoxin
Contributor

@paoxin paoxin commented Jun 18, 2025

✨ What is the change?

I added log extraction logic that sends Docker container logs to a NATS subject derived from the job ID. The logs of each build job are sent after each build step.

📌 Link to issue

Closes #255, #256

🧪 Steps for Testing

  1. Run docker compose up nats
  2. Run DEBUG=true go run . in both HadesScheduler and HadesAPI
  3. Run the Bruno request "get streams"
  4. Run nats sub "logs.*" --server="nats://localhost:4222" (logs.* subscribes to every subject one token below logs, i.e. the log subjects of all jobs)
  5. Run the Bruno request "create build job succeed"
  6. The logs should now appear in the terminal from step 4
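
The subscription in step 4 depends on the per-job subject scheme. Below is a minimal sketch of how a subject could be derived from a job ID and why logs.* matches it. The "logs.%s" format and both helper names are assumptions inferred from the nats sub command above, not taken from the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// logSubjectFormat is an assumed format; the PR text only shows that
// subscribers listen on "logs.*".
const logSubjectFormat = "logs.%s"

// subjectForJob derives the (hypothetical) NATS subject for one job.
func subjectForJob(jobID string) string {
	return fmt.Sprintf(logSubjectFormat, jobID)
}

// matches reports whether a NATS-style pattern matches a subject.
// "*" matches exactly one dot-separated token, ">" matches one or
// more trailing tokens.
func matches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(p) == len(s)
}

func main() {
	subj := subjectForJob("1234")
	fmt.Println(subj, matches("logs.*", subj))         // logs.1234 true
	fmt.Println(matches("logs.*", "logs.1234.stderr")) // false
}
```

Because * covers only a single token, a job ID containing a dot would not be matched by logs.*. A subscriber that needs deeper subjects would use logs.> instead.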

✅ PR Checklist

  • I have tested these changes locally or on the dev environment
  • Code is clean, readable, and documented
  • Tests added or updated (if needed)
  • Documentation updated (if relevant)

Summary by CodeRabbit

  • New Features
    • Container logs are parsed into structured entries and published to a NATS messaging system for improved log handling and real-time access.
    • Added explicit container cleanup after log processing.
    • Integrated NATS connection setup in the scheduler for log publishing.
    • Introduced a Docker-based job scheduler with enhanced resource management and logging capabilities.
  • Bug Fixes
    • Enhanced error handling and logging for container log processing and cleanup operations.
  • Chores
    • Updated default environment variable to retain Docker containers after execution, preserving logs by default.
    • Upgraded dependencies for improved stability and compatibility.

robertjndw and others added 10 commits May 20, 2025 11:07
* Refactor job queue handling to support priority-based processing and update related tests

* Enhance job stream configuration to disallow duplicates

* Add KeyValue store for job management

* Add Bruno files for NATS
Switch from Redis to NATS  (#2)
…#219)

* update bruno script to show failure if it fails

* add shared volumes to fix "folder not found error"

* ensure debug mode logs deletion of volume

* implement delete volume after creation before execution flag
@paoxin paoxin requested review from Mtze and robertjndw June 18, 2025 11:44
Contributor

coderabbitai bot commented Jun 18, 2025

"""

Walkthrough

The changes refactor Docker scheduling by removing direct file-based log handling and volume management, introducing structured log parsing and NATS-based log publishing. The default Docker container autoremove behavior is changed to retain containers post-execution. The scheduler now integrates NATS messaging for logs, and dependency versions are updated. Minor formatting improvements are included.

Changes

| File(s) | Change Summary |
| --- | --- |
| .env.example | Changed default DOCKER_CONTAINER_AUTOREMOVE from true to false and updated the related comment. |
| HadesScheduler/docker/container.go | Removed file containing functions for writing container logs to file and copying scripts into containers. |
| HadesScheduler/docker/docker.go | Removed job scheduling structs and methods; added standalone Docker utility functions for image pulling, volume management, log retrieval, copying files, and container removal with NATS log publishing. |
| HadesScheduler/docker/image.go | Removed file containing concurrent Docker image pulling function. |
| HadesScheduler/docker/volume.go | Removed file containing Docker volume creation and deletion functions. |
| HadesScheduler/docker/job.go | Added DockerJob struct and method to execute Docker job steps with logging and publisher integration. |
| HadesScheduler/docker/scheduler.go | Added Docker scheduler implementation with environment config, Docker client setup, job scheduling, Fluentd logging options, and NATS connection support. |
| HadesScheduler/docker/step.go | Added DockerStep struct and method to run Docker containers for steps, handle resource limits, logs, and NATS publishing. |
| HadesScheduler/go.mod | Removed dependency on golang.org/x/exp; added indirect dependency on golang.org/x/tools; upgraded golang.org/x/time and google.golang.org/protobuf. |
| HadesScheduler/log/parser.go | Added new package to parse raw container stdout/stderr logs into structured log entries with timestamps and output stream identifiers. |
| HadesScheduler/log/publisher.go | Added Publisher interface and NATS-based implementation to publish structured logs, with error handling and logging. |
| HadesScheduler/main.go | Modified Docker scheduler initialization to set NATS connection instead of volume cleanup configuration. |
| shared/utils/queue.go | Added blank lines before error handling blocks calling msg.Nak() for improved readability without changing logic. |

Sequence Diagram(s)

sequenceDiagram
    participant Main
    participant Scheduler
    participant DockerJob
    participant DockerStep
    participant Container
    participant LogParser
    participant NATS

    Main->>Scheduler: SetNatsConnection(nc)
    Main->>Scheduler: ScheduleJob(job)
    Scheduler->>DockerJob: execute(ctx)
    DockerJob->>DockerStep: execute(ctx)
    DockerStep->>Container: Run container
    Container-->>DockerStep: Container completes
    DockerStep->>DockerStep: getContainerLogs()
    DockerStep->>LogParser: ParseContainerLogs(stdout, stderr, containerID)
    LogParser-->>DockerStep: Structured Log
    DockerStep->>NATS: PublishLogs(log)
    DockerStep->>DockerStep: removeContainer() if not auto-remove
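
The flow in the diagram can be sketched in a few lines of Go. This is an illustration only: LogEntry, Publisher, memoryPublisher, step, executeJob and runDemo are hypothetical stand-ins, the in-memory publisher replaces the real NATS connection, and whether the PR publishes logs for a failed step is not stated, so this sketch simply stops at the first failing step.

```go
package main

import (
	"context"
	"fmt"
)

// LogEntry is a simplified stand-in for the PR's structured log entry.
type LogEntry struct {
	Stream  string // "stdout" or "stderr"
	Message string
}

// Publisher abstracts the log sink so the flow can run without NATS.
type Publisher interface {
	PublishLogs(jobID string, entries []LogEntry) error
}

// memoryPublisher records entries per job instead of sending them anywhere.
type memoryPublisher struct {
	published map[string][]LogEntry
}

func (m *memoryPublisher) PublishLogs(jobID string, entries []LogEntry) error {
	m.published[jobID] = append(m.published[jobID], entries...)
	return nil
}

// step models one build step; run returns the container's captured output.
type step struct {
	name string
	run  func(ctx context.Context) (stdout, stderr string, err error)
}

// executeJob runs the steps in order and publishes each step's logs
// before the next step starts.
func executeJob(ctx context.Context, jobID string, steps []step, pub Publisher) error {
	for _, s := range steps {
		stdout, stderr, err := s.run(ctx)
		if err != nil {
			return fmt.Errorf("step %q: %w", s.name, err)
		}
		entries := []LogEntry{
			{Stream: "stdout", Message: stdout},
			{Stream: "stderr", Message: stderr},
		}
		if err := pub.PublishLogs(jobID, entries); err != nil {
			return fmt.Errorf("publishing logs for step %q: %w", s.name, err)
		}
	}
	return nil
}

// runDemo executes a two-step job and returns the number of published entries.
func runDemo() (int, error) {
	pub := &memoryPublisher{published: map[string][]LogEntry{}}
	steps := []step{
		{name: "clone", run: func(context.Context) (string, string, error) { return "cloned", "", nil }},
		{name: "build", run: func(context.Context) (string, string, error) { return "built", "1 warning", nil }},
	}
	err := executeJob(context.Background(), "job-1", steps, pub)
	return len(pub.published["job-1"]), err
}

func main() {
	n, err := runDemo()
	fmt.Println(n, err)
}
```

Putting the sink behind an interface is what makes the flow testable without a running NATS server, and it matches the Publisher interface the walkthrough describes.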

Suggested labels

enhancement

Suggested reviewers

  • Mtze

Poem

In the warren, logs now hop and leap,
From Docker’s burrow, structured and neat.
No more lost in files or fleeting air—
They ride the NATS stream, everywhere!
Containers linger, logs preserved with care,
As bunnies code on, with time to spare.
🐇✨
"""


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0cc14aa and f2c75a3.

📒 Files selected for processing (1)
  • HadesScheduler/docker/step.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • HadesScheduler/docker/step.go

@paoxin paoxin self-assigned this Jul 2, 2025
@paoxin paoxin changed the title 'Development': stream container logs to NATS subject 'Development': stream docker container logs to NATS subject Jul 9, 2025
@paoxin paoxin marked this pull request as ready for review July 9, 2025 11:55
@paoxin paoxin marked this pull request as draft July 9, 2025 12:01
@paoxin paoxin marked this pull request as ready for review July 9, 2025 12:07
Base automatically changed from nats-develop to main July 10, 2025 15:24
@Mtze
Member

Mtze commented Jul 10, 2025

Can you have a look at the conflicts please?

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
HadesScheduler/log/publisher.go (1)

12-27: Well-implemented log publishing function with proper error handling.

The PublishLogsToNATS function correctly handles JSON marshaling and NATS publishing with comprehensive error handling. The use of structured logging with job context and error wrapping follows best practices.

Consider adding a debug log for successful publishing to aid in troubleshooting:

	if err := nc.Publish(subject, data); err != nil {
		slog.Error("Failed to publish log to NATS", slog.String("job_id", buildJobLog.JobID), slog.Any("error", err))
		return fmt.Errorf("publishing log to NATS: %w", err)
	}

+	slog.Debug("Published log to NATS", slog.String("job_id", buildJobLog.JobID), slog.String("subject", subject))
	return nil
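
For context, here is a self-contained sketch of how the whole publishing function might look with the suggested debug line. It is not the PR's code: the Log fields other than JobID, the "logs.%s" subject format, and the publisher and fakeConn types are assumptions. The interface mirrors the Publish(subject, data) call shown above, so a real *nats.Conn would satisfy it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"log/slog"
)

// Log is a reduced stand-in; only the JobID field is confirmed by the review.
type Log struct {
	JobID string   `json:"job_id"`
	Lines []string `json:"lines"` // hypothetical payload field
}

// publisher is the subset of the NATS connection this function needs.
type publisher interface {
	Publish(subject string, data []byte) error
}

// publishLogs marshals the log to JSON and publishes it on the job's subject.
func publishLogs(nc publisher, l Log) error {
	data, err := json.Marshal(l)
	if err != nil {
		return fmt.Errorf("marshalling log: %w", err)
	}

	subject := fmt.Sprintf("logs.%s", l.JobID) // assumed subject format
	if err := nc.Publish(subject, data); err != nil {
		slog.Error("Failed to publish log to NATS", slog.String("job_id", l.JobID), slog.Any("error", err))
		return fmt.Errorf("publishing log to NATS: %w", err)
	}

	slog.Debug("Published log to NATS", slog.String("job_id", l.JobID), slog.String("subject", subject))
	return nil
}

// fakeConn records messages in memory so the sketch runs without a server.
type fakeConn struct {
	fail bool
	sent map[string][]byte
}

func (f *fakeConn) Publish(subject string, data []byte) error {
	if f.fail {
		return errors.New("connection closed")
	}
	f.sent[subject] = data
	return nil
}

func main() {
	conn := &fakeConn{sent: map[string][]byte{}}
	err := publishLogs(conn, Log{JobID: "42", Lines: []string{"build started"}})
	fmt.Println(err, string(conn.sent["logs.42"]))
}
```

With slog's default configuration the debug line is suppressed, so it adds no noise unless the log level is lowered for troubleshooting.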
HadesScheduler/log/parser.go (1)

73-93: Timestamp parsing logic handles the common case well.

The implementation correctly parses Docker's timestamp format and preserves the full message content. The fallback to current time is a reasonable default.

Consider adding a debug log when timestamp parsing fails to help identify logs with unexpected formats during development.
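
A rough sketch of that suggestion, assuming each line has the shape "<RFC 3339 timestamp> <message>", which is what Docker emits when timestamps are requested. The entry type and parseLine are hypothetical names, and the actual parser in the PR may split lines differently.

```go
package main

import (
	"fmt"
	"log/slog"
	"strings"
	"time"
)

// entry is a hypothetical structured log line.
type entry struct {
	Timestamp time.Time
	Message   string
}

// parseLine splits off a leading timestamp. If none can be parsed, it
// keeps the whole line as the message, falls back to the current time,
// and leaves a debug trace so unexpected formats can be spotted.
func parseLine(line string) entry {
	ts, msg, found := strings.Cut(line, " ")
	if found {
		if t, err := time.Parse(time.RFC3339Nano, ts); err == nil {
			return entry{Timestamp: t, Message: msg}
		}
	}
	slog.Debug("could not parse log timestamp, using current time", slog.String("line", line))
	return entry{Timestamp: time.Now(), Message: line}
}

func main() {
	for _, line := range []string{
		"2025-06-18T11:44:00.123456789Z build started",
		"line without a timestamp",
	} {
		e := parseLine(line)
		fmt.Printf("%s | %s\n", e.Timestamp.Format(time.RFC3339Nano), e.Message)
	}
}
```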

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1aa43c6 and 1870e51.

⛔ Files ignored due to path filters (2)
  • HadesScheduler/go.sum is excluded by !**/*.sum
  • go.work.sum is excluded by !**/*.sum
📒 Files selected for processing (8)
  • .env.example (1 hunks)
  • HadesScheduler/docker/container.go (1 hunks)
  • HadesScheduler/docker/docker.go (9 hunks)
  • HadesScheduler/go.mod (2 hunks)
  • HadesScheduler/log/parser.go (1 hunks)
  • HadesScheduler/log/publisher.go (1 hunks)
  • HadesScheduler/main.go (1 hunks)
  • shared/utils/queue.go (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
HadesScheduler/main.go (1)
HadesScheduler/docker/docker.go (1)
  • NewDockerScheduler (63-83)
HadesScheduler/log/publisher.go (1)
HadesScheduler/log/parser.go (2)
  • Log (24-28)
  • LogSubjectFormat (15-15)
🔇 Additional comments (12)
.env.example (1)

25-25: Configuration change aligns with new log processing workflow.

The change from true to false for DOCKER_CONTAINER_AUTOREMOVE makes sense given that containers now need to remain available for log extraction before cleanup. The updated comment clearly explains the purpose.

Verify that the new container retention behavior doesn't negatively impact system resource usage or cleanup processes in production environments.

HadesScheduler/go.mod (1)

45-45: Security review passed for updated Go modules

The updated dependencies have been checked against major Go security advisories and no known vulnerabilities were identified for the specified versions:

• github.com/google/go-cmp v0.7.0 – no CVEs or advisories found
• golang.org/x/time v0.12.0 – no CVEs or advisories found
• google.golang.org/protobuf v1.36.6 – no new advisories (all prior issues fixed in v1.33.0+)

These changes can be safely approved.

shared/utils/queue.go (1)

242-242: Formatting improvements enhance code readability.

The added blank lines before error handling blocks improve visual separation between error logging and message NACKing, making the code more readable without affecting functionality.

Also applies to: 252-252, 262-262

HadesScheduler/main.go (1)

68-68: Approved: NATS integration in Docker scheduler is correctly implemented

The SetNatsConnection method exists in HadesScheduler/docker/docker.go (lines 95–102), properly assigns the nats.Conn, logs a warning if it’s nil, and returns the scheduler for method chaining. No further changes needed.
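
The pattern described here (assign, warn on nil, return the receiver) might look roughly like the sketch below. The conn type stands in for nats.Conn so the example needs no external module, and Scheduler and NewScheduler are simplified placeholders rather than the PR's actual types.

```go
package main

import (
	"fmt"
	"log/slog"
)

// conn stands in for nats.Conn in this sketch.
type conn struct{ url string }

// Scheduler is a reduced placeholder for the Docker scheduler.
type Scheduler struct {
	nc *conn
}

func NewScheduler() *Scheduler { return &Scheduler{} }

// SetNatsConnection stores the connection, warns if it is nil, and
// returns the scheduler so calls can be chained.
func (s *Scheduler) SetNatsConnection(nc *conn) *Scheduler {
	if nc == nil {
		slog.Warn("NATS connection is nil, logs will not be published")
	}
	s.nc = nc
	return s
}

func main() {
	s := NewScheduler().SetNatsConnection(&conn{url: "nats://localhost:4222"})
	fmt.Println(s.nc.url)
}
```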

• Confirmed implementation location: HadesScheduler/docker/docker.go:95–102
• Integration aligns with PR objectives for log publishing via NATS

HadesScheduler/docker/container.go (3)

19-31: LGTM! Clean implementation of log processing pipeline.

The function properly handles errors with contextual wrapping and follows a clear flow from log retrieval to publishing.


33-51: Well-implemented log retrieval with proper demultiplexing.

Good use of stdcopy.StdCopy for handling Docker's multiplexed log format and proper resource cleanup with defer.
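
To illustrate what demultiplexing means here: for containers without a TTY, Docker interleaves stdout and stderr in one stream of frames, each with an 8-byte header (stream type in byte 0, big-endian payload length in bytes 4 to 7). The hand-written demux below is a simplified illustration of that format, not a replacement for stdcopy.StdCopy. It only handles the stdout and stderr stream types, and frame and demuxToStrings are helpers invented for the demo.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// demux splits a multiplexed Docker log stream into stdout and stderr.
func demux(src io.Reader, stdout, stderr io.Writer) error {
	header := make([]byte, 8)
	for {
		if _, err := io.ReadFull(src, header); err != nil {
			if err == io.EOF {
				return nil // clean end of stream between frames
			}
			return fmt.Errorf("reading frame header: %w", err)
		}

		var dst io.Writer
		switch header[0] {
		case 1:
			dst = stdout
		case 2:
			dst = stderr
		default:
			return fmt.Errorf("unexpected stream type %d", header[0])
		}

		size := int64(binary.BigEndian.Uint32(header[4:]))
		if _, err := io.CopyN(dst, src, size); err != nil {
			return fmt.Errorf("reading frame payload: %w", err)
		}
	}
}

// frame builds one multiplexed frame; used only to feed the demo.
func frame(stream byte, payload string) []byte {
	h := make([]byte, 8)
	h[0] = stream
	binary.BigEndian.PutUint32(h[4:], uint32(len(payload)))
	return append(h, payload...)
}

// demuxToStrings is a convenience wrapper returning both streams as strings.
func demuxToStrings(raw []byte) (stdout, stderr string, err error) {
	var out, errBuf bytes.Buffer
	if err := demux(bytes.NewReader(raw), &out, &errBuf); err != nil {
		return "", "", err
	}
	return out.String(), errBuf.String(), nil
}

func main() {
	raw := append(frame(1, "compiling\n"), frame(2, "warning: unused\n")...)
	out, errOut, err := demuxToStrings(raw)
	fmt.Printf("stdout=%q stderr=%q err=%v\n", out, errOut, err)
}
```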


53-63: Robust container cleanup implementation.

Good use of force removal and volume cleanup options. The success logging is helpful for debugging container lifecycle.

HadesScheduler/log/parser.go (2)

12-28: Well-structured data models for log representation.

Good design with clear constants and properly tagged structs for JSON serialization. The subject format using job ID enables proper message routing.


30-56: Clean and efficient log parsing implementation.

Excellent use of anonymous struct for stream processing and comprehensive debug logging. The error wrapping provides good context for troubleshooting.

HadesScheduler/docker/docker.go (3)

24-24: Important default behavior change for container lifecycle.

Changing ContainerAutoremove to false is correct for the new log extraction workflow. This ensures containers remain available for log processing before explicit cleanup.


95-102: Good defensive programming with nil check.

The warning for nil NATS connection helps identify configuration issues early. Method chaining is consistent with the existing pattern.


286-299: Excellent error handling strategy for log processing and cleanup.

The implementation correctly:

  • Fails the step if log processing fails (logs are critical)
  • Always attempts cleanup to prevent resource leaks
  • Logs but doesn't fail on cleanup errors (work is already done)

This ensures both reliability and proper resource management.
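
The three rules above can be expressed compactly. The sketch below is an assumption about the shape of that logic rather than the code under review; finishStep and errDemo are hypothetical names, and the two callbacks stand in for the real log processing and container removal calls.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// errDemo is a placeholder failure used by the demo.
var errDemo = errors.New("demo failure")

// finishStep processes the logs, then always attempts cleanup.
// A log failure fails the step; a cleanup failure is only logged.
func finishStep(processLogs, removeContainer func() error) error {
	logErr := processLogs()

	// Cleanup runs regardless of the log result so containers do not pile up.
	if err := removeContainer(); err != nil {
		slog.Warn("failed to remove container", slog.Any("error", err))
	}

	if logErr != nil {
		return fmt.Errorf("processing container logs: %w", logErr)
	}
	return nil
}

func main() {
	ok := func() error { return nil }
	fail := func() error { return errDemo }

	fmt.Println(finishStep(ok, fail)) // <nil>
	fmt.Println(finishStep(fail, ok)) // processing container logs: demo failure
}
```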

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@robertjndw
Member

Can you incorporate the CodeRabbit suggestions or just close them if they are nonsense?
Otherwise, the code looks good to me 👍🏼

@paoxin
Contributor Author

paoxin commented Jul 18, 2025

nice, I will address the live log streaming in another PR. this PR is getting big hahaha

@paoxin paoxin requested review from Mtze, robertjndw and ShuaiweiYu July 18, 2025 14:57
coderabbitai[bot]

This comment was marked as resolved.

@Mtze
Member

Mtze commented Jul 22, 2025

Looking good now :)

Please remove the remaining fluentbit parts in a follow-up PR :) We don't need that anymore

@Mtze
Member

Mtze commented Jul 22, 2025

@robertjndw Can you also have another look?

Development

Successfully merging this pull request may close these issues.

  • build job logs extraction from docker container
  • refactor HadesScheduler docker environment file structure
3 participants