-
Notifications
You must be signed in to change notification settings - Fork 2
'Development': stream docker container logs to NATS subject #221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
* Refactor job queue handling to support priority-based processing and update related tests * Enhance job stream configuration to disallow duplicates * Add KeyValue store for job management * Add Bruno files for NATS
Switch from Redis to NATS (#2)
…#219) * update bruno script to show failure if it fails * add shared volumes to fix "folder not found error" * ensure debug mode logs deletion of volume * implement delete volume after creation before execution flag
""" WalkthroughThe changes refactor Docker scheduling by removing direct file-based log handling and volume management, introducing structured log parsing and NATS-based log publishing. The default Docker container autoremove behavior is changed to retain containers post-execution. The scheduler now integrates NATS messaging for logs, and dependency versions are updated. Minor formatting improvements are included. Changes
Sequence Diagram(s)sequenceDiagram
participant Main
participant Scheduler
participant DockerJob
participant DockerStep
participant Container
participant LogParser
participant NATS
Main->>Scheduler: SetNatsConnection(nc)
Main->>Scheduler: ScheduleJob(job)
Scheduler->>DockerJob: execute(ctx)
DockerJob->>DockerStep: execute(ctx)
DockerStep->>Container: Run container
Container-->>DockerStep: Container completes
DockerStep->>DockerStep: getContainerLogs()
DockerStep->>LogParser: ParseContainerLogs(stdout, stderr, containerID)
LogParser-->>DockerStep: Structured Log
DockerStep->>NATS: PublishLogs(log)
DockerStep->>DockerStep: removeContainer() if not auto-remove
Suggested labels
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
…m/hades into feat/log-extraction-nats
Can you have a look at the conflicts please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
HadesScheduler/log/publisher.go (1)
12-27
: Well-implemented log publishing function with proper error handling.The
PublishLogsToNATS
function correctly handles JSON marshaling and NATS publishing with comprehensive error handling. The use of structured logging with job context and error wrapping follows best practices.Consider adding a debug log for successful publishing to aid in troubleshooting:
if err := nc.Publish(subject, data); err != nil { slog.Error("Failed to publish log to NATS", slog.String("job_id", buildJobLog.JobID), slog.Any("error", err)) return fmt.Errorf("publishing log to NATS: %w", err) } + slog.Debug("Published log to NATS", slog.String("job_id", buildJobLog.JobID), slog.String("subject", subject)) return nil
HadesScheduler/log/parser.go (1)
73-93
: Timestamp parsing logic handles the common case well.The implementation correctly parses Docker's timestamp format and preserves the full message content. The fallback to current time is a reasonable default.
Consider adding a debug log when timestamp parsing fails to help identify logs with unexpected formats during development.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
HadesScheduler/go.sum
is excluded by!**/*.sum
go.work.sum
is excluded by!**/*.sum
📒 Files selected for processing (8)
.env.example
(1 hunks)HadesScheduler/docker/container.go
(1 hunks)HadesScheduler/docker/docker.go
(9 hunks)HadesScheduler/go.mod
(2 hunks)HadesScheduler/log/parser.go
(1 hunks)HadesScheduler/log/publisher.go
(1 hunks)HadesScheduler/main.go
(1 hunks)shared/utils/queue.go
(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
HadesScheduler/main.go (1)
HadesScheduler/docker/docker.go (1)
NewDockerScheduler
(63-83)
HadesScheduler/log/publisher.go (1)
HadesScheduler/log/parser.go (2)
Log
(24-28)LogSubjectFormat
(15-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build (./HadesScheduler/Dockerfile, ls1intum/hades/hades-scheduler) / Build Docker Image for ls1intum/hades/hades-scheduler
- GitHub Check: build (./HadesScheduler/Dockerfile, ls1intum/hades/hades-scheduler) / Build Docker Image for ls1intum/hades/hades-scheduler
- GitHub Check: build (./HadesCloneContainer/Dockerfile, ls1intum/hades/hades-clone-container) / Build Docker Image for ls1intum/hades/hades-clone-container
- GitHub Check: build (./HadesAPI/Dockerfile, ls1intum/hades/hades-api) / Build Docker Image for ls1intum/hades/hades-api
- GitHub Check: build (./HadesAPI/Dockerfile, ls1intum/hades/hades-api) / Build Docker Image for ls1intum/hades/hades-api
- GitHub Check: build (./HadesCloneContainer/Dockerfile, ls1intum/hades/hades-clone-container) / Build Docker Image for ls1intum/hades/hades-clone-container
- GitHub Check: Analyze (go)
🔇 Additional comments (12)
.env.example (1)
25-25
: Configuration change aligns with new log processing workflow.The change from
true
tofalse
forDOCKER_CONTAINER_AUTOREMOVE
makes sense given that containers now need to remain available for log extraction before cleanup. The updated comment clearly explains the purpose.Verify that the new container retention behavior doesn't negatively impact system resource usage or cleanup processes in production environments.
HadesScheduler/go.mod (1)
45-45
: Security review passed for updated Go modulesThe updated dependencies have been checked against major Go security advisories and no known vulnerabilities were identified for the specified versions:
• github.com/google/go-cmp v0.7.0 – no CVEs or advisories found
• golang.org/x/time v0.12.0 – no CVEs or advisories found
• google.golang.org/protobuf v1.36.6 – no new advisories (all prior issues fixed in v1.33.0+)These changes can be safely approved.
shared/utils/queue.go (1)
242-242
: Formatting improvements enhance code readability.The added blank lines before error handling blocks improve visual separation between error logging and message NACKing, making the code more readable without affecting functionality.
Also applies to: 252-252, 262-262
HadesScheduler/main.go (1)
68-68
: Approved: NATS integration in Docker scheduler is correctly implementedThe
SetNatsConnection
method exists inHadesScheduler/docker/docker.go
(lines 95–102), properly assigns thenats.Conn
, logs a warning if it’s nil, and returns the scheduler for method chaining. No further changes needed.• Confirmed implementation location: HadesScheduler/docker/docker.go:95–102
• Integration aligns with PR objectives for log publishing via NATSHadesScheduler/docker/container.go (3)
19-31
: LGTM! Clean implementation of log processing pipeline.The function properly handles errors with contextual wrapping and follows a clear flow from log retrieval to publishing.
33-51
: Well-implemented log retrieval with proper demultiplexing.Good use of
stdcopy.StdCopy
for handling Docker's multiplexed log format and proper resource cleanup withdefer
.
53-63
: Robust container cleanup implementation.Good use of force removal and volume cleanup options. The success logging is helpful for debugging container lifecycle.
HadesScheduler/log/parser.go (2)
12-28
: Well-structured data models for log representation.Good design with clear constants and properly tagged structs for JSON serialization. The subject format using job ID enables proper message routing.
30-56
: Clean and efficient log parsing implementation.Excellent use of anonymous struct for stream processing and comprehensive debug logging. The error wrapping provides good context for troubleshooting.
HadesScheduler/docker/docker.go (3)
24-24
: Important default behavior change for container lifecycle.Changing
ContainerAutoremove
tofalse
is correct for the new log extraction workflow. This ensures containers remain available for log processing before explicit cleanup.
95-102
: Good defensive programming with nil check.The warning for nil NATS connection helps identify configuration issues early. Method chaining is consistent with the existing pattern.
286-299
: Excellent error handling strategy for log processing and cleanup.The implementation correctly:
- Fails the step if log processing fails (logs are critical)
- Always attempts cleanup to prevent resource leaks
- Logs but doesn't fail on cleanup errors (work is already done)
This ensures both reliability and proper resource management.
Can you incorporate the CodeRabbit suggestions or just close them if they are nonsense? |
nice, I will address the live log streaming in another PR. this PR is getting big hahaha |
Looking good now :) Please remove the remaining fluentbit parts in a follow up PR :) We dont need that anymore |
@robertjndw Can you also have another look? |
✨ What is the change?
I added a "log extraction logic" to send docker container logs to a NATS subject (derived from each job id) for each build job are sent after each build step.
📌 Link to issue
Closes #255, #256
🧪 Steps for Testing
docker compose up nats
DEBUG=true go run .
in both HadesScheduler and HadesAPInats sub "logs.*" --server="nats://localhost:4222"
(logs.* meaning to sub to all subject from the server)✅ PR Checklist
Summary by CodeRabbit