-
Notifications
You must be signed in to change notification settings - Fork 72
feat: store machine logs in sqlite #1933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: store machine logs in sqlite #1933
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors machine log storage to use SQLite as the primary storage backend instead of in-memory circular buffers with optional file persistence. The change introduces a common LogStore interface with two implementations: SQLite-based and circular buffer-based, allowing users to choose their preferred storage method.
Key Changes:
- Introduced a new
logstorepackage with a common interface for log storage implementations - Added SQLite-based log store implementation with support for tailing and following logs
- Refactored the existing circular buffer implementation to implement the new
LogStoreinterface - Changed default configuration to enable SQLite storage and disable circular buffer file storage
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/pkg/siderolink/logstore/logstore.go | Defines common LogStore and LineReader interfaces for log storage backends |
| internal/pkg/siderolink/logstore/sqlitelog/sqlitelog.go | SQLite-based log store implementation with write, read, and follow functionality |
| internal/pkg/siderolink/logstore/sqlitelog/manager.go | SQLite store manager for creating, checking existence, and removing machine log stores |
| internal/pkg/siderolink/logstore/sqlitelog/sqlitelog_test.go | Tests for SQLite log store read/write and follow operations |
| internal/pkg/siderolink/logstore/circularlog/circularlog.go | Refactored circular buffer implementation to implement LogStore interface |
| internal/pkg/siderolink/logstore/circularlog/manager.go | Circular buffer store manager implementing common management interface |
| internal/pkg/siderolink/machines.go | Major refactoring to use LogStore abstraction instead of direct circular buffer access |
| internal/pkg/siderolink/loghandler.go | Updated to use LogStore interface and pass context to all log operations |
| internal/pkg/siderolink/loghandler_test.go | Updated tests to use new API with context parameter and database setup |
| internal/pkg/logreceiver/logreceiver.go | Updated to pass context through connection handling |
| internal/pkg/logreceiver/logreceiver_test.go | Updated test to provide context parameter |
| internal/pkg/config/logs.go | Added LogsMachineSQLite configuration struct and documented buffer config usage |
| internal/pkg/config/config.go | Changed defaults to enable SQLite storage and disable circular buffer file storage |
| internal/backend/grpc/support.go | Updated support bundle collection to use new log store API with context |
| internal/backend/grpc/management.go | Updated machine logs streaming to use new reader API with context |
| internal/backend/runtime/omni/state_sqlite.go | Refactored to accept pre-opened database connection instead of opening internally |
| internal/backend/runtime/omni/state.go | Updated to pass secondary storage database connection |
| cmd/omni/pkg/app/app.go | Updated to pass secondary storage database to log handler |
| cmd/omni/cmd/cmd.go | Opens secondary storage database at startup and passes it through the stack |
| internal/integration/integration_test.go | Updated integration tests to create and pass test database |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
45e657c to
c497ee1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 29 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Use SQLite storage to store machine logs. Use the same SQLite database used for the metrics state. Refactor machine log storage to define a common interface for circular and sqlite based logs. Signed-off-by: Utku Ozdemir <[email protected]>
c497ee1 to
a1b543b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ctx, cancel := context.WithTimeout(ctx, s.config.Timeout) | ||
| defer cancel() | ||
|
|
||
| result, err := s.db.ExecContext(ctx, query, s.id, message, time.Now().Unix()) |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WriteLine method creates a new context with timeout that shadows the incoming context, but holds a mutex during the database operation. If the database operation blocks for longer than the timeout, the mutex will remain locked while the context times out. This could lead to deadlocks or mutex contention. Consider releasing the mutex before the database operation or using the original context instead of creating a new one with timeout.
| m.logger.Debug("failed to get number of rows affected during logs cleanup", zap.Error(err)) | ||
| } | ||
|
|
||
| m.logger.Info("completed logs cleanup", zap.Int64("rows_affected", numRowsDeleted)) |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The log level for completed cleanup is Info, which might be too verbose for a periodic operation that runs every 30 minutes by default. Consider using Debug level instead, or only log at Info level when numRowsDeleted > 0.
| m.logger.Info("completed logs cleanup", zap.Int64("rows_affected", numRowsDeleted)) | |
| if numRowsDeleted > 0 { | |
| m.logger.Info("completed logs cleanup", zap.Int64("rows_affected", numRowsDeleted)) | |
| } else { | |
| m.logger.Debug("completed logs cleanup", zap.Int64("rows_affected", numRowsDeleted)) | |
| } |
| panichandler.Go(func() { | ||
| select { | ||
| case <-closeCh: // normal close, reader is already unsubscribed | ||
| return | ||
| case <-ctx.Done(): | ||
| s.unsubscribe(followCh) | ||
| } |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a potential race condition in the subscription cleanup goroutine. If the reader is closed before the context is done, the goroutine will wait on closeCh forever since it only listens for either closeCh or ctx.Done(). The goroutine should also check if the reader was already closed by selecting on the followCh being closed, or the goroutine should be properly terminated when Close() is called. Currently, this could leak goroutines if readers are closed without the context being canceled.
| // ctx is the root context for all connections. | ||
| ctx context.Context //nolint:containedctx |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storing a context in a struct field (ctx context.Context) is generally discouraged in Go. While there's a nolint directive acknowledging this (//nolint:containedctx), the comment doesn't explain why it's necessary here. Consider documenting why this pattern is required in this specific case, or refactoring to pass the context through method calls instead.
| Enabled: true, | ||
| Timeout: 10 * time.Second, | ||
| CleanupInterval: 30 * time.Minute, | ||
| CleanupOlderThan: 24 * 30 * time.Hour, |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The cleanup threshold is set to 24 * 30 * time.Hour which assumes 30 days always have 24 hours. This could be more clearly expressed as 30 * 24 * time.Hour (720 hours) or better yet, use a constant like 720 * time.Hour with a comment explaining it represents 30 days. The current calculation order is correct mathematically but less intuitive.
| CleanupOlderThan: 24 * 30 * time.Hour, | |
| // 30 days | |
| CleanupOlderThan: 30 * 24 * time.Hour, |
Use SQLite storage to store machine logs.
Use the same SQLite database used for the metrics state.
Refactor machine log storage to define a common interface for circular and sqlite based logs.
Part of #1771