[service/telemetry] Add Configurable Log Rotation Support Using Lumberjack #13084

Open
wants to merge 19 commits into base: main

Changes from 18 commits

Commits (19)
261777d
[service/telemetry] Add log rotation support using lumberjack
srinivasvenkatabevara May 23, 2025
bfbb3ae
Merge branch 'open-telemetry:main' into lj-logrotate
bvsvas May 23, 2025
cb34be2
[service/telemetry] Handling lumberjack logger in shutdown
srinivasvenkatabevara May 23, 2025
f3e26bd
[service/telemetry] Handle filename
srinivasvenkatabevara May 23, 2025
e983202
[service/telemetry] Handle filename
srinivasvenkatabevara May 23, 2025
bc93b76
[service/telemetry] example logrotate configuration
srinivasvenkatabevara May 24, 2025
b37b7cf
Merge branch 'main' into lj-logrotate
bvsvas May 24, 2025
4bffa02
Merge branch 'main' into lj-logrotate
bvsvas May 26, 2025
5687a28
Merge branch 'main' into lj-logrotate
bvsvas May 26, 2025
ee9b2c2
Merge branch 'main' into lj-logrotate
bvsvas May 27, 2025
dc8d06d
Merge branch 'main' into lj-logrotate
bvsvas May 27, 2025
4b4f637
Merge branch 'main' into lj-logrotate
bvsvas May 27, 2025
1452e83
Merge branch 'main' into lj-logrotate
bvsvas May 28, 2025
dc3610f
[service/telemetry] Unit test
srinivasvenkatabevara May 29, 2025
c2aad65
[service/telemetry] Refactor logger creation for testability
srinivasvenkatabevara May 29, 2025
d34b6b2
[service/telemetry] Unit test
srinivasvenkatabevara May 29, 2025
293d985
[service/telemetry] Address PR review comments
srinivasvenkatabevara May 30, 2025
14f700f
[service/telemetry] Configure mdatagen to ignore lumberjack goroutine
srinivasvenkatabevara May 30, 2025
dbc3236
[service/telemetry] Fixing cspell build error
srinivasvenkatabevara Jun 2, 2025
30 changes: 30 additions & 0 deletions .chloggen/logrotate-service-telemetry.yaml
@@ -0,0 +1,30 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. otlpreceiver)
component: service/telemetry

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add support for log rotation using lumberjack in the OpenTelemetry Collector telemetry logger.

# One or more tracking issues or pull requests related to the change
issues: [10768]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
The Collector's internal telemetry logger now supports optional log file rotation using the lumberjack library.
Configure `rotation` settings (max size, age, backups, compression) to manage log growth and retention directly, without needing external tools like logrotate.
Rotation applies only to the first file-based `output_paths` entry; if no such entry is provided, no rotation takes place.
If `rotation.enabled` is false or the block is omitted, log behavior remains unchanged.
Only file-based logging is affected; console targets such as `console`, `stdout`, and `stderr` are not rotated.

# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
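
The rules in the changelog entry above can be read as a small decision function. The following is a minimal sketch, not the PR's code: `rotationTarget` is a hypothetical helper name, and it assumes that any entry other than `console`, `stdout`, or `stderr` counts as a file path.

```go
package main

import "fmt"

// rotationTarget reports which output path, if any, would be rotated.
// Hypothetical helper: it mirrors the rules described in the changelog
// entry, not the exact code in this PR.
func rotationTarget(enabled bool, outputPaths []string) (string, bool) {
	if !enabled {
		// rotation.enabled is false or the rotation block is omitted.
		return "", false
	}
	for _, p := range outputPaths {
		switch p {
		case "console", "stdout", "stderr":
			// Console targets are never rotated.
			continue
		default:
			// The first remaining entry is treated as the file to rotate.
			return p, true
		}
	}
	return "", false
}

func main() {
	examples := [][]string{
		{"stderr", "/var/log/otelcol.log"},
		{"stdout", "stderr"},
	}
	for _, paths := range examples {
		target, ok := rotationTarget(true, paths)
		fmt.Printf("paths=%v rotated=%v target=%q\n", paths, ok, target)
	}
}
```

With rotation disabled, the function reports no target regardless of the paths, which matches the statement that existing behavior is unchanged.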
1 change: 1 addition & 0 deletions cmd/otelcorecol/go.mod
@@ -169,6 +169,7 @@ require (
google.golang.org/genproto/googleapis/rpc v0.0.0-20250519155744-55703ea1f237 // indirect
google.golang.org/grpc v1.72.2 // indirect
google.golang.org/protobuf v1.36.6 // indirect
gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
)
2 changes: 2 additions & 0 deletions cmd/otelcorecol/go.sum

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions internal/e2e/go.mod
@@ -143,6 +143,7 @@ require (
google.golang.org/genproto/googleapis/rpc v0.0.0-20250519155744-55703ea1f237 // indirect
google.golang.org/grpc v1.72.2 // indirect
google.golang.org/protobuf v1.36.6 // indirect
gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
)
2 changes: 2 additions & 0 deletions internal/e2e/go.sum

1 change: 1 addition & 0 deletions otelcol/go.mod
@@ -118,6 +118,7 @@ require (
google.golang.org/genproto/googleapis/api v0.0.0-20250218202821-56aae31c358a // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20250218202821-56aae31c358a // indirect
google.golang.org/protobuf v1.36.6 // indirect
gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)

2 changes: 2 additions & 0 deletions otelcol/go.sum

1 change: 1 addition & 0 deletions otelcol/otelcoltest/go.mod
@@ -120,6 +120,7 @@ require (
google.golang.org/genproto/googleapis/rpc v0.0.0-20250218202821-56aae31c358a // indirect
google.golang.org/grpc v1.72.2 // indirect
google.golang.org/protobuf v1.36.6 // indirect
gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
)
2 changes: 2 additions & 0 deletions otelcol/otelcoltest/go.sum

2 changes: 1 addition & 1 deletion service/generated_package_test.go

1 change: 1 addition & 0 deletions service/go.mod
@@ -57,6 +57,7 @@ require (
go.uber.org/multierr v1.11.0
go.uber.org/zap v1.27.0
gonum.org/v1/gonum v0.16.0
gopkg.in/natefinch/lumberjack.v2 v2.2.1
)

require (
2 changes: 2 additions & 0 deletions service/go.sum

5 changes: 5 additions & 0 deletions service/metadata.yaml
@@ -170,3 +170,8 @@
sum:
value_type: int
monotonic: true

tests:
goleak:
ignore:
top: ["gopkg.in/natefinch/lumberjack%2ev2.(*Logger).millRun"]

Check warning on line 177 in service/metadata.yaml (GitHub Actions / spell-check): Unknown word (natefinch)
Check warning on line 177 in service/metadata.yaml (GitHub Actions / spell-check): Unknown word (gopkg)
4 changes: 4 additions & 0 deletions service/service.go
@@ -349,6 +349,10 @@ func (srv *Service) Shutdown(ctx context.Context) error {

srv.telemetrySettings.Logger.Info("Shutdown complete.")

ljLogger := telemetry.GetRotatedLogger()
if ljLogger != nil {
ljLogger.Close()
}
errs = multierr.Append(errs, srv.shutdownTelemetry(ctx))

return errs
32 changes: 32 additions & 0 deletions service/service_test.go
@@ -9,6 +9,7 @@ import (
"errors"
"fmt"
"net/http"
"os"
"strings"
"sync"
"testing"
@@ -278,6 +279,37 @@ func TestServiceTelemetry(t *testing.T) {
}
}

func TestServiceShutdown_LumberjackLoggerClose(t *testing.T) {
tmpFile, err := os.CreateTemp(t.TempDir(), "test-lumberjack-shutdown-*.log")
require.NoError(t, err)
tmpFileName := tmpFile.Name()
require.NoError(t, tmpFile.Close())
t.Cleanup(func() {
assert.NoError(t, os.Remove(tmpFileName))
})

set := newNopSettings()
cfg := newNopConfig()

cfg.Telemetry.Logs.OutputPaths = []string{tmpFileName}
cfg.Telemetry.Logs.Rotation = &telemetry.LogsRotationConfig{
Enabled: true,
MaxMegabytes: 1,
MaxBackups: 2,
}

srv, err := New(context.Background(), set, cfg)
require.NoError(t, err)
require.NotNil(t, srv)

require.NoError(t, srv.Start(context.Background()))

// Call Shutdown. The purpose of this test is to ensure that if a lumberjack logger
// was initialized (due to file output path), its Close method is called during shutdown.
err = srv.Shutdown(context.Background())
assert.NoError(t, err)
}

func testCollectorStartHelperWithReaders(t *testing.T, tc ownMetricsTestCase, network string) {
var once sync.Once
loggingHookCalled := false
3 changes: 3 additions & 0 deletions service/telemetry/config.go
@@ -49,6 +49,9 @@ type LogsConfig = migration.LogsConfigV030
// to preserve a representative subset of your logs.
type LogsSamplingConfig = migration.LogsSamplingConfig

// LogsRotationConfig sets a rotation strategy for the logger.
type LogsRotationConfig = migration.LogsRotationConfig

Comment on lines +52 to +54
Contributor

I'm not familiar with the migration strategy here, so I'll defer to an expert: is it appropriate to introduce new structs in the v0.3.0.go file? I would have expected that file to be auto-generated.

// MetricsConfig exposes the common Telemetry configuration for one component.
// Experimental: *NOTE* this structure is subject to change or removal in the future.
type MetricsConfig = migration.MetricsConfigV030
28 changes: 28 additions & 0 deletions service/telemetry/internal/migration/v0.3.0.go
@@ -135,6 +135,17 @@ type LogsConfigV030 struct {
// Sampling can be disabled by setting 'enabled' to false
Sampling *LogsSamplingConfig `mapstructure:"sampling"`

// Rotation is the configuration for log rotation. If this is not specified,
// or if 'enabled' is false, log rotation will be disabled.
// Example:
// rotation:
// max_megabytes: 100 # Rotate the log file when it exceeds 100 MB.
// max_backups: 5 # Keep at most 5 old log files.
// max_age: 14 # Keep old log files for at most 14 days.
// compress: true # Compress rotated log files.
// See LogsRotationConfig for more details on the available options.
Rotation *LogsRotationConfig `mapstructure:"rotation,omitempty"`

// OutputPaths is a list of URLs or file paths to write logging output to.
// The URLs could only be with "file" schema or without schema.
// The URLs with "file" schema must be an absolute path.
@@ -184,6 +195,23 @@ type LogsSamplingConfig struct {
Thereafter int `mapstructure:"thereafter"`
}

// LogsRotationConfig defines the configuration for log rotation.
// It allows users to manage log file sizes and retention policies.
type LogsRotationConfig struct {
// Enabled activates log file rotation. When set to true, log files will be rotated based on other configuration parameters.
Enabled bool `mapstructure:"enabled"`
// MaxMegabytes is the maximum size in megabytes that a log file can reach before it is rotated.
// For example, if set to 100, a new log file will be created when the current one exceeds 100MB.
MaxMegabytes int `mapstructure:"max_megabytes"`
// MaxBackups is the maximum number of old (rotated) log files to retain.
// If set to 3, the system will keep the current log file and the 3 most recent rotated files.
MaxBackups int `mapstructure:"max_backups"`
// MaxAge is the maximum number of days to retain old log files. The age is based on the timestamp encoded in the rotated filenames.
MaxAge int `mapstructure:"max_age"`
// Compress determines if the rotated log files should be compressed using gzip.
Compress bool `mapstructure:"compress"`
}

func (c *LogsConfigV030) Unmarshal(conf *confmap.Conf) error {
unmarshaled := *c
if err := conf.Unmarshal(c); err != nil {
97 changes: 96 additions & 1 deletion service/telemetry/logger.go
@@ -4,16 +4,104 @@
package telemetry // import "go.opentelemetry.io/collector/service/telemetry"

import (
"net/url"

"github.com/google/uuid"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/log"
"go.uber.org/zap"
"go.uber.org/zap/zapcore"
lumberjack "gopkg.in/natefinch/lumberjack.v2"

"go.opentelemetry.io/collector/internal/telemetry/componentattribute"
)

var (
rotationSchema string
ljLogger *lumberjack.Logger
)

type logRotateSink struct {
*lumberjack.Logger
}

// Sync is a no-op method to satisfy the zap.Sink interface, ensuring compatibility with zap's logging framework.
func (lr logRotateSink) Sync() error {
// no-op
return nil
}

// GetRotatedLogger returns the global lumberjack logger instance.
func GetRotatedLogger() *lumberjack.Logger {
return ljLogger
}

func createLumberjackLogger(logsCfg LogsConfig, logFileName string) *lumberjack.Logger {
return &lumberjack.Logger{
Filename: logFileName,
MaxSize: logsCfg.Rotation.MaxMegabytes,
MaxAge: logsCfg.Rotation.MaxAge,
MaxBackups: logsCfg.Rotation.MaxBackups,
Compress: logsCfg.Rotation.Compress,
}
}

// getFirstFileOutputPath iterates through the output paths and returns the first
// path that is not a reserved keyword (console, stdout, stderr), along with its index.
// It returns an empty string and -1 if no suitable file path is found.
func getFirstFileOutputPath(logsCfg LogsConfig) (string, int) {
if len(logsCfg.OutputPaths) == 0 {
return "", -1
}

for fIdx, path := range logsCfg.OutputPaths {
switch path {
case "console", "stdout", "stderr":
// Ignore these keywords
continue
default:
// This is considered a file path
return path, fIdx
}
}
// No file path found
return "", -1
}

func registerLumberjackSink(logger *lumberjack.Logger, rotationSchemaLocal string) error {
err := zap.RegisterSink(rotationSchemaLocal, func(*url.URL) (zap.Sink, error) {
return logRotateSink{Logger: logger}, nil
})
return err
}

// newLogger creates a Logger and a LoggerProvider from Config.
// It generates a unique rotation schema if log rotation is enabled, then calls makeLogger.
// This separation facilitates testing makeLogger's lumberjack sink
// registration error path by allowing a predictable schema in tests.
func newLogger(set Settings, cfg Config) (*zap.Logger, log.LoggerProvider, error) {
rotationSchema = ""
if cfg.Logs.Rotation != nil && cfg.Logs.Rotation.Enabled {
rotationSchema = "lumberjack-" + uuid.NewString()
}

return makeLogger(set, cfg, rotationSchema)
}

// makeLogger creates a Logger and a LoggerProvider from Config and a custom rotation schema.
func makeLogger(set Settings, cfg Config, rotationSchema string) (*zap.Logger, log.LoggerProvider, error) {
logFileName, logFileIndex := getFirstFileOutputPath(cfg.Logs)

if cfg.Logs.Rotation != nil && cfg.Logs.Rotation.Enabled && len(logFileName) > 0 {
ljLogger = createLumberjackLogger(cfg.Logs, logFileName)

err := registerLumberjackSink(ljLogger, rotationSchema)
if err != nil {
return nil, nil, err
}
cfg.Logs.OutputPaths[logFileIndex] = rotationSchema + ":" + logFileName
}

// Copied from NewProductionConfig.
ec := zap.NewProductionEncoderConfig()
ec.EncodeTime = zapcore.ISO8601TimeEncoder
@@ -39,6 +127,13 @@ func newLogger(set Settings, cfg Config) (*zap.Logger, log.LoggerProvider, error
return nil, nil, err
}

logger, lp := configureLogger(logger, cfg, set)

return logger, lp, nil
}

// configureLogger applies common configuration to the logger
func configureLogger(logger *zap.Logger, cfg Config, set Settings) (*zap.Logger, log.LoggerProvider) {
// The attributes in cfg.Resource are added as resource attributes for logs exported through the LoggerProvider instantiated below.
// To make sure they are also exposed in logs written to stdout, we add them as fields to the Zap core created above using WrapCore.
// We do NOT add them to the logger using With, because that would apply to all logs, even ones exported through the core that wraps
@@ -87,7 +182,7 @@ func newLogger(set Settings, cfg Config) (*zap.Logger, log.LoggerProvider, error
return core
}))

-return logger, lp, nil
+return logger, lp
}

func newSampledCore(core zapcore.Core, sc *LogsSamplingConfig) zapcore.Core {