[exporter/elasticsearch] add support for batcher config (#34238)
**Description:**

Add opt-in support for the experimental batch sender
(open-telemetry/opentelemetry-collector#8122).
When opting into this functionality, the exporter's `Consume*` methods
make synchronous bulk requests to Elasticsearch, without additional
batching/buffering in the exporter.
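
For example, opting in looks like this (a minimal sketch; the endpoint value is illustrative):

```yaml
exporters:
  elasticsearch:
    endpoint: "http://localhost:9200"
    batcher:
      enabled: true
```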

By default the exporter continues to use its own buffering, which
supports thresholds for time, number of documents, and size of encoded
documents in bytes. The batch sender does not currently support a
bytes-based threshold and is still experimental, which is why it is not
yet the default for the Elasticsearch exporter.
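
For comparison, the default path is tuned through the exporter's own `flush` settings; a sketch, assuming the `flush::bytes` and `flush::interval` settings documented in the README (values are illustrative, not defaults):

```yaml
exporters:
  elasticsearch:
    endpoint: "http://localhost:9200"
    flush:
      bytes: 5000000  # flush when the encoded documents reach ~5MB
      interval: 30s   # or after 30 seconds, whichever comes first
```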

This PR is based on
#32632,
but made to be non-breaking.

**Link to tracking Issue:**

#32377

**Testing:**

Added unit and integration tests.

Manually tested with Elasticsearch, using the following config to
highlight that client metadata can now flow all the way through:

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
        include_metadata: true

exporters:
  elasticsearch:
    endpoint: "http://localhost:9200"
    auth:
      authenticator: headers_setter
    batcher:
      enabled: false

extensions:
  headers_setter:
    headers:
      - action: insert
        key: Authorization
        from_context: authorization

service:
  extensions: [headers_setter]
  pipelines:
    traces:
      receivers: [otlp]
      processors: []
      exporters: [elasticsearch]
```

I have Elasticsearch running locally, with an "admin" user whose
password is "changeme". Sending OTLP/HTTP to the collector with
`telemetrygen traces --otlp-http --otlp-insecure http://localhost:4318
--otlp-header "Authorization=\"Basic YWRtaW46Y2hhbmdlbWU=\""`, I observe
the following:
- Without the `batcher` config, the exporter fails to index data into
Elasticsearch due to an auth error. That's because the exporter buffers
asynchronously and drops the context carrying client metadata, so no
Authorization header is attached to the outgoing requests.
- With `batcher: {enabled: true}`, the behaviour is the same as above. Unlike the
[`batch`
processor](https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor),
the batch sender does not maintain client metadata.
- With `batcher: {enabled: false}`, the exporter successfully indexes
data into Elasticsearch.

**Documentation:**

Updated the README.

---------

Co-authored-by: Carson Ip <[email protected]>
axw and carsonip authored Jul 30, 2024
1 parent 6b7057d commit 91dce71
Showing 12 changed files with 415 additions and 36 deletions.
29 changes: 29 additions & 0 deletions .chloggen/elasticsearch-sync-bulkindexer.yaml
@@ -0,0 +1,29 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: elasticsearchexporter

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add opt-in support for the experimental `batcher` config

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [32377]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
  Enabling (or explicitly disabling) the batcher disables the Elasticsearch exporter's
  existing batching/buffering logic, and the batch sender is used instead.
# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
30 changes: 30 additions & 0 deletions exporter/elasticsearchexporter/README.md
@@ -81,6 +81,33 @@ All other defaults are as defined by [confighttp].

The Elasticsearch exporter supports the common [`sending_queue` settings][exporterhelper]. However, the sending queue is currently disabled by default.

### Batching

> [!WARNING]
> The `batcher` config is experimental and may change without notice.

The Elasticsearch exporter supports the [common `batcher` settings](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterbatcher/config.go).

- `batcher`:
  - `enabled` (default=unset): Enable batching of documents into bulk requests.
  - `min_size_items` (default=5000): Minimum number of log records / spans in the buffer that triggers an immediate flush.
  - `max_size_items` (default=10000): Maximum number of log records / spans per bulk request.
  - `flush_timeout` (default=30s): Maximum time an item may spend in the buffer (the buffer's "max age"); when reached, a flush happens regardless of the buffer's size.

By default, the exporter performs its own buffering and batching, as configured through the
`flush` config, and `batcher` is unused. If `batcher::enabled` is set to either `true` or
`false`, the exporter performs none of its own buffering or batching, and the `flush` config
is ignored. In a future release, when the `batcher` config is stable and has feature parity
with the exporter's existing `flush` config, it will be enabled by default.
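
For example (a sketch; the endpoint value is illustrative):

```yaml
exporters:
  elasticsearch:
    endpoint: "https://elasticsearch:9200"
    batcher:
      # enabled: true  -> batch with the batch sender; `flush` is ignored
      # enabled: false -> no batching at all; `flush` is still ignored
      # omit `enabled` entirely to keep the `flush`-based buffering
      enabled: true
      min_size_items: 5000
      max_size_items: 10000
      flush_timeout: 30s
```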

Using the common `batcher` functionality provides several benefits over the default behavior:
- Combined with a persistent queue, or no queue at all, `batcher` enables at-least-once
delivery (see the sketch after this list). With the default behavior, the exporter accepts
data and processes it asynchronously, which interacts poorly with queuing.
- By ensuring the exporter makes requests to Elasticsearch synchronously,
client metadata can be passed through to Elasticsearch requests,
e.g. by using the [`headers_setter` extension](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/extension/headerssetterextension/README.md).
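
A sketch of the first point, combining the batcher with a persistent queue; this assumes a `file_storage` extension is configured elsewhere in the collector config:

```yaml
exporters:
  elasticsearch:
    endpoint: "https://elasticsearch:9200"
    batcher:
      enabled: true
    sending_queue:
      enabled: true
      storage: file_storage  # persist accepted data so it survives collector restarts
```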

### Elasticsearch document routing

Telemetry data will be written to signal-specific data streams by default:
@@ -173,6 +200,9 @@ The behaviour of this bulk indexing can be configured with the following settings:
- `max_interval` (default=1m): Max waiting time if an HTTP request failed.
- `retry_on_status` (default=[429, 500, 502, 503, 504]): Status codes that trigger request or document level retries. Request level retry and document level retry status codes are shared and cannot be configured separately. To avoid duplicates, it is recommended to set this to `[429]`. WARNING: The default will be changed to `[429]` in the future.
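
For example, to opt into the recommended behaviour ahead of the default change (a sketch, assuming these settings live under the `retry` block as elsewhere in this README):

```yaml
exporters:
  elasticsearch:
    endpoint: "https://elasticsearch:9200"
    retry:
      retry_on_status: [429]  # retry only on 429 to avoid duplicate documents
```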

> [!NOTE]
> The `flush` config will be ignored when the `batcher::enabled` config is explicitly set to `true` or `false`.

### Elasticsearch node discovery

The Elasticsearch Exporter will regularly check Elasticsearch for available nodes.
125 changes: 116 additions & 9 deletions exporter/elasticsearchexporter/bulkindexer.go
@@ -5,7 +5,7 @@ package elasticsearchexporter // import "github.com/open-telemetry/opentelemetry

import (
"context"
"fmt"
"errors"
"io"
"runtime"
"sync"
@@ -51,10 +51,103 @@ type bulkIndexerSession interface {
}

func newBulkIndexer(logger *zap.Logger, client *elasticsearch.Client, config *Config) (bulkIndexer, error) {
if config.Batcher.Enabled != nil {
return newSyncBulkIndexer(logger, client, config), nil
}
return newAsyncBulkIndexer(logger, client, config)
}

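// newSyncBulkIndexer returns a bulkIndexer that flushes synchronously,
// applying the exporter's timeout and retry settings to each flush.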
func newSyncBulkIndexer(logger *zap.Logger, client *elasticsearch.Client, config *Config) *syncBulkIndexer {
var maxDocRetry int
if config.Retry.Enabled {
// max_requests includes initial attempt
// See https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32344
maxDocRetry = config.Retry.MaxRequests - 1
}
return &syncBulkIndexer{
config: docappender.BulkIndexerConfig{
Client: client,
MaxDocumentRetries: maxDocRetry,
Pipeline: config.Pipeline,
RetryOnDocumentStatus: config.Retry.RetryOnStatus,
},
flushTimeout: config.Timeout,
retryConfig: config.Retry,
logger: logger,
}
}

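// syncBulkIndexer is a bulkIndexer whose sessions flush synchronously in the
// calling goroutine, preserving the caller's context (and hence any client
// metadata) on the resulting Elasticsearch requests.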
type syncBulkIndexer struct {
config docappender.BulkIndexerConfig
flushTimeout time.Duration
retryConfig RetrySettings
logger *zap.Logger
}

// StartSession creates a new docappender.BulkIndexer, and wraps
// it with a syncBulkIndexerSession.
func (s *syncBulkIndexer) StartSession(context.Context) (bulkIndexerSession, error) {
bi, err := docappender.NewBulkIndexer(s.config)
if err != nil {
return nil, err
}
return &syncBulkIndexerSession{
s: s,
bi: bi,
}, nil
}

// Close is a no-op.
func (s *syncBulkIndexer) Close(context.Context) error {
return nil
}

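// syncBulkIndexerSession buffers added documents in a docappender.BulkIndexer
// and sends them to Elasticsearch when Flush is called.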
type syncBulkIndexerSession struct {
s *syncBulkIndexer
bi *docappender.BulkIndexer
}

// Add adds an item to the sync bulk indexer session.
func (s *syncBulkIndexerSession) Add(_ context.Context, index string, document io.WriterTo) error {
return s.bi.Add(docappender.BulkIndexerItem{Index: index, Body: document})
}

// End is a no-op.
func (s *syncBulkIndexerSession) End() {
// TODO acquire docappender.BulkIndexer from pool in StartSession, release here
}

// Flush flushes documents added to the bulk indexer session.
func (s *syncBulkIndexerSession) Flush(ctx context.Context) error {
var retryBackoff func(int) time.Duration
for attempts := 0; ; attempts++ {
if _, err := flushBulkIndexer(ctx, s.bi, s.s.flushTimeout, s.s.logger); err != nil {
return err
}
if s.bi.Items() == 0 {
// No documents in buffer waiting for per-document retry, exit retry loop.
return nil
}
if retryBackoff == nil {
retryBackoff = createElasticsearchBackoffFunc(&s.s.retryConfig)
if retryBackoff == nil {
// BUG: This should never happen in practice.
// When retry is disabled / document level retry limit is reached,
// documents should go into FailedDocs instead of indexer buffer.
return errors.New("bulk indexer contains documents pending retry but retry is disabled")
}
}
backoff := retryBackoff(attempts + 1) // TODO: use exporterhelper retry_sender
timer := time.NewTimer(backoff)
select {
case <-ctx.Done():
timer.Stop()
return ctx.Err()
case <-timer.C:
}
}
}

func newAsyncBulkIndexer(logger *zap.Logger, client *elasticsearch.Client, config *Config) (*asyncBulkIndexer, error) {
numWorkers := config.NumWorkers
if numWorkers == 0 {
@@ -215,18 +308,32 @@ func (w *asyncBulkIndexerWorker) run() {

func (w *asyncBulkIndexerWorker) flush() {
ctx := context.Background()
stat, _ := flushBulkIndexer(ctx, w.indexer, w.flushTimeout, w.logger)
w.stats.docsIndexed.Add(stat.Indexed)
}

func flushBulkIndexer(
ctx context.Context,
bi *docappender.BulkIndexer,
timeout time.Duration,
logger *zap.Logger,
) (docappender.BulkIndexerResponseStat, error) {
if timeout > 0 {
var cancel context.CancelFunc
ctx, cancel = context.WithTimeout(ctx, timeout)
defer cancel()
}
stat, err := bi.Flush(ctx)
if err != nil {
logger.Error("bulk indexer flush error", zap.Error(err))
}
for _, resp := range stat.FailedDocs {
logger.Error(
"failed to index document",
zap.String("index", resp.Index),
zap.String("error.type", resp.Error.Type),
zap.String("error.reason", resp.Error.Reason),
)
}
return stat, err
}
26 changes: 26 additions & 0 deletions exporter/elasticsearchexporter/config.go
@@ -14,6 +14,7 @@ import (

"go.opentelemetry.io/collector/config/confighttp"
"go.opentelemetry.io/collector/config/configopaque"
"go.opentelemetry.io/collector/exporter/exporterbatcher"
"go.opentelemetry.io/collector/exporter/exporterhelper"
"go.uber.org/zap"
)
@@ -76,6 +77,31 @@ type Config struct {
// TelemetrySettings contains settings useful for testing/debugging purposes
// This is experimental and may change at any time.
TelemetrySettings `mapstructure:"telemetry"`

// Batcher holds configuration for batching requests based on timeout
// and size-based thresholds.
//
// Batcher is unused by default, in which case Flush will be used.
// If Batcher.Enabled is non-nil (i.e. batcher::enabled is specified),
// then Flush will be ignored even if Batcher.Enabled is false.
Batcher BatcherConfig `mapstructure:"batcher"`
}

// BatcherConfig holds configuration for exporterbatcher.
//
// This is a slightly modified version of exporterbatcher.Config,
// to enable tri-state Enabled: unset, false, true.
type BatcherConfig struct {
// Enabled indicates whether to enqueue batches before sending
// to the exporter. If Enabled is specified (non-nil),
// then the exporter will not perform any buffering itself.
Enabled *bool `mapstructure:"enabled"`

// FlushTimeout sets the time after which a batch will be sent regardless of its size.
FlushTimeout time.Duration `mapstructure:"flush_timeout"`

exporterbatcher.MinSizeConfig `mapstructure:",squash"`
exporterbatcher.MaxSizeConfig `mapstructure:",squash"`
}

type TelemetrySettings struct {
38 changes: 38 additions & 0 deletions exporter/elasticsearchexporter/config_test.go
@@ -16,6 +16,7 @@ import (
"go.opentelemetry.io/collector/config/confighttp"
"go.opentelemetry.io/collector/config/configopaque"
"go.opentelemetry.io/collector/confmap/confmaptest"
"go.opentelemetry.io/collector/exporter/exporterbatcher"
"go.opentelemetry.io/collector/exporter/exporterhelper"

"github.com/open-telemetry/opentelemetry-collector-contrib/exporter/elasticsearchexporter/internal/metadata"
@@ -107,6 +108,15 @@ func TestConfig(t *testing.T) {
PrefixSeparator: "-",
DateFormat: "%Y.%m.%d",
},
Batcher: BatcherConfig{
FlushTimeout: 30 * time.Second,
MinSizeConfig: exporterbatcher.MinSizeConfig{
MinSizeItems: 5000,
},
MaxSizeConfig: exporterbatcher.MaxSizeConfig{
MaxSizeItems: 10000,
},
},
},
},
{
@@ -168,6 +178,15 @@ func TestConfig(t *testing.T) {
PrefixSeparator: "-",
DateFormat: "%Y.%m.%d",
},
Batcher: BatcherConfig{
FlushTimeout: 30 * time.Second,
MinSizeConfig: exporterbatcher.MinSizeConfig{
MinSizeItems: 5000,
},
MaxSizeConfig: exporterbatcher.MaxSizeConfig{
MaxSizeItems: 10000,
},
},
},
},
{
@@ -229,6 +248,15 @@ func TestConfig(t *testing.T) {
PrefixSeparator: "-",
DateFormat: "%Y.%m.%d",
},
Batcher: BatcherConfig{
FlushTimeout: 30 * time.Second,
MinSizeConfig: exporterbatcher.MinSizeConfig{
MinSizeItems: 5000,
},
MaxSizeConfig: exporterbatcher.MaxSizeConfig{
MaxSizeItems: 10000,
},
},
},
},
{
@@ -263,6 +291,16 @@ cfg.Endpoint = "https://elastic.example.com:9200"
cfg.Endpoint = "https://elastic.example.com:9200"
}),
},
{
id: component.NewIDWithName(metadata.Type, "batcher_disabled"),
configFile: "config.yaml",
expected: withDefaultConfig(func(cfg *Config) {
cfg.Endpoint = "https://elastic.example.com:9200"

enabled := false
cfg.Batcher.Enabled = &enabled
}),
},
}

for _, tt := range tests {
