01 Apr 18:58

jon-wei

11df7c4

druid-0.17.1

Apache Druid 0.17.1 is a security bug fix release that addresses the following CVE for LDAP authentication:

[CVE-2020-1958]: Apache Druid LDAP injection vulnerability (https://lists.apache.org/thread.html/r9d437371793b410f8a8e18f556d52d4bb68e18c537962f6a97f4945e%40%3Cdev.druid.apache.org%3E)

27 Jan 01:14

jon-wei

druid-0.17.0

f37b984

druid-0.17.0

Apache Druid 0.17.0 contains over 250 new features, performance enhancements, bug fixes, and major documentation improvements from 52 contributors. Check out the complete list of changes and everything tagged to the milestone.

Highlights

Batch ingestion improvements

Druid 0.17.0 includes a significant update to the native batch ingestion system. This update adds the internal framework to support non-text binary formats, with initial support for ORC and Parquet. Additionally, native batch tasks can now read data from HDFS.

This rework changes how the ingestion source and data format are specified in the ingestion task. To use the new features, please refer to the documentation on InputSources and InputFormats.

Please see the following documentation for details:
https://druid.apache.org/docs/0.17.0/ingestion/data-formats.html#input-format
https://druid.apache.org/docs/0.17.0/ingestion/native-batch.html#input-sources
https://druid.apache.org/docs/0.17.0/ingestion/native-batch.html#partitionsspec

#8812

Single dimension range partitioning for parallel native batch ingestion

The parallel index task now supports the single_dim type partitions spec, which allows for range-based partitioning on a single dimension.

Please see https://druid.apache.org/docs/0.17.0/ingestion/native-batch.html for details.
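
As a rough illustration, the tuningConfig for a range-partitioned parallel task might look like the sketch below, built as a Python dict and serialized to JSON. The dimension name `countryName` and the row target are placeholder values. The field names `partitionDimension`, `targetRowsPerSegment`, and `forceGuaranteedRollup` reflect our reading of the 0.17.0 native batch documentation and should be verified there before use.

```python
import json

# Placeholder values: the dimension name and row target are hypothetical.
tuning_config = {
    "type": "index_parallel",
    # Assumption: range partitioning requires guaranteed (perfect) rollup.
    "forceGuaranteedRollup": True,
    "partitionsSpec": {
        "type": "single_dim",
        "partitionDimension": "countryName",
        "targetRowsPerSegment": 5_000_000,
    },
}

print(json.dumps(tuning_config, indent=2))
```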

Compaction changes

Parallel index task split hints

The parallel indexing task now has a new configuration, splitHintSpec, in the tuningConfig to allow for operators to provide hints to control the amount of data that each first phase subtask reads. There is currently one split hint spec type, SegmentsSplitHintSpec, used for re-ingesting Druid segments.

Parallel auto-compaction

Auto-compaction can now use the parallel indexing task, allowing for greater compaction throughput.

To control the level of parallelism, the auto-compaction tuningConfig has new parameters, maxNumConcurrentSubTasks and splitHintSpec.

Please see https://druid.apache.org/docs/0.17.0/configuration/index.html#compaction-dynamic-configuration for details.
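
The following is a minimal sketch of how an auto-compaction config using these two parameters could be assembled. The helper `compaction_config` is hypothetical, and the `segments` type name and `maxInputSegmentBytesPerTask` field are assumptions about how SegmentsSplitHintSpec is serialized; check the configuration reference before relying on them.

```python
import json


def compaction_config(data_source, max_subtasks, max_bytes_per_task=None):
    """Build a partial auto-compaction config (a sketch, not a complete config)."""
    tuning = {"maxNumConcurrentSubTasks": max_subtasks}
    if max_bytes_per_task is not None:
        # Assumption: "segments" is the JSON type name of SegmentsSplitHintSpec.
        tuning["splitHintSpec"] = {
            "type": "segments",
            "maxInputSegmentBytesPerTask": max_bytes_per_task,
        }
    return {"dataSource": data_source, "tuningConfig": tuning}


config = compaction_config("wikipedia", max_subtasks=4, max_bytes_per_task=500_000_000)
print(json.dumps(config, indent=2))
```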

#8570

Stateful auto-compaction

Auto-compaction now uses the partitionSpec to track changes made by previous compaction tasks, allowing the coordinator to reduce redundant compaction operations.

Please see #8489 for details.

If you have auto-compaction enabled, please see the information under "Stateful auto-compaction changes" in the "Upgrading to Druid 0.17.0" section before upgrading.

Parallel query merging on brokers

The Druid broker can now opportunistically merge query results in parallel using multiple threads.

Please see druid.processing.merge.useParallelMergePool in the Broker section of the configuration reference for details on how to configure this new feature.

Parallel merging is enabled by default (controlled by the druid.processing.merge.useParallelMergePool property), and most users should not have to change any of the advanced configuration properties described in the configuration reference.

Additionally, merge parallelism can be controlled on a per-query basis using the query context. Information about the new query context parameters can be found at https://druid.apache.org/docs/0.17.0/querying/query-context.html.
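
For example, a native query might carry per-query merge settings in its context as sketched below. The context keys `enableParallelMerge` and `parallelMergeParallelism` are recalled from the query context documentation and should be treated as assumptions; the datasource and interval are placeholders.

```python
import json

# Placeholder datasource and interval; context key names are assumptions.
query = {
    "queryType": "timeseries",
    "dataSource": "wikipedia",
    "intervals": ["2020-01-01/2020-01-02"],
    "granularity": "hour",
    "aggregations": [{"type": "count", "name": "rows"}],
    "context": {
        "enableParallelMerge": True,     # opt this query in or out of parallel merging
        "parallelMergeParallelism": 4,   # cap the merge threads used for this query
    },
}

print(json.dumps(query, indent=2))
```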

#8578

SQL-compatible null handling

In 0.17.0, we have added official documentation for Druid's SQL-compatible null handling mode.

Please see https://druid.apache.org/docs/0.17.0/configuration/index.html#sql-compatible-null-handling and https://druid.apache.org/docs/0.17.0/design/segments.html#sql-compatible-null-handling for details.

Several bugs that existed in this previously undocumented mode have been fixed, particularly around null handling in numeric columns. We recommend that users begin to consider transitioning their clusters to this new mode after upgrading to 0.17.0.

The full list of null handling bugs fixed in 0.17.0 can be found at https://github.com/apache/druid/issues?utf8=%E2%9C%93&q=label%3A%22Area+-+Null+Handling%22+milestone%3A0.17.0+

LDAP extension

Druid now supports LDAP authentication. Authorization using LDAP groups is also supported by mapping LDAP groups to Druid roles.

LDAP authentication is handled by specifying an LDAP-type credentials validator.
Authorization using LDAP is handled by specifying an LDAP-type role provider, and defining LDAP group->Druid role mappings within Druid.

LDAP integration requires the druid-basic-security core extension. Please see https://druid.apache.org/docs/0.17.0/development/extensions-core/druid-basic-security.html for details.

As this is the first release with LDAP support, and there are a large variety of LDAP ecosystems, some LDAP use cases and features may not be supported yet. Please file an issue if you need enhancements to this new functionality.

#6972

Dropwizard emitter

A new Dropwizard metrics emitter has been added as a contrib extension.

The currently supported Dropwizard metrics types are counter, gauge, meter, timer and histogram. These metrics can be emitted using either a Console or JMX reporter.

Please see https://druid.apache.org/docs/0.17.0/design/extensions-contrib/dropwizard.html for details.

#7363

Self-discovery resource

A new pair of endpoints has been added to all Druid services. They report whether the service has received confirmation from the central service discovery mechanism (currently ZooKeeper) that it has been added to the cluster. These endpoints can be useful as health/ready checks.

The new endpoints are:

/status/selfDiscovered/status
/status/selfDiscovered

Please see the Druid API reference for details.

#6702
#9005

Supervisors system table

Task supervisors (e.g. Kafka or Kinesis supervisors) are now recorded in the system tables in a new sys.supervisors table.

Please see https://druid.apache.org/docs/0.17.0/querying/sql.html#supervisors-table for details.
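
A minimal sketch of querying the new table through Druid SQL over HTTP follows. It only builds the JSON request body, which would be POSTed to the SQL endpoint (`/druid/v2/sql`); the helper name `sql_request_body` is hypothetical, and `SELECT *` is used to avoid assuming specific column names.

```python
import json


def sql_request_body(sql):
    """Serialize a Druid SQL statement into the JSON body expected by the SQL API."""
    return json.dumps({"query": sql})


body = sql_request_body("SELECT * FROM sys.supervisors")
print(body)
```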

#8547

Fast historical start with lazy loading

A new boolean configuration property for historicals, druid.segmentCache.lazyLoadOnStart, has been added.

This new property allows historicals to defer loading of a segment until the first time that segment is queried, which can significantly decrease historical startup times for clusters with a large number of segments.

Please see the configuration reference for details.

#6988

Historical segment cache distribution change

A new historical property, druid.segmentCache.locationSelectorStrategy, has been added.

If there are multiple segment storage locations specified in druid.segmentCache.locations, the new locationSelectorStrategy property allows the user to specify what strategy is used to fill the locations. Currently supported options are roundRobin and leastBytesUsed.

Please see the configuration reference for details.
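
To make the difference between the two strategies concrete, here is an illustrative sketch in Python. It is not Druid's implementation; the `Location` class and both helper functions are hypothetical, and capacity is modeled as a simple byte count.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Location:
    path: str
    max_size: int
    used: int = 0

    def can_hold(self, size):
        return self.used + size <= self.max_size


def pick_round_robin(rotation, count, size):
    """Take the next location in rotation that has room, trying each one once."""
    for _ in range(count):
        location = next(rotation)
        if location.can_hold(size):
            return location
    return None


def pick_least_bytes_used(locations, size):
    """Take the location with the fewest bytes used among those with room."""
    candidates = [loc for loc in locations if loc.can_hold(size)]
    return min(candidates, key=lambda loc: loc.used, default=None)


locations = [Location("/mnt/a", 100, used=80), Location("/mnt/b", 100, used=10)]
rotation = cycle(locations)
first = pick_round_robin(rotation, len(locations), 5)
second = pick_round_robin(rotation, len(locations), 5)
emptiest = pick_least_bytes_used(locations, 5)
print(first.path, second.path, emptiest.path)
```

Round robin alternates between locations regardless of how full they are, while the least-bytes-used approach keeps choosing the emptiest location until usage evens out.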

#8038

New readiness endpoints

A new Broker endpoint has been added: /druid/broker/v1/readiness.

A new Historical endpoint has been added: /druid/historical/v1/readiness.

These endpoints are similar to the existing /druid/broker/v1/loadstatus and /druid/historical/v1/loadstatus endpoints.

They differ in that they do not require authentication/authorization checks, and instead of a JSON body they only return a 200 success or 503 HTTP response code.
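
Because these endpoints signal state purely through the HTTP status code, a readiness probe can be very small. The sketch below uses only the Python standard library; the function name `is_ready`, the `opener` parameter (added to make the function testable), and the example host are hypothetical.

```python
import urllib.error
import urllib.request


def is_ready(base_url, path="/druid/broker/v1/readiness",
             opener=urllib.request.urlopen, timeout=5.0):
    """Return True on HTTP 200, False on HTTP 503 or when the service is unreachable."""
    url = base_url.rstrip("/") + path
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx codes; 503 means "not ready yet".
        if err.code == 503:
            return False
        raise
    except urllib.error.URLError:
        # Connection refused, DNS failure, and similar: treat as not ready.
        return False
```

For a Historical, pass `path="/druid/historical/v1/readiness"` instead, for example `is_ready("http://historical-host:8083", path="/druid/historical/v1/readiness")`.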

#8841

Support task assignment based on MiddleManager categories

It is now possible to define a "category" name property for each MiddleManager. New worker select strategies that are category-aware have been added, allowing the user to control how tasks are assigned to MiddleManagers based on the configured categories.

Please see the documentation for druid.worker.category in the configuration reference, and the following links, for more details:
https://druid.apache.org/docs/0.17.0/configuration/index.html#Equal-Distribution-With-Category-Spec
https://druid.apache.org/docs/0.17.0/configuration/index.html#Fill-Capacity-With-Category-Spec
https://druid.apache.org/docs/0.17.0/configuration/index.html#WorkerCategorySpec

#7066

Security vulnerability updates

A large number of dependencies have been updated to newer versions to address security vulnerabilities.

Please see the PRs below for details:

#8878
#8980

Upgrading to Druid 0.17.0

Select native query has been replaced

The deprecated Select native query type has been removed in 0.17.0.

If you have native queries that use Select, you need to modify them to use Scan instead. See the Scan query documentation (https://druid.apache.org/docs/0.17.0/querying/scan-query.html) for syntax and output format details.
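
As a starting point for such a migration, the sketch below maps the most common fields of a Select query onto a Scan query. The helper `select_to_scan` is hypothetical and makes simplifying assumptions: dimensions and metrics are plain column-name strings, the paging threshold is reused as a Scan `limit`, and paging identifiers are dropped because Scan has no equivalent. The output format of Scan also differs from Select, so client code that reads results needs its own review.

```python
def select_to_scan(select_query):
    """Translate a simple Select query dict into a roughly equivalent Scan query dict."""
    if select_query.get("queryType") != "select":
        raise ValueError("expected a query with queryType 'select'")

    scan = {
        "queryType": "scan",
        "dataSource": select_query["dataSource"],
        "intervals": select_query["intervals"],
    }
    # Assumption: dimensions and metrics are listed as plain column names.
    columns = list(select_query.get("dimensions", [])) + list(select_query.get("metrics", []))
    if columns:
        scan["columns"] = columns
    if "filter" in select_query:
        scan["filter"] = select_query["filter"]
    threshold = select_query.get("pagingSpec", {}).get("threshold")
    if threshold is not None:
        scan["limit"] = threshold
    return scan


old_query = {
    "queryType": "select",
    "dataSource": "wikipedia",
    "intervals": ["2020-01-01/2020-01-02"],
    "dimensions": ["page"],
    "metrics": ["added"],
    "pagingSpec": {"pagingIdentifiers": {}, "threshold": 100},
}
new_query = select_to_scan(old_query)
print(new_query)
```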

For Druid SQL queries that use Select, no...

11 Dec 07:36

jon-wei

druid-0.16.1-incubating

144bd78

druid-0.16.1-incubating

Apache Druid 0.16.1-incubating is a bug fix and user experience improvement release that fixes a rolling upgrade issue, improves the startup scripts, and updates licensing information.

Bug Fixes
#8682 implement FiniteFirehoseFactory in InlineFirehose
#8905 Retrying with a backward compatible task type on unknown task type error in parallel indexing

User Experience Improvements
#8792 Use bundled ZooKeeper in tutorials.
#8794 Startup scripts: verify Java 8 (exactly), improve port/java verification messages.
#8942 Improve verify-default-ports to check both INADDR_ANY and 127.0.0.1.
#8798 Fix verify script.

Licensing Update
#8944 Add license for tutorial wiki data
#8968 Add licenses.yaml entry for Wikipedia sample data

Other
#8419 Bump Apache Thrift to 0.10.0

Updating from 0.16.0-incubating and earlier

PR #8905 fixes an issue with rolling upgrades when updating from earlier versions.

Credits

Thanks to everyone who contributed to this release!

@aditya-r-m
@clintropolis
@Fokko
@gianm
@jihoonson
@jon-wei

Apache Druid (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

25 Sep 00:36

clintropolis

druid-0.16.0-incubating

54d29e4

druid-0.16.0-incubating

Apache Druid 0.16.0-incubating contains over 350 new features, performance enhancements, bug fixes, and major documentation improvements from 50 contributors. Check out the complete list of changes and everything tagged to the milestone.

Highlights

# Performance

# 'Vectorized' query processing

An experimental 'vectorized' query execution engine is new in 0.16.0, which can provide a speed increase in the range of 1.3-3x for timeseries and group by v2 queries. It operates on the principle of batching operations on rows instead of processing a single row at a time, e.g. iterating bitmaps in batches instead of per row, reading column values in batches, filtering in batches, aggregating values in batches, and so on. This results in significantly fewer method calls, better memory locality, and increased cache efficiency.

This is an experimental feature, but we view it as the path forward for Druid query processing and are excited for feedback as we continue to improve and fill out missing features in upcoming releases.

Only timeseries and groupBy have vectorized engines.
GroupBy doesn't handle multi-value dimensions or granularity other than "all" yet.
Vector cursors cannot handle virtual columns or descending order.
Expressions are not supported anywhere: not as inputs to aggregators, in virtual functions, or in filters.
Only some aggregators have vectorized implementations: "count", "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
Only some filters have vectorized matchers: "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not".
Dimension specs other than "default" don't work yet (no extraction functions or filtered dimension specs).

The feature can be enabled by setting "vectorize": true in your query context (the default is false). This works both for Druid SQL and for native queries. When set to true, vectorization will be used if possible; otherwise, Druid will fall back to its non-vectorized query engine. You can also set it to "force", which will return an error if the query cannot be fully vectorized. This is helpful for confirming that vectorization is indeed being used.

You can control the block size during execution by setting the vectorSize query context parameter (default is 1000).
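
A small helper like the one below can add these settings to an existing native query. The function name `with_vectorization` is hypothetical; the `vectorize` and `vectorSize` context keys are the ones described above.

```python
def with_vectorization(query, mode=True, vector_size=None):
    """Return a copy of the query with vectorization settings in its context."""
    if mode not in (True, False, "force"):
        raise ValueError("mode must be True, False, or 'force'")
    context = dict(query.get("context", {}))
    context["vectorize"] = mode
    if vector_size is not None:
        context["vectorSize"] = vector_size
    # Build a new dict so the caller's query is left untouched.
    return {**query, "context": context}


base = {"queryType": "timeseries", "dataSource": "wikipedia"}
forced = with_vectorization(base, mode="force", vector_size=256)
print(forced["context"])
```

Using "force" during testing makes unsupported queries fail loudly instead of silently falling back to the non-vectorized engine.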

#7093
#6794

# GroupBy array-based result rows

groupBy v2 queries now use an array-based representation of result rows, rather than the map-based representation used by prior versions of Druid. This provides faster generation and processing of result sets. Out of the box this change is invisible and backwards-compatible; you will not have to change any configuration to reap the benefits of this more efficient format, and it will have no impact on cached results. Internally this format will always be utilized automatically by the broker in the queries that it issues to historicals. By default the results will be translated back to the existing 'map' based format at the broker before sending them back to the client.

However, if you would like to avoid the overhead of this translation and get even faster results, resultAsArray may be set on the query context to pass the new array-based result row format through directly. The schema is as follows, in order:

Timestamp (optional; only if granularity != ALL)
Dimensions (in order)
Aggregators (in order)
Post-aggregators (optional; in order, if present)
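
Clients that enable resultAsArray need to map positions back to names themselves. The sketch below does this following the ordering listed above; the function name `row_to_dict` and the `timestamp` key it uses for the leading value are hypothetical choices for illustration.

```python
def row_to_dict(row, dimensions, aggregators, post_aggregators=(), has_timestamp=False):
    """Label an array-based groupBy result row using the documented field order."""
    names = []
    if has_timestamp:
        # Present only when the query granularity is not "all".
        names.append("timestamp")
    names += list(dimensions) + list(aggregators) + list(post_aggregators)
    if len(names) != len(row):
        raise ValueError(f"expected {len(names)} values, got {len(row)}")
    return dict(zip(names, row))


labeled = row_to_dict(["US", 10, 2.5], ["country"], ["rows"], ["avg_added"])
print(labeled)
```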

#8118
#8196

# Additional performance enhancements

The complete set of pull requests tagged as performance enhancements for 0.16 can be found here.

# "Minor" compaction

Users of the Kafka indexing service who also run compaction and receive a trickle of late data can see a large improvement from a new concept called 'minor' compaction. Enabled by internal changes to how data segments are versioned, minor compaction is based on the idea of 'segment' based locking at indexing time instead of the current Druid locking behavior (which is now referred to as 'time chunk' locking). Segment locking, as you might expect, locks only the segments that are being compacted, while still allowing new 'appending' indexing tasks (like Kafka indexing tasks) to continue to run and create new segments simultaneously. This is a big deal if you get a lot of late data, because the current behavior results in compaction tasks starving as higher priority realtime tasks hog the locks. When compaction tasks are prevented from optimizing the datasource's segment sizes, overall performance suffers.

To enable segment locking, you will need to set forceTimeChunkLock to false in the task context, or set druid.indexer.tasklock.forceTimeChunkLock=false in the Overlord configuration. However, beware, after enabling this feature, due to the changes in segment versioning, there is no rollback path built in, so once you upgrade to 0.16, you cannot downgrade to an older version of Druid. Because of this, we highly recommend confirming that Druid 0.16 is stable in your cluster before enabling this feature.

It has a humble name, but the changes of minor compaction run deep, and it is not possible to adequately describe the mechanisms that drive this in these release notes, so check out the proposal and PR for more details.

#7491
#7547

# Druid "indexer" process

The new Indexer process is an alternative to the MiddleManager + Peon task execution system. Instead of forking a separate JVM process per-task, the Indexer runs tasks as separate threads within a single JVM process. The Indexer is designed to be easier to configure and deploy compared to the MiddleManager + Peon system and to better enable resource sharing across tasks.

The advantage of the Indexer is that it allows query processing resources, lookups, cached authentication/authorization information, and much more to be shared between all running indexing task threads, giving each individual task access to a larger pool of resources and far fewer redundant actions done than is possible with the Peon model of execution where each task is isolated in its own process.

Using Indexer does come with one downside: the loss of process isolation provided by Peon processes means that a single task can potentially affect all running indexing tasks on that Indexer. The druid.worker.globalIngestionHeapLimitBytes and druid.worker.numConcurrentMerges configurations are meant to help minimize this. Additionally, task logs for indexer processes will be inline with the Indexer process log, and not persisted to deep storage.

You can start using the Indexer by supplying server indexer as the command-line argument to org.apache.druid.cli.Main when starting the service. To use Indexer in place of a MiddleManager and Peon, you should be able to adapt values from the configuration into the Indexer configuration, lifting druid.indexer.fork.property. configurations directly to the Indexer, and sizing heap and direct memory based on the Peon sizes multiplied by the number of task slots (unlike a MiddleManager, it does not accept the configurations druid.indexer.runner.javaOpts or druid.indexer.runner.javaOptsArray). See the indexer documentation for details.
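
The sizing rule of thumb above (Peon sizes multiplied by the number of task slots) can be worked through as follows. The function name `indexer_memory` and the example sizes are hypothetical, and the result is only a first estimate to refine against real workloads.

```python
GIB = 1024 ** 3


def indexer_memory(peon_heap_bytes, peon_direct_bytes, task_slots):
    """Estimate Indexer heap and direct memory from existing Peon sizing."""
    if task_slots < 1:
        raise ValueError("task_slots must be at least 1")
    return {
        "heap_bytes": peon_heap_bytes * task_slots,
        "direct_bytes": peon_direct_bytes * task_slots,
    }


# Example: Peons sized at 1 GiB heap and 2 GiB direct memory, with 4 task slots.
estimate = indexer_memory(1 * GIB, 2 * GIB, task_slots=4)
print({name: size // GIB for name, size in estimate.items()})
```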

#8107

# Native parallel batch indexing with shuffle

In 0.16.0, Druid's index_parallel native parallel batch indexing task now supports 'perfect' rollup with the implementation of a 2 stage shuffle process.

Tasks in stage 1 perform a secondary partitioning of rows on top of the standard time based partitioning of segment granularity, creating an intermediary data segment for each partition. Stage 2 tasks are each assigned a set of the partitionings created during stage 1, and will collect and combine the set of intermediary data segments which belong to that partitioning, allowing it to achieve complete rollup when building the final segments. At this time, only hash-based partitioning is supported.

This can be enabled by setting forceGuaranteedRollup to true in the tuningConfig; numShards in partitionsSpec and intervals in granularitySpec must also be set.
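
The three requirements can be checked mechanically before submitting a task. In this sketch the helper `perfect_rollup_problems` is hypothetical, the spec fragment is abbreviated, and the `hashed` partitionsSpec type name and the placement of granularitySpec under dataSchema are assumptions based on the usual ingestion spec layout.

```python
def perfect_rollup_problems(spec):
    """List the settings still missing for perfect rollup in an index_parallel spec."""
    tuning = spec.get("tuningConfig", {})
    granularity = spec.get("dataSchema", {}).get("granularitySpec", {})
    problems = []
    if tuning.get("forceGuaranteedRollup") is not True:
        problems.append("tuningConfig.forceGuaranteedRollup must be true")
    if "numShards" not in tuning.get("partitionsSpec", {}):
        problems.append("tuningConfig.partitionsSpec.numShards must be set")
    if not granularity.get("intervals"):
        problems.append("granularitySpec.intervals must be set")
    return problems


# Abbreviated, hypothetical spec fragment.
spec = {
    "dataSchema": {
        "granularitySpec": {"type": "uniform", "intervals": ["2019-09-01/2019-09-02"]},
    },
    "tuningConfig": {
        "type": "index_parallel",
        "forceGuaranteedRollup": True,
        "partitionsSpec": {"type": "hashed", "numShards": 4},
    },
}
print(perfect_rollup_problems(spec))
```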

The Druid MiddleManager (or the new Indexer) processes have a new responsibility for these indexing tasks: serving the intermediary partition segments output by stage 1 to the stage 2 tasks. Depending on configuration and cluster size, the MiddleManager JVM configuration might need to be adjusted to increase heap allocation and HTTP threads. These numbers are expected to scale with cluster size, as all MiddleManager or Indexer processes involved in a shuffle will need the ability to communicate with each other, but we do not expect the footprint to be significantly larger than it is currently. We suggest starting with your existing configurations, and increasing heap and HTTP thread count only if issues are encountered.

#8061

#...

16 Aug 00:49

clintropolis

druid-0.15.1-incubating

c698daa

druid-0.15.1-incubating

Apache Druid 0.15.1-incubating is a bug fix release that includes important fixes for Apache Zookeeper based segment loading, the 'druid-datasketches' extension, and much more.

Bug Fixes

Coordinator

#8137 coordinator throwing exception trying to load segments (fixed by #8140)

Middlemanager

#7886 Middlemanager fails startup due to corrupt task files (fixed by #7917)
#8085 fix forking task runner task shutdown to be more graceful

Queries

#7777 timestamp_ceil function is either wrong or misleading (fixed by #7823)
#7820 subtotalsSpec and filtering returns no results (fixed by #7827)
#8013 Fix ExpressionVirtualColumn capabilities; fix groupBy's improper uses of StorageAdapter#getColumnCapabilities.

API

#6786 apache-druid-0.13.0-incubating router /druid/router/v1/brokers (fixed by #8026)
#8044 SupervisorManager: Add authorization checks to bulk endpoints.

Metrics Emitters

#8204 HttpPostEmitter throw Class cast exception when using emitAndReturnBatch (fixed by #8205)

Extensions

Datasketches

#7666 sketches-core-0.13.4
#8055 force native order when wrapping ByteBuffer

Kinesis Indexing Service

#7830 Kinesis: Fix getPartitionIds, should be checking isHasMoreShards.

Moving Average Query

#7999 Druid moving average query results in circular reference error (fixed by #8192)

Documentation Fixes

#8002 Improve pull-deps reference in extensions page.
#8003 Add missing reference to Materialized-View extension.
#8079 Fix documentation formatting
#8087 fix references to bin/supervise in tutorial docs

Updating from 0.15.0-incubating and earlier

Due to issue #8137, when updating from specifically 0.15.0-incubating to 0.15.1-incubating, it is recommended to update the Coordinator before the Historical servers to prevent segment unavailability during an upgrade (this is typically reversed). Upgrading from any version older than 0.15.0-incubating does not have these concerns and can be done normally.

Known Issues

Building Docker images is currently broken and will be fixed in the next release, see #8054 which is fixed by #8237 for more details.

Credits

Thanks to everyone who contributed to this release!

@AlexanderSaydakov
@ArtyomyuS
@ccl0326
@clintropolis
@gianm
@himanshug
@jihoonson
@legoscia
@leventov
@pjain1
@yurmix
@xueyumusic

27 Jun 18:36

jihoonson

druid-0.15.0-incubating

44c9323

druid-0.15.0-incubating

Apache Druid 0.15.0-incubating contains over 250 new features, performance/stability/documentation improvements, and bug fixes from 39 contributors. Major new features and improvements include:

New Data Loader UI
Support transactional Kafka topic
New Moving Average query
Time ordering for Scan query
New Moments Sketch aggregator
SQL enhancements
Light lookup module for routers
Core ORC extension
Core GCP extension
Document improvements

The full list of changes is here: https://github.com/apache/incubator-druid/pulls?q=is%3Apr+is%3Aclosed+milestone%3A0.15.0

Documentation for this release is at: http://druid.apache.org/docs/0.15.0-incubating/

Highlights

New Data Loader UI (Batch indexing part)

Druid has a new Data Loader UI, which is integrated with the Druid Console. It shows sampled data so that you can easily verify the ingestion spec, and it generates the final ingestion spec automatically. Users can now submit batch index tasks without writing a JSON spec by hand.

Added by @vogievetsky and @dclim in #7572 and #7531, respectively.

Support Kafka Transactional Topics

The Kafka indexing service now supports Kafka Transactional Topics.

Please note that only Kafka 0.11.0 or later versions are supported after this change.

Added by @surekhasaharan in #6496.

New Moving Average Query

A new query type was introduced to compute moving averages.

Please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-contrib/moving-average-query.html for more details.

Added by @yurmix in #6430.

Time Ordering for Scan Query

The Scan query type now supports time ordering. Please see http://druid.apache.org/docs/0.15.0-incubating/querying/scan-query.html#time-ordering for more details.

Added by @justinborromeo in #7133.

New Moments Sketch Aggregator

The Moments Sketch is a new sketch type for approximate quantile computation. Please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-contrib/momentsketch-quantiles.html for more details.

Added by @edgan8 in #6581.

SQL enhancements

The Druid community has been striving to enhance SQL support, and it is no longer experimental.

New SQL functions

LPAD and RPAD functions were added by @xueyumusic in #7388.
DEGREES and RADIANS functions were added by @xueyumusic in #7336.
STRING_FORMAT function was added by @gianm in #7327.
PARSE_LONG function was added by @gianm in #7326.
ROUND function was added by @gianm in #7224.
Trigonometric functions were added by @xueyumusic in #7182.

Autocomplete in Druid Console

Druid Console now supports autocomplete for SQL.

Added by @shuqi7 in #7244.

Time-ordered scan support for SQL

Druid SQL now supports time-ordered Scan queries.

Added by @justinborromeo in #7373.

Lookups view added to the web console

You can now configure your lookups from the web console directly.

Added by @shuqi7 in #7259.

Misc web console improvements

"NoSQL" mode : #7493 [@shuqi7]

The web console now has a backup mode that allows it to function as best as it can if DruidSQL is disabled or unavailable.

Added compaction configuration dialog : #7242 [@shuqi7]

You can now configure the auto compaction settings for a data source from the Datasource view.

Auto wrap query with limit : #7449 [@vogievetsky]

The console query view will now (by default) wrap DruidSQL queries with a SELECT * FROM (...) LIMIT 1000 allowing you to enter queries like SELECT * FROM your_table without worrying about the impact to the cluster. You can still send 'raw' queries by selecting the option from the ... menu.
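
The wrapping described above amounts to a simple string transformation, sketched here in Python. This is an illustration of the behavior, not the console's actual code, and the function name `wrap_with_limit` is hypothetical.

```python
def wrap_with_limit(sql, limit=1000):
    """Wrap a SQL query in an outer SELECT that caps the number of returned rows."""
    # Drop surrounding whitespace and a trailing semicolon so the subquery is valid.
    inner = sql.strip().rstrip(";").rstrip()
    return f"SELECT * FROM ({inner}) LIMIT {limit}"


wrapped = wrap_with_limit("SELECT * FROM your_table;")
print(wrapped)
```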

SQL explain query : #7402 [@shuqi7]

You can now click on the ... menu in the query view to get an explanation of the DruidSQL query.

Surface `is_overshadowed` as a column in the segments table #7555 , #7425 [@shuqi7][@surekhasaharan]

The is_overshadowed column indicates whether a segment is overshadowed by any published segments. It can be useful for seeing which segments should be loaded by historicals. Please see http://druid.apache.org/docs/0.15.0-incubating/querying/sql.html for more details.

Improved status UI for actions on tasks, supervisors, and datasources : #7528 [@shuqi7]

This PR condenses the actions list into a tidy menu and lets you see the detailed status for supervisors and tasks. New actions for datasources around loading and dropping data by interval have also been added.

Light Lookup Module for Routers

A light lookup module was introduced for Routers, so they now need only a minimal amount of memory. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/basic-cluster-tuning.html#router for basic memory tuning.

Added by @clintropolis in #7222.

Core ORC extension

ORC extension is now promoted to a core extension. Please read the below 'Updating from 0.14.0-incubating and earlier' section if you are using the ORC extension in an earlier version of Druid.

Added by @clintropolis in #7138.

Core GCP extension

GCP extension is now promoted to a core extension. Please read the below 'Updating from 0.14.0-incubating and earlier' section if you are using the GCP extension in an earlier version of Druid.

Added by @drcrallen in #6953.

Document Improvements

Single-machine deployment example configurations and scripts

Several configurations and scripts were added for easy single machine setup. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/single-server.html for details.

Added by @jon-wei in #7590.

Tool for migrating from local deep storage/Derby metadata

A new tool was added for easy migration from single machine to a cluster environment. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/deep-storage-migration.html for details.

Added by @jon-wei in #7598.

Document for basic tuning guide

A basic tuning guide was added. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/basic-cluster-tuning.html for details.

Added by @jon-wei in #7629.

Security Improvement

The Druid system table now requires only mandatory permissions instead of the read permission for the whole sys database. Please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-core/druid-basic-security.html for details.

Added by @jon-wei in #7579.

Deprecated/removed

Drop support for automatic segment merge

Automatic segment merging by the coordinator is no longer supported. Please use auto compaction instead.

Added by @jihoonson in #6883.

Drop support for `insert-segment-to-db` tool

In Druid 0.14.x and earlier, Druid stored segment metadata (the descriptor.json file) in deep storage in addition to the metadata store. This behavior has changed in 0.15.0, which no longer stores the segment metadata file in deep storage. As a result, the insert-segment-to-db tool is no longer supported, since it relies on descriptor.json files in deep storage. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/insert-segment-db.html for details.

Please note that the kill task will fail if you're using HDFS as deep storage and the descriptor.json file is missing in 0.14.x or earlier versions.

Added by @jihoonson in #6911.

Removed "useFallback" configuration for SQL

This option was removed since it generates unscalable query plans and doesn't work with some SQL functions.

Added by @gianm in #7567.

Removed a public API in `Co...

27 May 21:05

clintropolis

druid-0.14.2-incubating

1053684

druid-0.14.2-incubating

Apache Druid 0.14.2-incubating is a bug fix release that includes important fixes for the 'druid-datasketches' extension and the broker 'result' level caching.

Bug Fixes

#7607 thetaSketch(with sketches-core-0.13.1) in groupBy always return value no more than 16384
#6483 Exception during sketch aggregations while using Result level cache
#7621 NPE when both populateResultLevelCache and grandTotal are set

Credits

Thanks to everyone who contributed to this release!

@AlexanderSaydakov
@clintropolis
@jihoonson
@jon-wei

09 May 04:04

clintropolis

druid-0.14.1-incubating

e8a8816

druid-0.14.1-incubating

Apache Druid 0.14.1-incubating is a small patch release that includes a handful of bug and documentation fixes from 16 contributors.

Important Notice

This release fixes an issue with druid-datasketches extension with quantile sketches, but introduces another one with theta sketches that was confirmed after the release was finalized, caused by #7320 and described in #7607. If you utilize theta sketches, we recommend not upgrading to this release. This will be fixed in the next release of Druid by #7619.

Bug Fixes

use latest sketches-core-0.13.1 #7320
Adjust BufferAggregator.get() impls to return copies #7464
DoublesSketchComplexMetricSerde: Handle empty strings. #7429
handle empty sketches #7526
Adds backwards-compatible serde for SeekableStreamStartSequenceNumbers. #7512
Support Kafka supervisor adopting running tasks between versions #7212
Fix time-extraction topN with non-STRING outputType. #7257
Fix two issues with Coordinator -> Overlord communication. #7412
refactor druid-bloom-filter aggregators #7496
Fix encoded taskId check in chatHandlerResource #7520
Fix too many dentry cache slab objs #7508. #7509
Fix result-level cache for queries #7325
Fix flattening Avro Maps with Utf8 keys #7258
Write null byte when indexing numeric dimensions with Hadoop #7020
Batch hadoop ingestion job doesn't work correctly with custom segments table #7492
Fix aggregatorFactory meta merge exception #7504

Documentation Changes

Fix broken link due to Typo. #7513
Some docs optimization #6890
Updated Javascript Affinity config docs #7441
fix expressions docs operator table #7420
Fix conflicting information in configuration doc #7299
Add missing doc link for operations/http-compression.html #7110

Updating from 0.14.0-incubating and earlier

Kafka Ingestion

Updating from version 0.13.0-incubating or earlier directly to 0.14.1-incubating will not require downtime like the migration path to 0.14.0-incubating due to the issue described in #6958, which has been fixed for this release in #7212. Likewise, rolling updates from version 0.13.0-incubating and earlier should also work properly due to #7512.

Native Parallel Ingestion

Due to the fix in #7520, a rolling update from 0.13.0-incubating directly to 0.14.1-incubating will not encounter the issues with mixed versions of MiddleManagers that could occur when updating to 0.14.0-incubating.

Credits

Thanks to everyone who contributed to this release!

@AlexanderSaydakov
@b-slim
@benhopp
@chrishardis
@clintropolis
@ferristseng
@es1220
@gianm
@jihoonson
@jon-wei
@justinborromeo
@kaka11chen
@samarthjain
@surekhasaharan
@zhaojiandong
@zhztheplayer

09 Apr 21:09

jon-wei

druid-0.14.0-incubating-rc3

f169ada

druid-0.14.0-incubating-rc3

[maven-release-plugin] prepare release druid-0.14.0-incubating

09 Apr 21:12

jon-wei

druid-0.14.0-incubating

f169ada

druid-0.14.0-incubating

Apache Druid (incubating) 0.14.0-incubating contains over 200 new features, performance/stability/documentation improvements, and bug fixes from 54 contributors. Major new features and improvements include:

New web console
Amazon Kinesis indexing service
Decommissioning mode for Historicals
Published segment cache in Broker
Bloom filter aggregator and expression
Updated Apache Parquet extension
Force push down option for nested GroupBy queries
Better segment handoff and drop rule handling
Automatically kill MapReduce jobs when Apache Hadoop ingestion tasks are killed
DogStatsD tag support for statsd emitter
New API for retrieving all lookup specs
New compaction options
More efficient cachingCost segment balancing strategy

The full list of changes is here: https://github.com/apache/incubator-druid/pulls?q=is%3Apr+is%3Amerged+milestone%3A0.14.0

Documentation for this release is at: http://druid.io/docs/0.14.0-incubating/

Highlights

New web console

Druid has a new web console that provides functionality that was previously split between the coordinator and overlord consoles.

The new console allows the user to manage datasources, segments, tasks, data processes (Historicals and MiddleManagers), and coordinator dynamic configuration. The user can also run SQL and native Druid queries within the console.

For more details, please see http://druid.io/docs/0.14.0-incubating/operations/management-uis.html

Added by @vogievetsky in #6923.

Kinesis indexing service

Druid now supports ingestion from Kinesis streams, provided by the new druid-kinesis-indexing-service core extension.

Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/kinesis-ingestion.html for details.

Added by @jsun98 in #6431.

Decommissioning mode for Historicals

Historical processes can now be put into a "decommissioning" mode, where the coordinator will no longer consider the Historical process as a target for segment replication. The coordinator will also move segments off the decommissioning Historical.

This is controlled via Coordinator dynamic configuration. For more details, please see http://druid.io/docs/0.14.0-incubating/configuration/index.html#dynamic-configuration.
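As a rough illustration, the sketch below builds an updated dynamic configuration that marks a Historical as decommissioning. The `decommissioningNodes` field name and the `/druid/coordinator/v1/config` path are assumptions based on the Coordinator dynamic configuration documentation linked above and should be verified there; the host name is hypothetical. The snippet only constructs the payload and does not contact a cluster.

```python
import json

# Assumed Coordinator endpoint for dynamic configuration; verify against the docs.
COORDINATOR_CONFIG_PATH = "/druid/coordinator/v1/config"


def build_decommission_payload(current_config, hosts):
    """Return a copy of the dynamic config with `hosts` added as decommissioning."""
    updated = dict(current_config)
    # "decommissioningNodes" is the assumed field name for this feature.
    nodes = list(updated.get("decommissioningNodes", []))
    for host in hosts:
        if host not in nodes:  # keep order, skip duplicates
            nodes.append(host)
    updated["decommissioningNodes"] = nodes
    return updated


payload = build_decommission_payload(
    {"maxSegmentsToMove": 5},
    ["historical-01.example.com:8083"],  # hypothetical host:port
)
print(json.dumps(payload, indent=2))
```

Because dynamic configuration is submitted as a whole document, the helper starts from the current configuration and changes only the decommissioning list, leaving other settings as they were.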

Added by @egor-ryashin in #6349.

Published segment cache on Broker

The Druid Broker now has the ability to maintain a cache of published segments via polling the Coordinator, which can significantly improve response time for metadata queries on the sys.segments system table.

Please see http://druid.io/docs/0.14.0-incubating/querying/sql.html#retrieving-metadata for details.
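For illustration, a metadata query of the kind that benefits from this cache could be prepared as in the sketch below. It assumes the Broker's SQL HTTP endpoint at `/druid/v2/sql` and a Broker listening on `localhost:8082` (a hypothetical address); the request is built but not sent.

```python
import json
import urllib.request


def build_sql_request(broker_url, sql):
    """Build (but do not send) a POST request for the Broker SQL endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Count segments per datasource using the sys.segments system table.
SEGMENT_COUNT_SQL = (
    'SELECT "datasource", COUNT(*) AS num_segments '
    "FROM sys.segments GROUP BY 1"
)

request = build_sql_request("http://localhost:8082", SEGMENT_COUNT_SQL)
# To run it against a live Broker: urllib.request.urlopen(request).read()
print(request.full_url)
```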

Added by @surekhasaharan in #6901

Bloom filter aggregator and expression

A new aggregator for constructing Bloom filters at query time and support for performing Bloom filter checks within Druid expressions have been added to the druid-bloom-filter extension.

Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/bloom-filter.html

Added by @clintropolis in #6904 and #6397

Updated Parquet extension

The druid-parquet-extensions extension has been moved into the core extension set from the contrib extensions and now supports flattening and int96 values.

Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/parquet.html for details.

Added by @clintropolis in #6360

Force push down option for nested GroupBy queries

Outer query execution for nested GroupBy queries can now be pushed down to Historical processes; previously, the outer queries would always be executed on the Broker.

Please see #5471 for details.

Added by @samarthjain in #5471.

Better segment handoff and retention rule handling

Segment handoff will now ignore segments that would be dropped by a datasource's retention rules, avoiding ingestion failures caused by issue #5868.

Period load rules will now include the future by default.

A new "Period Drop Before" rule has been added. Please see http://druid.io/docs/0.14.0-incubating/operations/rule-configuration.html#period-drop-before-rule for details.
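A minimal sketch of a rule chain using the new rule is shown below. The rule type strings (`dropBeforeByPeriod`, `loadForever`) and the `tieredReplicants` field are assumptions drawn from the rule configuration documentation linked above; the period and replica count are example values only.

```python
import json


def period_drop_before_rule(period):
    """Drop segments older than `period` (ISO 8601) relative to now."""
    # "dropBeforeByPeriod" is the assumed type name of the new rule.
    return {"type": "dropBeforeByPeriod", "period": period}


def load_forever_rule(replicants):
    """Load all remaining segments with the given replica count per tier."""
    return {"type": "loadForever", "tieredReplicants": dict(replicants)}


# Rules are evaluated in order: data older than one month is dropped,
# and everything else is loaded.
rules = [
    period_drop_before_rule("P1M"),
    load_forever_rule({"_default_tier": 2}),
]
print(json.dumps(rules, indent=2))
```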

Added by @QiuMM in #6676, #6414, and #6415.

Automatically kill MapReduce jobs when Hadoop ingestion tasks are killed

Druid will now automatically terminate MapReduce jobs created by Hadoop batch ingestion tasks when the ingestion task is killed.

Added by @ankit0811 in #6828.

DogStatsD tag support for statsd-emitter

The statsd-emitter extension now supports DogStatsD-style tags. Please see http://druid.io/docs/0.14.0-incubating/development/extensions-contrib/statsd.html

Added by @deiwin in #6605, with support for constant tags added by @glasser in #6791.

New API for retrieving all lookup specs

A new API for retrieving all lookup specs for all tiers has been added. Please see http://druid.io/docs/0.14.0-incubating/querying/lookups.html#get-all-lookups for details.

Added by @jihoonson in #7025.

New compaction options

Auto-compaction now supports the maxRowsPerSegment option. Please see http://druid.io/docs/0.14.0-incubating/design/coordinator.html#compacting-segments for details.

The compaction task now supports a new segmentGranularity option, deprecating the older keepSegmentGranularity option for controlling the segment granularity of compacted segments. Please see the segmentGranularity table in http://druid.io/docs/0.14.0-incubating/ingestion/compaction.html for more information on these properties.
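As a hedged example, a compaction task spec using the new option might be assembled as follows. The datasource name and interval are hypothetical, and the surrounding field names (`type`, `dataSource`, `interval`) are assumptions to be checked against the compaction documentation linked above.

```python
import json


def build_compaction_task(data_source, interval, segment_granularity=None):
    """Build a compaction task spec, optionally setting segmentGranularity."""
    task = {
        "type": "compact",
        "dataSource": data_source,
        "interval": interval,
    }
    if segment_granularity is not None:
        # Use the new option rather than the deprecated keepSegmentGranularity.
        task["segmentGranularity"] = segment_granularity
    return task


task = build_compaction_task(
    "wikipedia",                # hypothetical datasource
    "2019-01-01/2019-02-01",    # hypothetical interval
    segment_granularity="DAY",
)
print(json.dumps(task, indent=2))
```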

Added by @jihoonson in #6758 and #6780.

More efficient cachingCost segment balancing strategy

The cachingCost Coordinator segment balancing strategy will now only consider Historical processes for balancing decisions. Previously the strategy would unnecessarily consider active worker tasks as well, which are not targets for segment replication.

Added by @QiuMM in #6879.

New metrics:

New allocation rate metric jvm/heapAlloc/bytes, added by @egor-ryashin in #6710.
New query count metric query/count, added by @QiuMM in #6473.
SQL query metrics sqlQuery/bytes and sqlQuery/time, added by @gaodayue in #6302.
Apache Kafka ingestion lag metrics ingest/kafka/maxLag and ingest/kafka/avgLag, added by @QiuMM in #6587
Task count metrics task/success/count, task/failed/count, task/running/count, task/pending/count, task/waiting/count, added by @QiuMM in #6657

New interfaces for extension developers

RequestLogEvent

It is now possible to control the fields in RequestLogEvent, emitted by EmittingRequestLogger. Please see #6477 for details. Added by @leventov.

Custom TLS certificate checks

An extension point for custom TLS certificate checks has been added. Please see http://druid.io/docs/0.14.0-incubating/operations/tls-support.html#custom-tls-certificate-checks for details. Added by @jon-wei in #6432.

Kafka Indexing Service no longer experimental

The Kafka Indexing Service extension has been moved out of experimental status.

SQL Enhancements

Enhancements to dsql

The dsql command line client now supports CLI history, basic autocomplete, and specifying query timeouts in the query context.

Added in #6929 by @gianm.

Add SQL id, request logs, and metrics

SQL queries now have an ID, and native queries executed as part of a SQL query will have the associated SQL query ID in the native query's request logs. SQL queries will now be logged in the request logs.

Two new metrics, sqlQuery/time and sqlQuery/bytes, are now emitted for SQL queries.

Please see http://druid.io/docs/0.14.0-incubating/configuration/index.html#request-logging and http://druid.io/docs/0.14.0-incubating/querying/sql.html#sql-metrics for details.

Added by @gaodayue in #6302

More SQL aggregator support

The following aggregators are now supported in SQL:

DataSketches HLL sketch
DataSketches Theta sketch
DataSketches quantiles sketch
Fixed bins histogram
Bloom filter aggregator

Added by @jon-wei in #6951 and @clintropolis in #6502

Other SQL enhancements

SQL: Add support for queries with project-after-semijoin. #6756
SQL: Support for selecting multi-value dimensions. #6462
SQL: Support AVG on system tables. #601
SQL: Add "POSITION" function. #6596
SQL: Set INFORMATION_SCHEMA catalog name to "druid". #6595
SQL: Fix ordering of sort, sortProject in DruidSemiJoin. #6769

Added by @gianm.

Updating from 0.13.0-incubating and earlier

Kafka ingestion downtime when upgrading

Due to the issue described in #6958, existing Kafka indexing tasks can be terminated unnecessarily during a rolling upgrade of the Overlord. The terminated tasks will be restarted by the Overlord and will function correctly after the initial restart.

Parquet extension changes

The druid-parquet-extensions extension has been moved from contrib to core. When deploying 0.14.0-incubating, please ensure that your extensions-contrib directory does not have any older versions of the Parquet extension.

Additionally, there are now two styles of Parquet parsers in the extension:

parquet-avro: Converts Parquet to Avro, and then parses the Avro representation. This was the existing parser prior to 0.14.0-incubating.
parquet: A new parser that parses the Parquet format directly. Only this new parser supports int96 values.

Prior to 0.14.0-incubating, specifying a parquet type parser would have a task use the Avro-converting parser. In 0.14.0-incubating, to continue using the Avro-converting parser, you will need to update your ingestion specs to use parquet-avro instead.
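The parser type change described above could be applied to existing specs with a small helper like the sketch below. It assumes the parser lives at `spec.dataSchema.parser` in the task JSON, which is an assumption about the spec layout; the sample spec is hypothetical and heavily abbreviated. It changes only the parser type and does not touch any other fields.

```python
import copy


def use_avro_parquet_parser(task_spec):
    """Return a copy of `task_spec` with a `parquet` parser renamed to `parquet-avro`."""
    migrated = copy.deepcopy(task_spec)
    # Assumed location of the parser within the ingestion spec.
    parser = migrated.get("spec", {}).get("dataSchema", {}).get("parser")
    if isinstance(parser, dict) and parser.get("type") == "parquet":
        parser["type"] = "parquet-avro"
    return migrated


old_spec = {  # hypothetical, abbreviated spec written before 0.14.0-incubating
    "type": "index_hadoop",
    "spec": {"dataSchema": {"dataSource": "events", "parser": {"type": "parquet"}}},
}
migrated = use_avro_parquet_parser(old_spec)
print(migrated["spec"]["dataSchema"]["parser"]["type"])
```

Working on a deep copy keeps the original spec unchanged, so the old and new versions can be compared before resubmitting a task.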

The inputFormat field in the inputSpec for tasks using Parquet ...
