Releases: apache/druid
druid-0.17.1
Apache Druid 0.17.1 is a security bug fix release that addresses the following CVE for LDAP authentication:
- [CVE-2020-1958]: Apache Druid LDAP injection vulnerability (https://lists.apache.org/thread.html/r9d437371793b410f8a8e18f556d52d4bb68e18c537962f6a97f4945e%40%3Cdev.druid.apache.org%3E)
druid-0.17.0
Apache Druid 0.17.0 contains over 250 new features, performance enhancements, bug fixes, and major documentation improvements from 52 contributors. Check out the complete list of changes and everything tagged to the milestone.
Highlights
Batch ingestion improvements
Druid 0.17.0 includes a significant update to the native batch ingestion system. This update adds the internal framework to support non-text binary formats, with initial support for ORC and Parquet. Additionally, native batch tasks can now read data from HDFS.
This rework changes how the ingestion source and data format are specified in the ingestion task. To use the new features, please refer to the documentation on InputSources and InputFormats.
Please see the following documentation for details:
https://druid.apache.org/docs/0.17.0/ingestion/data-formats.html#input-format
https://druid.apache.org/docs/0.17.0/ingestion/native-batch.html#input-sources
https://druid.apache.org/docs/0.17.0/ingestion/native-batch.html#partitionsspec
Single dimension range partitioning for parallel native batch ingestion
The parallel index task now supports the single_dim type partitions spec, which allows for range-based partitioning on a single dimension.
Please see https://druid.apache.org/docs/0.17.0/ingestion/native-batch.html for details.
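As a rough illustration, a single_dim partitions spec sits inside the tuningConfig of an index_parallel task. The sketch below builds such a fragment as a Python dict. The field names partitionDimension and targetRowsPerSegment, and the assumption that forceGuaranteedRollup must be true for range partitioning, should be checked against the native batch documentation linked above; the helper name single_dim_tuning_config is hypothetical.

```python
import json


def single_dim_tuning_config(dimension: str, target_rows: int, max_subtasks: int) -> dict:
    """Build a hypothetical index_parallel tuningConfig using single_dim range partitioning.

    Assumption: field names follow the 0.17.0 native batch docs; verify them before use.
    """
    return {
        "type": "index_parallel",
        # Assumed: range partitioning requires perfect rollup.
        "forceGuaranteedRollup": True,
        "maxNumConcurrentSubTasks": max_subtasks,
        "partitionsSpec": {
            "type": "single_dim",
            "partitionDimension": dimension,  # the single dimension whose values define ranges
            "targetRowsPerSegment": target_rows,
        },
    }


if __name__ == "__main__":
    print(json.dumps(single_dim_tuning_config("countryName", 5_000_000, 4), indent=2))
```

Choosing a dimension with many distinct, fairly evenly distributed values generally gives more balanced ranges than a low-cardinality one.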
Compaction changes
Parallel index task split hints
The parallel indexing task now has a new configuration, splitHintSpec, in the tuningConfig that allows operators to provide hints controlling the amount of data each first-phase subtask reads. There is currently one split hint spec type, SegmentsSplitHintSpec, used for re-ingesting Druid segments.
Parallel auto-compaction
Auto-compaction can now use the parallel indexing task, allowing for greater compaction throughput.
To control the level of parallelism, the auto-compaction tuningConfig has new parameters, maxNumConcurrentSubTasks and splitHintSpec.
Please see https://druid.apache.org/docs/0.17.0/configuration/index.html#compaction-dynamic-configuration for details.
Stateful auto-compaction
Auto-compaction now uses the partitionSpec to track changes made by previous compaction tasks, allowing the coordinator to reduce redundant compaction operations.
Please see #8489 for details.
If you have auto-compaction enabled, please see the information under "Stateful auto-compaction changes" in the "Upgrading to Druid 0.17.0" section before upgrading.
Parallel query merging on brokers
The Druid broker can now opportunistically merge query results in parallel using multiple threads.
Please see druid.processing.merge.useParallelMergePool in the Broker section of the configuration reference for details on how to configure this new feature.
Parallel merging is enabled by default (controlled by the druid.processing.merge.useParallelMergePool property), and most users should not have to change any of the advanced configuration properties described in the configuration reference.
Additionally, merge parallelism can be controlled on a per-query basis using the query context. Information about the new query context parameters can be found at https://druid.apache.org/docs/0.17.0/querying/query-context.html.
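For example, a client could toggle parallel merging for a single query by adding keys to its context. This is a sketch only: the context keys enableParallelMerge and parallelMergeParallelism are assumptions drawn from the query context documentation linked above, and with_parallel_merge is a hypothetical helper.

```python
import copy
from typing import Optional


def with_parallel_merge(query: dict, enabled: bool = True, parallelism: Optional[int] = None) -> dict:
    """Return a copy of a native query with per-query parallel merge settings.

    Assumption: the context keys "enableParallelMerge" and "parallelMergeParallelism"
    match the 0.17.0 query-context docs; check that page before relying on them.
    """
    result = copy.deepcopy(query)  # leave the caller's query untouched
    context = result.setdefault("context", {})
    context["enableParallelMerge"] = enabled
    if parallelism is not None:
        context["parallelMergeParallelism"] = parallelism
    return result
```

Calling with_parallel_merge(query, enabled=False) would opt one query out while leaving the Broker default in place for everything else.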
SQL-compatible null handling
In 0.17.0, we have added official documentation for Druid's SQL-compatible null handling mode.
Please see https://druid.apache.org/docs/0.17.0/configuration/index.html#sql-compatible-null-handling and https://druid.apache.org/docs/0.17.0/design/segments.html#sql-compatible-null-handling for details.
Several bugs that existed in this previously undocumented mode have been fixed, particularly around null handling in numeric columns. We recommend that users begin to consider transitioning their clusters to this new mode after upgrading to 0.17.0.
The full list of null handling bugs fixed in 0.17.0 can be found at https://github.com/apache/druid/issues?utf8=%E2%9C%93&q=label%3A%22Area+-+Null+Handling%22+milestone%3A0.17.0+
LDAP extension
Druid now supports LDAP authentication. Authorization using LDAP groups is also supported by mapping LDAP groups to Druid roles.
- LDAP authentication is handled by specifying an LDAP-type credentials validator.
- Authorization using LDAP is handled by specifying an LDAP-type role provider, and defining LDAP group->Druid role mappings within Druid.
LDAP integration requires the druid-basic-security core extension. Please see https://druid.apache.org/docs/0.17.0/development/extensions-core/druid-basic-security.html for details.
As this is the first release with LDAP support, and there is a large variety of LDAP ecosystems, some LDAP use cases and features may not be supported yet. Please file an issue if you need enhancements to this new functionality.
Dropwizard emitter
A new Dropwizard metrics emitter has been added as a contrib extension.
The currently supported Dropwizard metrics types are counter, gauge, meter, timer and histogram. These metrics can be emitted using either a Console or JMX reporter.
Please see https://druid.apache.org/docs/0.17.0/design/extensions-contrib/dropwizard.html for details.
Self-discovery resource
A new pair of endpoints has been added to all Druid services. They report whether the service has received confirmation from the central service discovery mechanism (currently ZooKeeper) that it has been added to the cluster. These endpoints can be useful as health/readiness checks.
The new endpoints are:
/status/selfDiscovered/status
/status/selfDiscovered
Please see the Druid API reference for details.
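A probe could poll the JSON variant of the endpoint. This sketch assumes /status/selfDiscovered/status returns a body such as {"selfDiscovered": true}; confirm the exact shape in the API reference. The helper names parse_self_discovered and check_self_discovered are hypothetical.

```python
import json
import urllib.request


def parse_self_discovered(body: bytes) -> bool:
    """Assumes a JSON body like {"selfDiscovered": true}; anything else counts as not discovered."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("selfDiscovered") is True


def check_self_discovered(base_url: str, timeout: float = 5.0) -> bool:
    """Query a Druid service's self-discovery status; network or HTTP errors count as False."""
    url = base_url.rstrip("/") + "/status/selfDiscovered/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_self_discovered(resp.read())
    except OSError:  # urllib.error.URLError and HTTPError both subclass OSError
        return False
```

Treating every failure as "not discovered" keeps the probe conservative: a service is only reported healthy when it positively confirms it.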
Supervisors system table
Task supervisors (e.g. Kafka or Kinesis supervisors) are now recorded in the system tables in a new sys.supervisors table.
Please see https://druid.apache.org/docs/0.17.0/querying/sql.html#supervisors-table for details.
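Since sys.supervisors is an ordinary system table, it can be queried through the Druid SQL HTTP endpoint (POST /druid/v2/sql) with a JSON body. The column names supervisor_id, state and healthy (stored as 1/0) are assumptions to verify against the supervisors table docs; supervisors_sql_body is a hypothetical helper.

```python
import json


def supervisors_sql_body(only_unhealthy: bool = False) -> str:
    """Build a JSON body for POST /druid/v2/sql that lists supervisors.

    Assumption: sys.supervisors exposes supervisor_id, state and healthy columns,
    with healthy stored as 1/0; confirm against the supervisors table docs.
    """
    sql = "SELECT supervisor_id, state FROM sys.supervisors"
    if only_unhealthy:
        sql += " WHERE healthy = 0"
    return json.dumps({"query": sql})
```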
Fast historical start with lazy loading
A new boolean configuration property for historicals, druid.segmentCache.lazyLoadOnStart, has been added.
This new property allows historicals to defer loading of a segment until the first time that segment is queried, which can significantly decrease historical startup times for clusters with a large number of segments.
Please see the configuration reference for details.
Historical segment cache distribution change
A new historical property, druid.segmentCache.locationSelectorStrategy, has been added.
If there are multiple segment storage locations specified in druid.segmentCache.locations, the new locationSelectorStrategy property allows the user to specify what strategy is used to fill the locations. Currently supported options are roundRobin and leastBytesUsed.
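To make the difference between the two options concrete, here is a simplified model of each strategy. It is an illustrative sketch, not Druid's implementation: Location, least_bytes_used and RoundRobin are hypothetical names, and real historicals also account for reserved space and concurrent loads.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Location:
    """Simplified model of one druid.segmentCache.locations entry (hypothetical)."""
    path: str
    max_size: int
    used: int = 0

    def can_fit(self, size: int) -> bool:
        return self.used + size <= self.max_size


def least_bytes_used(locations: List[Location], size: int) -> Optional[Location]:
    """leastBytesUsed-style choice: the location with the fewest bytes in use that still has room."""
    candidates = [loc for loc in locations if loc.can_fit(size)]
    return min(candidates, key=lambda loc: loc.used, default=None)


class RoundRobin:
    """roundRobin-style choice: cycle through locations in order, skipping any that are full."""

    def __init__(self, locations: List[Location]):
        self._locations = locations
        self._next = 0

    def select(self, size: int) -> Optional[Location]:
        n = len(self._locations)
        for offset in range(n):
            index = (self._next + offset) % n
            location = self._locations[index]
            if location.can_fit(size):
                self._next = (index + 1) % n  # start after this one next time
                return location
        return None
```

Round robin spreads segments evenly by count, while least-bytes-used evens out disk usage when segment sizes vary widely.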
Please see the configuration reference for details.
New readiness endpoints
A new Broker endpoint has been added: /druid/broker/v1/readiness.
A new Historical endpoint has been added: /druid/historical/v1/readiness.
These endpoints are similar to the existing /druid/broker/v1/loadstatus and /druid/historical/v1/loadstatus endpoints.
They differ in that they do not require authentication/authorization checks, and instead of a JSON body they only return a 200 success or 503 HTTP response code.
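Because these endpoints signal readiness purely through the status code, a probe only needs to distinguish 200 from 503. The sketch below uses the paths from the notes above; the helper names are hypothetical, and it deliberately lets connection errors and unexpected status codes propagate rather than hiding them.

```python
import urllib.error
import urllib.request

# Paths taken from the release notes above.
READINESS_PATHS = {
    "broker": "/druid/broker/v1/readiness",
    "historical": "/druid/historical/v1/readiness",
}


def readiness_url(base_url: str, service: str) -> str:
    return base_url.rstrip("/") + READINESS_PATHS[service]


def is_ready(base_url: str, service: str, timeout: float = 5.0) -> bool:
    """Return True on HTTP 200, False on HTTP 503; other errors propagate."""
    try:
        with urllib.request.urlopen(readiness_url(base_url, service), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx codes; 503 means "not ready yet".
        if err.code == 503:
            return False
        raise
```

No credentials are attached, since the readiness endpoints skip authentication/authorization checks.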
Support task assignment based on MiddleManager categories
It is now possible to define a "category" name property for each MiddleManager. New worker select strategies that are category-aware have been added, allowing the user to control how tasks are assigned to MiddleManagers based on the configured categories.
Please see the documentation for druid.worker.category in the configuration reference, and the following links, for more details:
https://druid.apache.org/docs/0.17.0/configuration/index.html#Equal-Distribution-With-Category-Spec
https://druid.apache.org/docs/0.17.0/configuration/index.html#Fill-Capacity-With-Category-Spec
https://druid.apache.org/docs/0.17.0/configuration/index.html#WorkerCategorySpec
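The idea behind a category-aware "equal distribution" strategy can be sketched as: resolve the preferred category for a task, restrict the candidates to workers in that category, then pick the worker with the most free capacity. The spec shape used here (categoryMap, defaultCategory, categoryAffinity, strong) is an assumption based on the WorkerCategorySpec documentation linked above, and Worker, preferred_category and select_worker are hypothetical names; this is a simplified model, not Druid's selection code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    host: str
    category: str
    capacity: int
    used: int = 0

    @property
    def available(self) -> int:
        return self.capacity - self.used


def preferred_category(spec: dict, task_type: str, datasource: str) -> Optional[str]:
    """Look up the category for a task, preferring a per-datasource affinity (assumed spec shape)."""
    entry = spec.get("categoryMap", {}).get(task_type)
    if entry is None:
        return None
    return entry.get("categoryAffinity", {}).get(datasource, entry.get("defaultCategory"))


def select_worker(workers: List[Worker], spec: dict, task_type: str, datasource: str) -> Optional[Worker]:
    """Equal-distribution selection restricted to the preferred category when possible."""
    free = [w for w in workers if w.available > 0]
    category = preferred_category(spec, task_type, datasource)
    if category is not None:
        matching = [w for w in free if w.category == category]
        # With a "strong" spec, never fall back to workers outside the category.
        if matching or spec.get("strong", False):
            free = matching
    return max(free, key=lambda w: w.available, default=None)
```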
Security vulnerability updates
A large number of dependencies have been updated to newer versions to address security vulnerabilities.
Please see the PRs below for details:
Upgrading to Druid 0.17.0
Select native query has been replaced
The deprecated Select native query type has been removed in 0.17.0.
If you have native queries that use Select, you need to modify them to use Scan instead. See the Scan query documentation (https://druid.apache.org/docs/0.17.0/querying/scan-query.html) for syntax and output format details.
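As a starting point for the migration, a minimal Scan query looks like the dict below. Unlike Select, Scan has no pagingSpec; the row count is capped with limit instead. The datasource, interval, and column names in any call are placeholders, and scan_query is a hypothetical helper.

```python
from typing import List


def scan_query(datasource: str, intervals: List[str], columns: List[str], limit: int) -> dict:
    """Build a minimal native Scan query, e.g. to replace a paginated Select query."""
    return {
        "queryType": "scan",
        "dataSource": datasource,
        "intervals": intervals,
        "columns": columns,
        "resultFormat": "list",  # one JSON object per row; "compactedList" returns arrays
        "limit": limit,          # Scan has no pagingSpec; cap rows with limit instead
    }
```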
For Druid SQL queries that use Select, no...
druid-0.16.1-incubating
Apache Druid 0.16.1-incubating is a bug fix and user experience improvement release that fixes a rolling upgrade issue, improves the startup scripts, and updates licensing information.
Bug Fixes
#8682 implement FiniteFirehoseFactory in InlineFirehose
#8905 Retrying with a backward compatible task type on unknown task type error in parallel indexing
User Experience Improvements
#8792 Use bundled ZooKeeper in tutorials.
#8794 Startup scripts: verify Java 8 (exactly), improve port/java verification messages.
#8942 Improve verify-default-ports to check both INADDR_ANY and 127.0.0.1.
#8798 Fix verify script.
Licensing Update
#8944 Add license for tutorial wiki data
#8968 Add licenses.yaml entry for Wikipedia sample data
Other
#8419 Bump Apache Thrift to 0.10.0
Updating from 0.16.0-incubating and earlier
PR #8905 fixes an issue with rolling upgrades when updating from earlier versions.
Credits
Thanks to everyone who contributed to this release!
@aditya-r-m
@clintropolis
@Fokko
@gianm
@jihoonson
@jon-wei
Apache Druid (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
druid-0.16.0-incubating
Apache Druid 0.16.0-incubating contains over 350 new features, performance enhancements, bug fixes, and major documentation improvements from 50 contributors. Check out the complete list of changes and everything tagged to the milestone.
Highlights
# Performance
# 'Vectorized' query processing
An experimental 'vectorized' query execution engine is new in 0.16.0, which can provide a speed increase in the range of 1.3-3x for timeseries and group by v2 queries. It operates on the principle of batching operations on rows instead of processing a single row at a time, e.g. iterating bitmaps in batches instead of per row, reading column values in batches, filtering in batches, aggregating values in batches, and so on. This results in significantly fewer method calls, better memory locality, and increased cache efficiency.
This is an experimental feature, but we view it as the path forward for Druid query processing and are excited for feedback as we continue to improve and fill out missing features in upcoming releases.
- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions or granularity other than "all" yet.
- Vector cursors cannot handle virtual columns or descending order.
- Expressions are not supported anywhere: not as inputs to aggregators, in virtual functions, or in filters.
- Only some aggregators have vectorized implementations: "count", "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Only some filters have vectorized matchers: "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not".
- Dimension specs other than "default" don't work yet (no extraction functions or filtered dimension specs).
The feature can be enabled by setting "vectorize": true in your query context (the default is false). This works both for Druid SQL and for native queries. When set to true, vectorization will be used if possible; otherwise, Druid will fall back to its non-vectorized query engine. You can also set it to "force", which will return an error if the query cannot be fully vectorized. This is helpful for confirming that vectorization is indeed being used.
You can control the block size during execution by setting the vectorSize query context parameter (default is 1000).
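For instance, a Druid SQL request can carry these settings in its "context" field. The sketch below validates the three allowed values of "vectorize" described above; vectorized_context is a hypothetical helper and the wikipedia datasource is a placeholder.

```python
import json
from typing import Union


def vectorized_context(vectorize: Union[bool, str] = True, vector_size: int = 1000) -> dict:
    """Build a query context that opts into the experimental vectorized engine.

    vectorize may be True, False, or "force" (fail if the query cannot be fully vectorized).
    """
    if not (isinstance(vectorize, bool) or vectorize == "force"):
        raise ValueError('vectorize must be True, False, or "force"')
    return {"vectorize": vectorize, "vectorSize": vector_size}


# Hypothetical Druid SQL request body that forces vectorization.
sql_body = json.dumps({
    "query": "SELECT COUNT(*) FROM wikipedia",
    "context": vectorized_context("force"),
})
```

Using "force" while testing, then switching to true in production, gives an explicit error during development and a silent fallback once deployed.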
# GroupBy array-based result rows
groupBy v2 queries now use an array-based representation of result rows, rather than the map-based representation used by prior versions of Druid. This provides faster generation and processing of result sets. Out of the box this change is invisible and backwards-compatible; you will not have to change any configuration to reap the benefits of this more efficient format, and it will have no impact on cached results. Internally this format will always be utilized automatically by the broker in the queries that it issues to historicals. By default the results will be translated back to the existing 'map' based format at the broker before sending them back to the client.
However, if you would like to avoid the overhead of this translation and get even faster results, resultAsArray may be set on the query context to pass the new array-based result row format directly through to the client. The schema is as follows, in order:
- Timestamp (optional; only if granularity != ALL)
- Dimensions (in order)
- Aggregators (in order)
- Post-aggregators (optional; in order, if present)
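A client that sets resultAsArray has to map each array back to names itself, using the query's own dimension, aggregator, and post-aggregator output names in the order listed above. The sketch below does that mapping; array_row_to_event is a hypothetical helper, and whether a timestamp is present must be supplied by the caller from the query's granularity.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


def array_row_to_event(
    row: Sequence[Any],
    dimensions: Sequence[str],
    aggregators: Sequence[str],
    post_aggregators: Sequence[str] = (),
    has_timestamp: bool = False,
) -> Tuple[Optional[Any], Dict[str, Any]]:
    """Split a resultAsArray groupBy row into (timestamp, {output name: value}).

    Order follows the schema above: timestamp (only when granularity is not "all"),
    dimensions, aggregators, then post-aggregators.
    """
    values = list(row)
    timestamp = values.pop(0) if has_timestamp else None
    names = [*dimensions, *aggregators, *post_aggregators]
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return timestamp, dict(zip(names, values))
```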
# Additional performance enhancements
The complete set of pull requests tagged as performance enhancements for 0.16 can be found here.
# "Minor" compaction
Users of the Kafka indexing service and compaction who get a trickle of late data can find a huge improvement in the form of a new concept called 'minor' compaction. Enabled by internal changes to how data segments are versioned, minor compaction is based on the idea of 'segment' based locking at indexing time instead of the current Druid locking behavior (which is now referred to as 'time chunk' locking). Segment locking, as you might expect, locks only the segments being compacted, while still allowing new 'appending' indexing tasks (like Kafka indexing tasks) to continue running and creating new segments simultaneously. This is a big deal if you get a lot of late data, because the current behavior results in compaction tasks starving as higher-priority realtime tasks hog the locks. When compaction tasks are prevented from optimizing the datasource's segment sizes, overall performance suffers.
To enable segment locking, you will need to set forceTimeChunkLock to false in the task context, or set druid.indexer.tasklock.forceTimeChunkLock=false in the Overlord configuration. However, beware: due to the changes in segment versioning, there is no built-in rollback path after enabling this feature, so once segment locking is in use on 0.16 you cannot downgrade to an older version of Druid. Because of this, we highly recommend confirming that Druid 0.16 is stable in your cluster before enabling this feature.
It has a humble name, but the changes behind minor compaction run deep, and it is not possible to adequately describe the mechanisms that drive it in these release notes, so check out the proposal and PR for more details.
# Druid "indexer" process
The new Indexer process is an alternative to the MiddleManager + Peon task execution system. Instead of forking a separate JVM process per-task, the Indexer runs tasks as separate threads within a single JVM process. The Indexer is designed to be easier to configure and deploy compared to the MiddleManager + Peon system and to better enable resource sharing across tasks.
The advantage of the Indexer is that it allows query processing resources, lookups, cached authentication/authorization information, and much more to be shared between all running indexing task threads, giving each individual task access to a larger pool of resources and far fewer redundant actions done than is possible with the Peon model of execution where each task is isolated in its own process.
Using the Indexer does come with one downside: the loss of process isolation provided by Peon processes means that a single task can potentially affect all running indexing tasks on that Indexer. The druid.worker.globalIngestionHeapLimitBytes and druid.worker.numConcurrentMerges configurations are meant to help minimize this. Additionally, task logs for Indexer processes will be inline with the Indexer process log, and not persisted to deep storage.
You can start using the Indexer by supplying server indexer as the command-line argument to org.apache.druid.cli.Main when starting the service. To use the Indexer in place of a MiddleManager and Peons, you should be able to adapt values from the MiddleManager configuration into the Indexer configuration: lift configurations with the druid.indexer.fork.property. prefix directly to the Indexer, and size heap and direct memory based on the Peon sizes multiplied by the number of task slots (unlike a MiddleManager, the Indexer does not accept the druid.indexer.runner.javaOpts or druid.indexer.runner.javaOptsArray configurations). See the Indexer documentation for details.
# Native parallel batch indexing with shuffle
In 0.16.0, Druid's index_parallel native parallel batch indexing task now supports 'perfect' rollup with the implementation of a 2-stage shuffle process.
Tasks in stage 1 perform a secondary partitioning of rows on top of the standard time based partitioning of segment granularity, creating an intermediary data segment for each partition. Stage 2 tasks are each assigned a set of the partitionings created during stage 1, and will collect and combine the set of intermediary data segments which belong to that partitioning, allowing it to achieve complete rollup when building the final segments. At this time, only hash-based partitioning is supported.
This can be enabled by setting forceGuaranteedRollup to true in the tuningConfig; numShards in partitionsSpec and intervals in granularitySpec must also be set.
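The three requirements above can be checked before a task is submitted. This is a hypothetical pre-flight helper, not part of Druid; it assumes the usual task layout in which tuningConfig and dataSchema.granularitySpec live under the task's "spec" key, and it only covers the hash-based partitioning supported in 0.16.0.

```python
from typing import List


def perfect_rollup_problems(task: dict) -> List[str]:
    """List what is missing for index_parallel perfect rollup (hash partitioning, 0.16.0)."""
    spec = task.get("spec", {})
    tuning = spec.get("tuningConfig", {})
    granularity = spec.get("dataSchema", {}).get("granularitySpec", {})
    problems = []
    if tuning.get("forceGuaranteedRollup") is not True:
        problems.append("tuningConfig.forceGuaranteedRollup must be true")
    if not tuning.get("partitionsSpec", {}).get("numShards"):
        problems.append("tuningConfig.partitionsSpec.numShards must be set")
    if not granularity.get("intervals"):
        problems.append("dataSchema.granularitySpec.intervals must be set")
    return problems
```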
The Druid MiddleManager (or the new Indexer) processes have a new responsibility for these indexing tasks: serving the intermediary partition segments output by stage 1 to the stage 2 tasks. Depending on configuration and cluster size, the MiddleManager JVM configuration might need to be adjusted to increase heap allocation and HTTP threads. These numbers are expected to scale with cluster size, as all MiddleManager or Indexer processes involved in a shuffle will need the ability to communicate with each other, but we do not expect the footprint to be significantly larger than it is currently. Optimistically, we suggest trying your existing configurations first, and bumping up heap and HTTP thread count only if issues are encountered.
#...
druid-0.15.1-incubating
Apache Druid 0.15.1-incubating is a bug fix release that includes important fixes for Apache ZooKeeper-based segment loading, the 'druid-datasketches' extension, and much more.
Bug Fixes
Coordinator
#8137 coordinator throwing exception trying to load segments (fixed by #8140)
Middlemanager
#7886 Middlemanager fails startup due to corrupt task files (fixed by #7917)
#8085 fix forking task runner task shutdown to be more graceful
Queries
#7777 timestamp_ceil function is either wrong or misleading (fixed by #7823)
#7820 subtotalsSpec and filtering returns no results (fixed by #7827)
#8013 Fix ExpressionVirtualColumn capabilities; fix groupBy's improper uses of StorageAdapter#getColumnCapabilities.
API
#6786 apache-druid-0.13.0-incubating router /druid/router/v1/brokers (fixed by #8026)
#8044 SupervisorManager: Add authorization checks to bulk endpoints.
Metrics Emitters
#8204 HttpPostEmitter throw Class cast exception when using emitAndReturnBatch (fixed by #8205)
Extensions
Datasketches
#7666 sketches-core-0.13.4
#8055 force native order when wrapping ByteBuffer
Kinesis Indexing Service
#7830 Kinesis: Fix getPartitionIds, should be checking isHasMoreShards.
Moving Average Query
#7999 Druid moving average query results in circular reference error (fixed by #8192)
Documentation Fixes
#8002 Improve pull-deps reference in extensions page.
#8003 Add missing reference to Materialized-View extension.
#8079 Fix documentation formatting
#8087 fix references to bin/supervise in tutorial docs
Updating from 0.15.0-incubating and earlier
Due to issue #8137, when updating from specifically 0.15.0-incubating to 0.15.1-incubating, it is recommended to update the Coordinator before the Historical servers to prevent segment unavailability during an upgrade (this is typically reversed). Upgrading from any version older than 0.15.0-incubating does not have these concerns and can be done normally.
Known Issues
Building Docker images is currently broken and will be fixed in the next release, see #8054 which is fixed by #8237 for more details.
Credits
Thanks to everyone who contributed to this release!
@AlexanderSaydakov
@ArtyomyuS
@ccl0326
@clintropolis
@gianm
@himanshug
@jihoonson
@legoscia
@leventov
@pjain1
@yurmix
@xueyumusic
druid-0.15.0-incubating
Apache Druid 0.15.0-incubating contains over 250 new features, performance/stability/documentation improvements, and bug fixes from 39 contributors. Major new features and improvements include:
- New Data Loader UI
- Support transactional Kafka topic
- New Moving Average query
- Time ordering for Scan query
- New Moments Sketch aggregator
- SQL enhancements
- Light lookup module for routers
- Core ORC extension
- Core GCP extension
- Document improvements
The full list of changes is here: https://github.com/apache/incubator-druid/pulls?q=is%3Apr+is%3Aclosed+milestone%3A0.15.0
Documentation for this release is at: http://druid.apache.org/docs/0.15.0-incubating/
Highlights
New Data Loader UI (Batch indexing part)
Druid has a new Data Loader UI which is integrated with the Druid Console. The new Data Loader UI shows sampled data so you can easily verify the ingestion spec, and it generates the final ingestion spec automatically. Users can issue batch index tasks without writing a JSON spec by hand.
Added by @vogievetsky and @dclim in #7572 and #7531, respectively.
Support Kafka Transactional Topics
The Kafka indexing service now supports Kafka Transactional Topics.
Please note that only Kafka 0.11.0 or later versions are supported after this change.
Added by @surekhasaharan in #6496.
New Moving Average Query
A new query type was introduced to compute moving averages.
Please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-contrib/moving-average-query.html for more details.
Time Ordering for Scan Query
The Scan query type now supports time ordering. Please see http://druid.apache.org/docs/0.15.0-incubating/querying/scan-query.html#time-ordering for more details.
Added by @justinborromeo in #7133.
New Moments Sketch Aggregator
The Moments Sketch is a new sketch type for approximate quantile computation. Please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-contrib/momentsketch-quantiles.html for more details.
SQL enhancements
The Druid community has been striving to enhance SQL support, and Druid SQL is no longer experimental.
New SQL functions
- LPAD and RPAD functions were added by @xueyumusic in #7388.
- DEGREES and RADIANS functions were added by @xueyumusic in #7336.
- STRING_FORMAT function was added by @gianm in #7327.
- PARSE_LONG function was added by @gianm in #7326.
- ROUND function was added by @gianm in #7224.
- Trigonometric functions were added by @xueyumusic in #7182.
Autocomplete in Druid Console
Druid Console now supports autocomplete for SQL.
Time-ordered scan support for SQL
Druid SQL supports time-ordered scan query.
Added by @justinborromeo in #7373.
Lookups view added to the web console
You can now configure your lookups from the web console directly.
Misc web console improvements
"NoSQL" mode : #7493 [@shuqi7]
The web console now has a backup mode that allows it to function as best as it can if DruidSQL is disabled or unavailable.
Added compaction configuration dialog : #7242 [@shuqi7]
You can now configure the auto compaction settings for a data source from the Datasource view.
Auto wrap query with limit : #7449 [@vogievetsky]
The console query view will now (by default) wrap DruidSQL queries with a SELECT * FROM (...) LIMIT 1000, allowing you to enter queries like SELECT * FROM your_table without worrying about the impact to the cluster. You can still send 'raw' queries by selecting the option from the ... menu.
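The wrapping itself is a simple string transformation. The function below is a simplified Python sketch of the idea, not the console's actual implementation; wrap_with_limit is a hypothetical name, and the trailing-semicolon handling is an added assumption about what would break the nested query.

```python
def wrap_with_limit(sql: str, limit: int = 1000) -> str:
    """Nest a query inside SELECT * FROM (...) LIMIT n, mirroring the console's auto-wrap idea."""
    inner = sql.strip().rstrip(";").strip()  # a trailing semicolon would break the subquery
    return f"SELECT * FROM ({inner}) LIMIT {limit}"
```

Wrapping as a subquery rather than appending LIMIT keeps the outer limit valid even when the inner query already has its own ORDER BY or LIMIT.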
SQL explain query : #7402 [@shuqi7]
You can now click on the ... menu in the query view to get an explanation of the DruidSQL query.
Surface is_overshadowed as a column in the segments table #7555 , #7425 [@shuqi7][@surekhasaharan]
The is_overshadowed column indicates whether a segment is overshadowed by any published segment. It can be useful to see which segments should be loaded by historicals. Please see http://druid.apache.org/docs/0.15.0-incubating/querying/sql.html for more details.
Improved status UI for actions on tasks, supervisors, and datasources : #7528 [@shuqi7]
This PR condenses the actions list into a tidy menu and lets you see the detailed status for supervisors and tasks. New actions for datasources around loading and dropping data by interval have also been added.
Light Lookup Module for Routers
A light lookup module was introduced for Routers, so they now need only a minimal amount of memory. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/basic-cluster-tuning.html#router for basic memory tuning.
Added by @clintropolis in #7222.
Core ORC extension
ORC extension is now promoted to a core extension. Please read the below 'Updating from 0.14.0-incubating and earlier' section if you are using the ORC extension in an earlier version of Druid.
Added by @clintropolis in #7138.
Core GCP extension
GCP extension is now promoted to a core extension. Please read the below 'Updating from 0.14.0-incubating and earlier' section if you are using the GCP extension in an earlier version of Druid.
Added by @drcrallen in #6953.
Document Improvements
Single-machine deployment example configurations and scripts
Several configurations and scripts were added for easy single machine setup. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/single-server.html for details.
Tool for migrating from local deep storage/Derby metadata
A new tool was added for easy migration from single machine to a cluster environment. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/deep-storage-migration.html for details.
Document for basic tuning guide
A basic cluster tuning guide was added. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/basic-cluster-tuning.html for details.
Security Improvement
The Druid system table now requires only mandatory permissions instead of the read permission for the whole sys database. Please see http://druid.apache.org/docs/0.15.0-incubating/development/extensions-core/druid-basic-security.html for details.
Deprecated/removed
Drop support for automatic segment merge
The automatic segment merge by the coordinator is not supported anymore. Please use auto compaction instead.
Added by @jihoonson in #6883.
Drop support for insert-segment-to-db tool
In Druid 0.14.x and earlier, Druid stored segment metadata (the descriptor.json file) in deep storage in addition to the metadata store. This behavior has changed in 0.15.0, and Druid no longer stores the segment metadata file in deep storage. As a result, the insert-segment-to-db tool is no longer supported, since it works based on descriptor.json files in deep storage. Please see http://druid.apache.org/docs/0.15.0-incubating/operations/insert-segment-db.html for details.
Please note that in 0.14.x and earlier versions, kill tasks will fail if you're using HDFS as deep storage and the descriptor.json file is missing.
Added by @jihoonson in #6911.
Removed "useFallback" configuration for SQL
This option was removed since it generates unscalable query plans and doesn't work with some SQL functions.
Removed a public API in `Co...
druid-0.14.2-incubating
Apache Druid 0.14.2-incubating is a bug fix release that includes important fixes for the 'druid-datasketches' extension and the broker 'result' level caching.
Bug Fixes
- #7607 thetaSketch(with sketches-core-0.13.1) in groupBy always return value no more than 16384
- #6483 Exception during sketch aggregations while using Result level cache
- #7621 NPE when both populateResultLevelCache and grandTotal are set
Credits
Thanks to everyone who contributed to this release!
@AlexanderSaydakov
@clintropolis
@jihoonson
@jon-wei
druid-0.14.1-incubating
Apache Druid 0.14.1-incubating is a small patch release that includes a handful of bug and documentation fixes from 16 contributors.
Important Notice
This release fixes an issue with quantile sketches in the druid-datasketches extension, but introduces another one with theta sketches, caused by #7320 and described in #7607, that was confirmed after the release was finalized. If you utilize theta sketches, we recommend not upgrading to this release. This will be fixed in the next release of Druid by #7619.
Bug Fixes
- use latest sketches-core-0.13.1 #7320
- Adjust BufferAggregator.get() impls to return copies #7464
- DoublesSketchComplexMetricSerde: Handle empty strings. #7429
- handle empty sketches #7526
- Adds backwards-compatible serde for SeekableStreamStartSequenceNumbers. #7512
- Support Kafka supervisor adopting running tasks between versions #7212
- Fix time-extraction topN with non-STRING outputType. #7257
- Fix two issues with Coordinator -> Overlord communication. #7412
- refactor druid-bloom-filter aggregators #7496
- Fix encoded taskId check in chatHandlerResource #7520
- Fix too many dentry cache slab objs#7508. #7509
- Fix result-level cache for queries #7325
- Fix flattening Avro Maps with Utf8 keys #7258
- Write null byte when indexing numeric dimensions with Hadoop #7020
- Batch hadoop ingestion job doesn't work correctly with custom segments table #7492
- Fix aggregatorFactory meta merge exception #7504
Documentation Changes
- Fix broken link due to Typo. #7513
- Some docs optimization #6890
- Updated Javascript Affinity config docs #7441
- fix expressions docs operator table #7420
- Fix conflicting information in configuration doc #7299
- Add missing doc link for operations/http-compression.html #7110
Updating from 0.14.0-incubating and earlier
Kafka Ingestion
Updating from version 0.13.0-incubating or earlier directly to 0.14.1-incubating will not require downtime, unlike the migration path to 0.14.0-incubating, which did due to the issue described in #6958; that issue has been fixed for this release in #7212. Likewise, rolling updates from version 0.13.0-incubating and earlier should also work properly due to #7512.
Native Parallel Ingestion
Thanks to the fixes in #7520, updating from 0.13.0-incubating directly to 0.14.1-incubating will not encounter the issues that could occur during a rolling update with mixed versions of MiddleManagers when updating to 0.14.0-incubating.
Credits
Thanks to everyone who contributed to this release!
@AlexanderSaydakov
@b-slim
@benhopp
@chrishardis
@clintropolis
@ferristseng
@es1220
@gianm
@jihoonson
@jon-wei
@justinborromeo
@kaka11chen
@samarthjain
@surekhasaharan
@zhaojiandong
@zhztheplayer
Apache Druid (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
druid-0.14.0-incubating-rc3
[maven-release-plugin] prepare release druid-0.14.0-incubating
druid-0.14.0-incubating
Apache Druid (incubating) 0.14.0-incubating contains over 200 new features, performance/stability/documentation improvements, and bug fixes from 54 contributors. Major new features and improvements include:
- New web console
- Amazon Kinesis indexing service
- Decommissioning mode for Historicals
- Published segment cache in Broker
- Bloom filter aggregator and expression
- Updated Apache Parquet extension
- Force push down option for nested GroupBy queries
- Better segment handoff and drop rule handling
- Automatically kill MapReduce jobs when Apache Hadoop ingestion tasks are killed
- DogStatsD tag support for statsd emitter
- New API for retrieving all lookup specs
- New compaction options
- More efficient cachingCost segment balancing strategy
The full list of changes is here: https://github.com/apache/incubator-druid/pulls?q=is%3Apr+is%3Amerged+milestone%3A0.14.0
Documentation for this release is at: http://druid.io/docs/0.14.0-incubating/
Highlights
New web console
Druid has a new web console that provides functionality that was previously split between the coordinator and overlord consoles.
The new console allows the user to manage datasources, segments, tasks, data processes (Historicals and MiddleManagers), and coordinator dynamic configuration. The user can also run SQL and native Druid queries within the console.
For more details, please see http://druid.io/docs/0.14.0-incubating/operations/management-uis.html
Added by @vogievetsky in #6923.
Kinesis indexing service
Druid now supports ingestion from Kinesis streams, provided by the new druid-kinesis-indexing-service core extension.
Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/kinesis-ingestion.html for details.
Decommissioning mode for Historicals
Historical processes can now be put into a "decommissioning" mode, where the coordinator will no longer consider the Historical process as a target for segment replication. The coordinator will also move segments off the decommissioning Historical.
This is controlled via Coordinator dynamic configuration. For more details, please see http://druid.io/docs/0.14.0-incubating/configuration/index.html#dynamic-configuration.
Added by @egor-ryashin in #6349.
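As a rough sketch of driving this from a script, the request below merges a list of decommissioning Historicals into the current Coordinator dynamic configuration and POSTs it back. The endpoint path /druid/coordinator/v1/config and the field names decommissioningNodes and decommissioningMaxPercentOfMaxSegmentsToMove are assumptions to verify against the dynamic configuration docs linked above; the host names are hypothetical. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_decommission_request(coordinator_url, current_config, hosts, max_percent=70):
    """Return a POST request that marks `hosts` as decommissioning.

    Field names are assumptions based on the 0.14.0 dynamic configuration docs.
    """
    # Start from the current dynamic config so unrelated settings are preserved.
    config = dict(current_config)
    config["decommissioningNodes"] = list(hosts)
    # Assumed knob: share of maxSegmentsToMove spent on moving segments off
    # decommissioning Historicals.
    config["decommissioningMaxPercentOfMaxSegmentsToMove"] = max_percent
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        coordinator_url.rstrip("/") + "/druid/coordinator/v1/config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen() would send it. Merging into the fetched config, rather than posting only the new fields, is a cautious choice in case unspecified fields are reset to defaults.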
Published segment cache on Broker
The Druid Broker now has the ability to maintain a cache of published segments by polling the Coordinator, which can significantly improve response time for metadata queries on the sys.segments system table.
Please see http://druid.io/docs/0.14.0-incubating/querying/sql.html#retrieving-metadata for details.
Added by @surekhasaharan in #6901
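A typical metadata query that benefits from this cache counts published segments per datasource. The sketch below only builds the JSON body for the Broker's SQL HTTP endpoint; the column names datasource and is_published and the /druid/v2/sql path are taken from the Druid SQL docs as recalled here and should be checked against the linked page.

```python
import json

SQL_ENDPOINT = "/druid/v2/sql"  # Broker SQL HTTP endpoint (verify for your version)


def published_segment_counts_payload():
    """Build a Druid SQL request body counting published segments per datasource."""
    query = (
        "SELECT datasource, COUNT(*) AS num_segments "
        "FROM sys.segments "
        "WHERE is_published = 1 "
        "GROUP BY datasource"
    )
    return json.dumps({"query": query})
```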
Bloom filter aggregator and expression
A new aggregator for constructing Bloom filters at query time, and support for performing Bloom filter checks within Druid expressions, have been added to the druid-bloom-filter extension.
Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/bloom-filter.html for details.
Added by @clintropolis in #6904 and #6397
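To illustrate the semantics that make Bloom filter checks useful for filtering, here is a toy Bloom filter. This is not Druid's implementation (the extension uses its own filter class and serialized format); the class name and the salted SHA-256 hashing scheme are hypothetical choices for the example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: num_bits bits, num_hashes bit positions per item."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

Because a Bloom filter never produces false negatives, a check against it can safely discard rows whose values are definitely absent, at the cost of letting a small fraction of non-matching rows through.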
Updated Parquet extension
The druid-parquet-extensions extension has been moved into the core extension set from the contrib extensions and now supports flattening and int96 values.
Please see http://druid.io/docs/0.14.0-incubating/development/extensions-core/parquet.html for details.
Added by @clintropolis in #6360
Force push down option for nested GroupBy queries
Outer query execution for nested GroupBy queries can now be pushed down to Historical processes; previously, the outer queries would always be executed on the Broker.
Please see #5471 for details.
Added by @samarthjain in #5471.
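A nested GroupBy query opts into this behavior through its query context. The sketch below builds such a query as a Python dict; the context key forcePushDownNestedQuery is an assumption based on #5471 and the GroupBy query docs, and the datasource, interval, and dimension names are hypothetical.

```python
def nested_groupby_query(datasource, interval, push_down=True):
    """Build a nested native groupBy query (hypothetical datasource and dimensions)."""
    inner = {
        "queryType": "groupBy",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "all",
        "dimensions": ["channel", "user"],
        "aggregations": [{"type": "count", "name": "edits"}],
    }
    return {
        "queryType": "groupBy",
        # The inner query is used as the data source of the outer query.
        "dataSource": {"type": "query", "query": inner},
        "intervals": [interval],
        "granularity": "all",
        "dimensions": ["channel"],
        "aggregations": [{"type": "longSum", "name": "total_edits", "fieldName": "edits"}],
        # Assumed context flag: execute the outer query on Historicals, not the Broker.
        "context": {"forcePushDownNestedQuery": push_down},
    }
```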
Better segment handoff and retention rule handling
Segment handoff will now ignore segments that would be dropped by a datasource's retention rules, avoiding ingestion failures caused by issue #5868.
Period load rules will now include the future by default.
A new "Period Drop Before" rule has been added. Please see http://druid.io/docs/0.14.0-incubating/operations/rule-configuration.html#period-drop-before-rule for details.
Added by @QiuMM in #6676, #6414, and #6415.
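The following is a simplified model of how a rule chain with a period load rule and the new period drop before rule might classify segments. The rule type names loadByPeriod, includeFuture, and dropBeforeByPeriod come from the rule configuration docs; the matching logic is an approximation, not the Coordinator's code, and it uses Python timedelta values in place of ISO 8601 periods.

```python
from datetime import datetime, timedelta


def rule_applies(rule, seg_start, seg_end, now):
    """Simplified matching for two rule types (periods given as timedelta)."""
    period_start = now - rule["period"]
    if rule["type"] == "loadByPeriod":
        # Period load rules include the future by default as of 0.14.0.
        if rule.get("includeFuture", True):
            return seg_end > period_start
        # Without includeFuture, the segment must overlap [period_start, now).
        return seg_end > period_start and seg_start < now
    if rule["type"] == "dropBeforeByPeriod":
        # Matches segments that end at or before the start of the period.
        return seg_end <= period_start
    raise ValueError(f"unsupported rule type: {rule['type']}")


def first_matching_rule(rules, seg_start, seg_end, now):
    # Rules are evaluated in order and the first matching rule wins.
    for rule in rules:
        if rule_applies(rule, seg_start, seg_end, now):
            return rule["type"]
    return None
```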
Automatically kill MapReduce jobs when Hadoop ingestion tasks are killed
Druid will now automatically terminate MapReduce jobs created by Hadoop batch ingestion tasks when the ingestion task is killed.
Added by @ankit0811 in #6828.
DogStatsD tag support for statsd-emitter
The statsd-emitter extension now supports DogStatsD-style tags. Please see http://druid.io/docs/0.14.0-incubating/development/extensions-contrib/statsd.html for details.
Added by @deiwin in #6605, with support for constant tags added by @glasser in #6791.
New API for retrieving all lookup specs
A new API for retrieving all lookup specs for all tiers has been added. Please see http://druid.io/docs/0.14.0-incubating/querying/lookups.html#get-all-lookups for details.
Added by @jihoonson in #7025.
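As a sketch of consuming this API, the helper below flattens a response shaped as a map of tier to lookup name to lookup spec into sorted rows. The path /druid/coordinator/v1/lookups/config/all and the response shape are assumptions to check against the linked lookups doc; the sample lookup names are hypothetical.

```python
import json

ALL_LOOKUPS_PATH = "/druid/coordinator/v1/lookups/config/all"  # assumed Coordinator path


def flatten_lookups(response_text):
    """Turn a {tier: {lookupName: spec}} response into sorted (tier, name, version) rows."""
    rows = []
    for tier, lookups in json.loads(response_text).items():
        for name, spec in lookups.items():
            rows.append((tier, name, spec.get("version")))
    return sorted(rows)
```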
New compaction options
Auto-compaction now supports the maxRowsPerSegment option. Please see http://druid.io/docs/0.14.0-incubating/design/coordinator.html#compacting-segments for details.
The compaction task now supports a new segmentGranularity option for controlling the segment granularity of compacted segments, deprecating the older keepSegmentGranularity option. Please see the compaction task property table in http://druid.io/docs/0.14.0-incubating/ingestion/compaction.html for more information on these properties.
Added by @jihoonson in #6758 and #6780.
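The two options might appear in specs roughly as follows. This is a sketch: the datasource name, interval, and values are hypothetical, and placing maxRowsPerSegment at the top level of the per-datasource auto-compaction config is an assumption to confirm against the Coordinator docs linked above.

```python
def compaction_task(datasource, interval, segment_granularity):
    """Manual compaction task using segmentGranularity instead of keepSegmentGranularity."""
    return {
        "type": "compact",
        "dataSource": datasource,
        "interval": interval,
        "segmentGranularity": segment_granularity,
    }


def auto_compaction_config(datasource, max_rows_per_segment):
    """Per-datasource auto-compaction config using the new maxRowsPerSegment option."""
    return {"dataSource": datasource, "maxRowsPerSegment": max_rows_per_segment}
```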
More efficient cachingCost segment balancing strategy
The cachingCost Coordinator segment balancing strategy will now only consider Historical processes for balancing decisions. Previously the strategy would unnecessarily consider active worker tasks as well, which are not targets for segment replication.
New metrics:
- New allocation rate metric jvm/heapAlloc/bytes, added by @egor-ryashin in #6710.
- New query count metric query/count, added by @QiuMM in #6473.
- SQL query metrics sqlQuery/bytes and sqlQuery/time, added by @gaodayue in #6302.
- Apache Kafka ingestion lag metrics ingest/kafka/maxLag and ingest/kafka/avgLag, added by @QiuMM in #6587.
- Task count metrics task/success/count, task/failed/count, task/running/count, task/pending/count, and task/waiting/count, added by @QiuMM in #6657.
New interfaces for extension developers
RequestLogEvent
It is now possible to control the fields in RequestLogEvent, emitted by EmittingRequestLogger. Please see #6477 for details. Added by @leventov.
Custom TLS certificate checks
An extension point for custom TLS certificate checks has been added. Please see http://druid.io/docs/0.14.0-incubating/operations/tls-support.html#custom-tls-certificate-checks for details. Added by @jon-wei in #6432.
Kafka Indexing Service no longer experimental
The Kafka Indexing Service extension has been moved out of experimental status.
SQL Enhancements
Enhancements to dsql
The dsql command line client now supports CLI history, basic autocomplete, and specifying query timeouts in the query context.
Add SQL id, request logs, and metrics
SQL queries now have an ID, and native queries executed as part of a SQL query will have the associated SQL query ID in the native query's request logs. SQL queries will now be logged in the request logs.
Two new metrics, sqlQuery/time and sqlQuery/bytes, are now emitted for SQL queries.
Please see http://druid.io/docs/0.14.0-incubating/configuration/index.html#request-logging and http://druid.io/docs/0.14.0-incubating/querying/sql.html#sql-metrics for details.
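To make SQL and native request log entries easy to correlate, a client might supply its own SQL query ID. In the sketch below, the context key sqlQueryId is an assumption to verify against the SQL docs linked above; when the caller passes no ID, the helper generates a UUID itself.

```python
import json
import uuid


def sql_request_body(query, sql_query_id=None):
    """Build a Druid SQL request body carrying an explicit SQL query ID."""
    # Assumed context key; a known ID can then be searched for in the request logs.
    context = {"sqlQueryId": sql_query_id or str(uuid.uuid4())}
    return json.dumps({"query": query, "context": context})
```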
More SQL aggregator support
The following aggregators are now supported in SQL:
- DataSketches HLL sketch
- DataSketches Theta sketch
- DataSketches quantiles sketch
- Fixed bins histogram
- Bloom filter aggregator
Added by @jon-wei in #6951 and @clintropolis in #6502
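As an illustration, a query using two of the DataSketches-backed aggregators could be generated as below. The SQL function names APPROX_COUNT_DISTINCT_DS_HLL and APPROX_QUANTILE_DS are recalled from the DataSketches extension docs and assume druid-datasketches is loaded; the table and column names are hypothetical.

```python
def approx_stats_query(table, user_col, latency_col):
    """Build a Druid SQL query using DataSketches-backed aggregators."""
    return (
        f"SELECT APPROX_COUNT_DISTINCT_DS_HLL({user_col}) AS unique_users, "
        f"APPROX_QUANTILE_DS({latency_col}, 0.95) AS p95_latency "
        f'FROM "{table}"'
    )
```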
Other SQL enhancements
- SQL: Add support for queries with project-after-semijoin. #6756
- SQL: Support for selecting multi-value dimensions. #6462
- SQL: Support AVG on system tables. #601
- SQL: Add "POSITION" function. #6596
- SQL: Set INFORMATION_SCHEMA catalog name to "druid". #6595
- SQL: Fix ordering of sort, sortProject in DruidSemiJoin. #6769
Added by @gianm.
Updating from 0.13.0-incubating and earlier
Kafka ingestion downtime when upgrading
Due to the issue described in #6958, existing Kafka indexing tasks can be terminated unnecessarily during a rolling upgrade of the Overlord. The terminated tasks will be restarted by the Overlord and will function correctly after the initial restart.
Parquet extension changes
The druid-parquet-extensions extension has been moved from contrib to core. When deploying 0.14.0-incubating, please ensure that your extensions-contrib directory does not have any older versions of the Parquet extension.
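A small pre-deployment check along these lines could list leftover Parquet extension directories. This is a sketch that assumes extensions-contrib sits directly under the Druid installation directory and treats any directory name containing "parquet" as a leftover copy; adjust both assumptions to your layout.

```python
from pathlib import Path


def stale_parquet_extensions(druid_home):
    """Return Parquet extension directories still present under extensions-contrib."""
    contrib = Path(druid_home) / "extensions-contrib"
    if not contrib.is_dir():
        return []
    # Any directory whose name mentions "parquet" is treated as a leftover copy.
    return sorted(p.name for p in contrib.iterdir() if p.is_dir() and "parquet" in p.name)
```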
Additionally, there are now two styles of Parquet parsers in the extension:
- parquet-avro: Converts Parquet to Avro, and then parses the Avro representation. This was the existing parser prior to 0.14.0-incubating.
- parquet: A new parser that parses the Parquet format directly. Only this new parser supports int96 values.
Prior to 0.14.0-incubating, specifying a parquet type parser would make a task use the Avro-converting parser. In 0.14.0-incubating, to continue using the Avro-converting parser, you will need to update your ingestion specs to use parquet-avro instead.
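Updating existing specs can be scripted. The sketch below returns a copy of a task spec with a parquet parser renamed to parquet-avro; it assumes the parser lives at spec.dataSchema.parser, as in Hadoop index task specs, and touches only the parser type, so the Parquet extension docs should be checked for any other fields that must change.

```python
import copy


def use_avro_converting_parser(task_spec):
    """Return a copy of an ingestion spec whose 'parquet' parser becomes 'parquet-avro'.

    Assumes the parser lives at spec.dataSchema.parser (Hadoop index task layout).
    """
    migrated = copy.deepcopy(task_spec)
    parser = migrated.get("spec", {}).get("dataSchema", {}).get("parser", {})
    if parser.get("type") == "parquet":
        # Keep the pre-0.14.0 behavior: convert to Avro, then parse.
        parser["type"] = "parquet-avro"
    return migrated
```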
The inputFormat field in the inputSpec for tasks using Parquet ...