Add ability to enforce concurrent query limits #3257

lbschanno · 2025-10-31T18:14:28Z

NOTE: All tests pass during a build, but I have had issues getting queries to run correctly in quickstart, so further integration testing and fixes are needed.

Add the ability to enforce concurrent query limits across a group of webservers. Zookeeper is used to track active queries and the following data:

- The query ID
- The user who submitted the query
- The system the query was submitted on
- The query logic the query originated from

When the ActiveQueryTracker is instructed to track a query, the following nodes will be created in Zookeeper under the 'ActiveQueries' namespace:

/users/<userDn>/<queryId>
/systems/<systemName>/<queryId>
/queryLogics/<queryLogic>/<queryId>
/queries/<queryId>
/queries/<queryId>/user           [data = byte[] value of userDn]
/queries/<queryId>/system         [data = byte[] value of systemName]
/queries/<queryId>/queryLogic     [data = byte[] value of queryLogic]
/queries/<queryId>/heartbeats
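The node layout above can be sketched as a small path-building helper. This is only an illustration: the class `ActiveQueryPaths` and its method `nodesFor` are hypothetical names, the UTF-8 encoding of the data payloads is an assumption (the description only says "byte[] value"), and the sketch assumes identifiers are used verbatim as path segments. It does not talk to Zookeeper; it only computes which nodes would exist for one tracked query.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: only the path layout comes from the description above.
final class ActiveQueryPaths {
    private ActiveQueryPaths() {}

    // Maps each node path to its data payload (null where the node carries no data),
    // in the same order the nodes are listed in the description.
    static Map<String, byte[]> nodesFor(String queryId, String userDn, String systemName, String queryLogic) {
        for (String segment : new String[] {queryId, userDn, systemName, queryLogic}) {
            // Assumption: identifiers become path segments unchanged, so they must not contain '/'.
            if (segment == null || segment.isEmpty() || segment.indexOf('/') >= 0) {
                throw new IllegalArgumentException("Invalid path segment: " + segment);
            }
        }
        String queryRoot = "/queries/" + queryId;
        Map<String, byte[]> nodes = new LinkedHashMap<>();
        nodes.put("/users/" + userDn + "/" + queryId, null);
        nodes.put("/systems/" + systemName + "/" + queryId, null);
        nodes.put("/queryLogics/" + queryLogic + "/" + queryId, null);
        nodes.put(queryRoot, null);
        // Assumption: payloads are UTF-8 encoded strings.
        nodes.put(queryRoot + "/user", userDn.getBytes(StandardCharsets.UTF_8));
        nodes.put(queryRoot + "/system", systemName.getBytes(StandardCharsets.UTF_8));
        nodes.put(queryRoot + "/queryLogic", queryLogic.getBytes(StandardCharsets.UTF_8));
        nodes.put(queryRoot + "/heartbeats", null);
        return nodes;
    }
}
```

Keeping the per-user, per-system, and per-query-logic entries as separate subtrees means a count for any one of them is a matter of listing the children of a single node.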

These nodes are created and managed by the ActiveQueryTracker class, which is also responsible for providing instances of the QueryHeartbeat class.

A QueryHeartbeat is a wrapper around an ephemeral PersistentNode, provided by the Apache Curator library. As long as this node is present in Zookeeper for a particular query, the query will be considered to be active. Should the webservers fail over and the Zookeeper connection drop, these heartbeat nodes will automatically be deleted by Zookeeper.

The ActiveQueryTracker is also responsible for providing instances of the ActiveQuerySnapshot class, each of which represents a point-in-time snapshot of the total active queries associated with a particular user, system, or query logic.

Query limit enforcement is done through the QueryLimiter class. Given a user, system, and query logic, it can determine if any of the following limits have been exceeded:

- The max allowed concurrent queries for the user.
- The max allowed concurrent queries of the query logic for the user.
- The max allowed concurrent queries for the system.
- The max allowed concurrent queries of the query logic for the system.
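As a rough illustration of the four checks listed above, the sketch below compares active-query counts against configured maximums. None of these names come from the PR: `LimitKind` and `QueryLimitCheck.reached` are hypothetical, the counts stand in for whatever an ActiveQuerySnapshot reports, and two behaviors are assumptions: a limit counts as hit when the active count is already at or above the maximum (so one more query would exceed it), and a missing limit means "unlimited".

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical names for the four limits listed in the description.
enum LimitKind { USER, USER_QUERY_LOGIC, SYSTEM, SYSTEM_QUERY_LOGIC }

final class QueryLimitCheck {
    private QueryLimitCheck() {}

    // Returns the limits that would be exceeded by admitting one more query.
    // 'active' holds current counts per limit kind; 'max' holds the configured maximums.
    static Set<LimitKind> reached(Map<LimitKind, Integer> active, Map<LimitKind, Integer> max) {
        Set<LimitKind> hit = EnumSet.noneOf(LimitKind.class);
        for (LimitKind kind : LimitKind.values()) {
            Integer limit = max.get(kind);          // assumption: absent means unlimited
            int count = active.getOrDefault(kind, 0);
            if (limit != null && count >= limit) {  // assumption: at-limit blocks a new query
                hit.add(kind);
            }
        }
        return hit;
    }
}
```

An empty result would mean the query can proceed; a non-empty result would presumably map to the "Concurrent query limit exceeded" error.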

Limits may be defined and customized on a per-user and per-system basis. They may also be defined for groups of query logics. The classes UserLimitProvider, SystemLimitProvider, and QueryLogicGroupLimitProvider are respectively responsible for identifying the best limits to enforce for a user, system, and query logic. The QueryLimiter initializes them once it has been given a QueryLimitConfiguration instance. The following can be configured:

On a system-wide basis:

- The default concurrent user query limit. This applies to the total number of queries a user may run across all systems. May be overridden per user.
- The default concurrent system query limit. Intended primarily to keep a system from being overloaded. May be overridden per system.
- The default of whether queries submitted to a system are counted towards the user's concurrent query total. This is always true.

On a per-system basis:

- The system names/IDs the configuration targets. Regex matching is supported.
- The concurrent system query limit. Overrides the system-wide value.
- Whether queries submitted to the system count towards a user's concurrent query total. Overrides the system-wide value.
- The concurrent system query limit for different query logic groups. Regex matching against group names is supported.

On a per-user basis:

- The user DN.
- The user's concurrent query limit. Overrides the system-wide configuration.
- The user's concurrent query limit for different query logic groups. Regex matching against group names is supported.

On a per-query-logic-group basis:

- The group name.
- The query logics included in the group. Regex matching is supported.
- The default concurrent user query limit. This applies to the total concurrent queries a user may run that originate from a query logic in the group across all systems.

Given the possibilities for exact matches, partial regex matches, and wildcard regex matches, the best limit to use for any particular system or query logic is determined by sorting matches into the following 'matching buckets' (in best-match priority):

1. Exact match
2. Partial regex (non-wildcard-only)
3. Wildcard-only regex

The lowest limit from the best bucket that contains a match is then selected.
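The bucket selection described above could be sketched as follows. This is a minimal illustration and not the PR's code: `BestLimitMatcher` and its methods are hypothetical names, limits are modeled as a plain map from pattern to limit, and the sketch assumes that "wildcard-only" means the pattern is exactly `.*`, that an exact match is plain string equality, and that a regex must match the whole name.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.OptionalInt;
import java.util.regex.Pattern;

// Hypothetical sketch of best-match limit selection.
final class BestLimitMatcher {
    // Declaration order doubles as best-match priority.
    enum Bucket { EXACT, PARTIAL_REGEX, WILDCARD_ONLY }

    private BestLimitMatcher() {}

    // Returns the bucket a pattern falls into for the given name, or null if it does not match.
    static Bucket classify(String pattern, String name) {
        if (pattern.equals(name)) {
            return Bucket.EXACT;
        }
        if (pattern.equals(".*")) {            // assumption: the only wildcard-only form
            return Bucket.WILDCARD_ONLY;
        }
        return Pattern.matches(pattern, name) ? Bucket.PARTIAL_REGEX : null;
    }

    // Picks the lowest limit from the highest-priority bucket that has at least one match.
    static OptionalInt bestLimit(Map<String, Integer> limitsByPattern, String name) {
        Map<Bucket, Integer> lowestPerBucket = new EnumMap<>(Bucket.class);
        for (Map.Entry<String, Integer> entry : limitsByPattern.entrySet()) {
            Bucket bucket = classify(entry.getKey(), name);
            if (bucket != null) {
                lowestPerBucket.merge(bucket, entry.getValue(), Integer::min);
            }
        }
        for (Bucket bucket : Bucket.values()) {
            Integer limit = lowestPerBucket.get(bucket);
            if (limit != null) {
                return OptionalInt.of(limit);
            }
        }
        return OptionalInt.empty();
    }
}
```

Note that under this scheme an exact match wins even when a regex entry carries a lower limit; the "lowest limit" rule only breaks ties within a single bucket.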

Currently the QueryLimiter is used in QueryExecutorBean, along with a QueryHeartbeatCache instance to cache heartbeats and keep them alive when a running query is cached for retrieval later. For the purposes of this feature, a query is considered to start when an Accumulo connection is retrieved from the connection factory, and is considered to end when the connection is returned to the factory.

The following error codes have been added:
- 412-20 - Concurrent query limit exceeded
- 500-164 - Error checking concurrent query limits

Closes #3100

lbschanno marked this pull request as draft October 31, 2025 18:15

lbschanno mentioned this pull request Oct 31, 2025

Limit the total number of queries a user can run concurrently in the system #3100

Open

lbschanno force-pushed the task/queryLimit branch from 156ed51 to 9c59e98 Compare October 31, 2025 18:33

lbschanno added 3 commits December 1, 2025 14:00

Tweak spring configurations

f91e18b

Merge branch 'integration' into task/queryLimit

5f3bf04

Fix issues with CDI

542ab09
