@DanielZhu58 DanielZhu58 commented Nov 18, 2025

What changes were proposed in this pull request?

Adds a new StatisticsManagementTask (StatisticsManagementTask.java) that automatically deletes old statistics.

Why are the changes needed?

To keep old or stale statistics from accumulating in the metastore.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual tests and unit tests.

For reviewers: What this PR does

This PR introduces a new “Statistics Management Task” in the Hive metastore which periodically auto-deletes stale column statistics, plus configuration knobs.

In MetastoreConf.java
Three new configuration variables are added:
STATISTICS_MANAGEMENT_TASK_FREQUENCY
Meaning: Controls how often the StatisticsManagementTask runs, for tables that have statistics.auto.deletion=true in their table properties.
STATISTICS_RETENTION_PERIOD
Meaning: The retention period for stats. If a table/partition’s stats are older than this, they become candidates for auto deletion.
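STATISTICS_AUTO_DELETION
Meaning: Enables or disables auto deletion. When it is disabled, the task logs and returns without deleting anything.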

In StatisticsManagementTask.java
Defines a new StatisticsManagementTask implementing MetastoreTaskThread. Its purpose is to:
Fetch STATISTICS_RETENTION_PERIOD and STATISTICS_AUTO_DELETION from conf. If retention <= 0 or auto deletion is disabled, log and return.
Compute lastAnalyzedThreshold = (now - retentionMillis) / 1000 (in seconds).
Use HMSHandler.getMSForConf(conf) to get RawStore and a PersistenceManager, then query MTableColumnStatistics rows where lastAnalyzed < threshold.
In short, this class implements a background cleanup task that scans MTableColumnStatistics for stale entries and deletes them via the metastore client.
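The threshold check described above can be illustrated with a small standalone sketch. It is not the code in this PR: `ColStat` is a hypothetical stand-in for a row of MTableColumnStatistics, the in-memory list replaces the RawStore/PersistenceManager query, and the method names are made up for illustration. Only the early-return conditions and the `(now - retentionMillis) / 1000` formula come from the description.

```java
import java.util.ArrayList;
import java.util.List;

class StaleStatsSketch {
    /** Hypothetical stand-in for one MTableColumnStatistics row. */
    static final class ColStat {
        final String table;
        final String column;
        final long lastAnalyzed; // seconds since the epoch

        ColStat(String table, String column, long lastAnalyzed) {
            this.table = table;
            this.column = column;
            this.lastAnalyzed = lastAnalyzed;
        }
    }

    /** Threshold in seconds: (now - retentionMillis) / 1000. */
    static long lastAnalyzedThreshold(long nowMillis, long retentionMillis) {
        return (nowMillis - retentionMillis) / 1000;
    }

    /** Returns the entries whose lastAnalyzed is strictly below the threshold. */
    static List<ColStat> selectStale(List<ColStat> stats, long nowMillis,
                                     long retentionMillis, boolean autoDeletion) {
        List<ColStat> stale = new ArrayList<>();
        // Mirrors the early return: nothing is selected when retention <= 0
        // or auto deletion is disabled.
        if (retentionMillis <= 0 || !autoDeletion) {
            return stale;
        }
        long threshold = lastAnalyzedThreshold(nowMillis, retentionMillis);
        for (ColStat s : stats) {
            if (s.lastAnalyzed < threshold) {
                stale.add(s);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long retention = 365L * 24 * 60 * 60 * 1000; // assumed one-year retention
        List<ColStat> stats = new ArrayList<>();
        stats.add(new ColStat("t1", "c1", now / 1000 - 400L * 24 * 60 * 60)); // 400 days old
        stats.add(new ColStat("t1", "c2", now / 1000));                       // just analyzed
        System.out.println(selectStale(stats, now, retention, true).size() + " stale entries");
    }
}
```

Note that an entry whose lastAnalyzed equals the threshold is kept, because the comparison is strict.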

In BenchmarkTool.java
BenchmarkTool can now benchmark the new statistics management task for different numbers of tables.

In HMSBenchmarks.java (benchmark test)
Constructs a dedicated database name and table prefix based on tableCount and BenchData.
Gets an HMSClient and instantiates a StatisticsManagementTask.
Configures the client Hadoop conf:
hive.metastore.uris = metastore URI
metastore.statistics.management.database.pattern = dbName (so the task focuses on this DB)
Sets the task’s conf, then creates the database and tableCount tables.
Simulates old stats:
For each partition, sets lastAnalyzed to now - 400 days in the partition parameters and alters the partition.
Post-run assertion:
Re-scans all partitions; if any partition parameters still contain lastAnalyzed, it throws an AssertionError("Partition stats not deleted for table: " + tableName).
In other words, this is an end-to-end microbenchmark for the new StatisticsManagementTask that both measures performance and verifies that “old” partition stats are actually cleaned up.
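As a rough illustration of the "simulate old stats" and "post-run assertion" steps, the sketch below works on plain parameter maps instead of real metastore partitions. The class and method names are hypothetical, and storing lastAnalyzed as epoch seconds is an assumption (the description does not state the unit); only the 400-day offset and the AssertionError message are taken from the text above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

class PartitionStatsCheckSketch {
    static final String LAST_ANALYZED = "lastAnalyzed";

    /** Backdates lastAnalyzed by daysOld days (epoch seconds is an assumption). */
    static void markStatsOld(Map<String, String> params, long nowMillis, int daysOld) {
        long oldSeconds = (nowMillis - TimeUnit.DAYS.toMillis(daysOld)) / 1000;
        params.put(LAST_ANALYZED, Long.toString(oldSeconds));
    }

    /** Fails if any partition still carries a lastAnalyzed parameter. */
    static void assertStatsDeleted(String tableName, List<Map<String, String>> partitionParams) {
        for (Map<String, String> params : partitionParams) {
            if (params.containsKey(LAST_ANALYZED)) {
                throw new AssertionError("Partition stats not deleted for table: " + tableName);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        markStatsOld(params, System.currentTimeMillis(), 400);
        // Stand-in for the cleanup the task is expected to perform.
        params.remove(LAST_ANALYZED);
        assertStatsDeleted("example_table", List.of(params));
        System.out.println("stats cleaned up");
    }
}
```

In the real benchmark the parameters would be read back from the metastore after the task has run; here the removal is done by hand so that the check has something to pass on.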

deniskuzZ and others added 25 commits January 12, 2026 14:33
1. Drop CanAggregateDistinct and refactor dependent code accordingly.
2. Remove isDistinct indicator from all classes extending SqlAggFunction.
3. Move the part handling window functions from SqlFunctionConverter#buildAST to ASTConverter
4. Generalize the generation of TOK_FUNCTIONSTAR for aggregate functions by exploiting SqlOperator#getSqlSyntax
5. Replace CalciteUDAF with SqlBasicAggFunction.create since the former does not bring any additional info (operandTypeInference is removed but it is not used anyways from Hive).
…eadable format (apache#6230)

1. Add new property to control indentation of EXPLAIN FORMATTED result
2. Create the appropriate JsonParser in ExplainTask based on explain configurations
3. Drop now unused and redundant JsonParserFactory
4. Extract logic for augmenting RS outputs in separate method dedicated for this purpose
…sRead metrics for tables with multiple partitions (apache#6253)
…pache#6155)

* HIVE-29293: Restrict config 'mapreduce.job.queuename' at tez session

* Address review comments

* Address test failures

* Address review comments

* indentation issue