Skip to content

Conversation

@nkhristinin
Copy link
Contributor

@nkhristinin nkhristinin commented Dec 1, 2025

Gap auto fill scheduler UI and API

Overview

This PR introduces the UI and APIs for the Gap auto fill scheduler.

It completes the full CRUD API (Read, Update, Delete) for the gap fill scheduler configuration.

A new Rules Settings modal and Gap scheduler logs flyout are introduced in the rule page to provide users with a purpose to enable/disable scheduler and see the logs.

What is Gap auto fill scheduler?

Rule can have gaps - period of time when rule didn't execute.

To resolve a gap, we can initiate a manual rule run (backfill) for the specific rule and time range. It will execute rule and then update gaps (make them "filled")

The Gap auto fill scheduler automates this. It runs on a defined schedule, finds rules that have gaps, and automatically schedules the backfill runs to fill them.

The feature is gated by a Enterprise+ license.
Screenshot 2025-12-02 at 12 34 25
Screenshot 2025-12-02 at 12 34 20
Screenshot 2025-12-02 at 12 34 11


New APIs

Get Gap Auto Fill Scheduler

  • Endpoint:
    GET /internal/alerting/rules/gaps/auto_fill_scheduler/{id}

Update Gap Auto Fill Scheduler

  • Endpoint:
    PUT /internal/alerting/rules/gaps/auto_fill_scheduler/{id}

Delete Gap Auto Fill Scheduler

  • Endpoint:
    DELETE /internal/alerting/rules/gaps/auto_fill_scheduler/{id}

Get Gap Auto Fill Scheduler Execution Logs

  • Endpoint:
    POST /internal/alerting/rules/gaps/auto_fill_scheduler/{id}/logs

Cleanup

Update (when disable) and deletion API handles cleanup of the associated task manager task and delete any backfills created by this scheduler.

Event Log Statuses

We introduce a new status in the scheduler execution log flow.

  • NO_GAPS: Added to the status field. This status is used when the scheduler successfully runs but finds no eligible rules with gaps to process.

UI Changes

New Rules Settings Modal

  • A "Rules Settings" button is added to the Rules page.
  • This modal allows users to manage the Gap auto fill scheduler configuration (enable/disable now).

Gap scheduler logs flyout

  • A "logs" link in the settings modal opens a flyout displaying the scheduler's execution logs from the Event Log.
  • The flyout provides details on run status, and log message.

How to Test

Feature Flag

The UI and API endpoints are controlled by the following feature flag, which must be enabled in kibana.dev.yml:

xpack.alerting.gapAutoFillScheduler.enabled: true
xpack.securitySolution.enableExperimental: [ 'gapAutoFillSchedulerEnabled']
  1. Ensure you have rules with gaps

There are two ways to create gaps:

  • Manual method:
    Create and enable a security rule with a 1-minute interval and 0-second lookback.
    After the first run, disable the rule, wait 5 minutes, and then enable it again you should execution error about gaps, and see the gap in the gaps table in the execution tab.

  • Using the this tool:
    Run the following command to generate multiple rules and gaps (100 rules, 10 gaps each, 30m interval rule, and remove all rules before):

    npm run start -- rules --rules 100 -c -g 10 -i "30m"
    
  1. Go to rules page
  2. Open rule settings modal window and enable gap auto fill scheduler.
  3. On rule monitoring table observe that are some rules are in progress for gap filling.
  4. From the rule settings modal window open the logs flyout and observe the logs

nkhristinin and others added 30 commits October 6, 2025 15:24
… src/core/server/integration_tests/ci_checks'
@nkhristinin
Copy link
Contributor Author

@elasticmachine merge upstream

@denar50 denar50 self-requested a review December 2, 2025 16:45
@nkhristinin
Copy link
Contributor Author

@elasticmachine merge upstream

@nkhristinin
Copy link
Contributor Author

@elasticmachine merge upstream


return (
<>
{isOpen && (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is now, this component is always rendering when the settings modal opens, even if the user never opens the flyout by clicking the logs link. As a result, it is making 4 unnecessary post requests to fetch logs.
A solution for this could be to move away the hooks that fetch the logs to their own component and have an outer component GapAutoFillLogsFlyout which decides whether to render the component that calls the hooks when open, or render nothing when closed.

@denar50
Copy link
Contributor

denar50 commented Dec 3, 2025

I have found a strange behavior on a fresh installation. If I open the settings and click on save, without activating the scheduler, it activates automatically (see the video below). My expectation as a user would be that if I do not change any settings, they remain the same when I save/cancel/close the modal.

Screen.Recording.2025-12-03.at.13.03.12.mov

<EuiSpacer size="m" />

<EuiFormRow>
<EuiSwitch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can interact with this toggle if the request to get the current gap auto fill configuration takes a long time. Consider disabling the toggle and the save button while that request loads.
image

const uniqueRuleTypeIds = new Set(updatedRuleTypes.map(({ type }) => type));

for (const ruleTypeId of uniqueRuleTypeIds) {
context.ruleTypeRegistry.get(ruleTypeId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not doing anything with the result of this line. Is this meant to validate that the rule types are valid?

map.set(toRuleTypeKey(ruleType), ruleType);
}
if (incoming) {
for (const ruleType of incoming) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having two for blocks doing the same thing, consider merging both arrays at the beginning.

@nkhristinin
Copy link
Contributor Author

@denar50 thanks for those comments, UI suggestion very good. I addressed them all

@nkhristinin nkhristinin requested a review from denar50 December 3, 2025 15:39
if (!gapAutoFillScheduler) {
await createMutation.mutateAsync();
} else {
await updateMutation.mutateAsync({ ...gapAutoFillScheduler, enabled });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're always updating even when the user doesn't change anything. I took a look at the request itself and it is a bit expensive, currently taking almost 2 seconds when the rule is enabled (with the call to taskManager.ensureScheduled taking most of the time). Consider either preventing that call here or, even better, short circuiting in the backend to avoid doing unnecessary work.


await taskManager.removeIfExists(scheduledTaskId);

await soClient.delete(GAP_AUTO_FILL_SCHEDULER_SAVED_OBJECT_TYPE, params.id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that removing the saved object is the point of no return here. It might be worth to do it after deleting the backfills as the last step.


beforeEach(() => {
jest.resetAllMocks();
rulesClient = new RulesClient({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bunch of code repeats across the test files of the CRUD logic. Consider moving this initialization logic to one place within the ... gap_auto_fill_scheduler/methods folder. You can have a function that builds the rules client along with all the other clients you initialize in lines 30-37.


// Authorization check - we need to check if user has permission to get
// For gap fill auto scheduler, we check against the rule types it manages
const ruleTypes = result.attributes.ruleTypes || [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be falsy?


// Authorization check - we need to check if user has permission to get logs
// For gap fill auto scheduler, we check against the rule types it manages
const ruleTypes = schedulerSO.attributes.ruleTypes || [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, why would this be falsy?

import type { IValidatedEventInternalDocInfo } from '@kbn/event-log-plugin/server';
import type { GapAutoFillSchedulerLogEntry } from './types';

export const formatGapAutoFillSchedulerLogEntry = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding a unit test for this utils function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to add a test where you have an autofill schedule with 3 rule types, and the update request has 1 of those rule types. The functionality still makes sure that all the rule types in the original SO, and not just the one in the parameters, gets updated by making a union of the types.

@elasticmachine
Copy link
Contributor

elasticmachine commented Dec 3, 2025

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #80 / Maps endpoints apis search ES|QL should return getValues response in expected shape
  • [job] [logs] FTR Configs #80 / Maps endpoints apis search ES|QL should return getValues response in expected shape

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
securitySolution 8448 8454 +6

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
alerting 827 829 +2

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 11.1MB 11.1MB +14.8KB

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
alerting 64 65 +1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
alerting 23.9KB 24.0KB +102.0B
securitySolution 166.8KB 166.8KB +31.0B
total +133.0B
Unknown metric groups

API count

id before after diff
alerting 864 866 +2

History

cc @nkhristinin

}
}

public ensureLicenseForGapAutoFillScheduler() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update unit tests

});
});

test('ensures the license allows for getting gap fill auto scheduler', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should update tests to include the custom license check you added

name: schema.string(),
enabled: schema.boolean(),
gap_fill_range: schema.string(),
max_backfills: schema.number({ min: 1, max: 5000 }),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be

Suggested change
max_backfills: schema.number({ min: 1, max: 5000 }),
max_backfills: schema.number(maxBackfills),

enabled: schema.boolean(),
gap_fill_range: schema.string(),
max_backfills: schema.number({ min: 1, max: 5000 }),
num_retries: schema.number({ min: 1 }),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be

Suggested change
num_retries: schema.number({ min: 1 }),
num_retries: schema.number(numRetries),

expect(licenseState.ensureLicenseForGapAutoFillScheduler).toHaveBeenCalled();
});

test('respects license failures', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should update test for ensureLicenseForGapAutoFillScheduler


try {
// Get the scheduler saved object to access ruleTypes for authorization
const schedulerSO = await context.unsecuredSavedObjectsClient.get<GapAutoFillSchedulerSO>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This GET logic is repeated in multiple places. It would be useful to create a library function that encapsulates getting the SO, checking for errors and performing the authorization check and reusing that in all of these functions.

await taskManager.removeIfExists(updatedSo.id);

if (updatedEnabled) {
await taskManager.ensureScheduled(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will be working on a bug fix to update TM tasks with API keys: #244918. When that is done, you should just be able to update the schedule of the existing task and not have to reschedule it.

id: updatedSo.id,
taskType: GAP_AUTO_FILL_SCHEDULER_TASK_TYPE,
schedule: updatedSchedule,
scope: updatedScope ?? [],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the task scope really matter? The scope is stored inside the saved object already. If we don't have to update the task scope, you should be able to just use the existing updateSchedule functions that task manager provides (once the above mentioned bug is fixed).

internalSavedObjectsRepository,
eventLogClient,
eventLogger,
actionsClient,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new fields are not used for deleteBackfillForRules so should not be part of the method definition because it makes it confusing. They're all optional fields anyway, so you can exclude them when calling deleteAdHocRunsAndTasks from this function.

if (adHocRuns.length === 0) return;

// Prepare backfill metadata for gap updates before deleting SOs
const backfillsForGapUpdate =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest creating a const like const canUpdateGaps = shouldUpdateGaps && actionsClient && internalSavedObjectsRepository && eventLogClient and reusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport:skip This PR does not require backporting release_note:feature Makes this part of the condensed release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants