-
Notifications
You must be signed in to change notification settings - Fork 8.5k
Gap auto fill scheduler UI and API #244719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… backfill-iniator
… src/core/server/integration_tests/ci_checks'
… backfill-iniator
… backfill-iniator
…to gap-auto-fill-task
…to gap-auto-fill-task
…to gap-auto-fill-task
|
@elasticmachine merge upstream |
|
@elasticmachine merge upstream |
|
@elasticmachine merge upstream |
|
|
||
| return ( | ||
| <> | ||
| {isOpen && ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is now, this component is always rendering when the settings modal opens, even if the user never opens the flyout by clicking the logs link. As a result, it is making 4 unnecessary post requests to fetch logs.
A solution for this could be to move away the hooks that fetch the logs to their own component and have an outer component GapAutoFillLogsFlyout which decides whether to render the component that calls the hooks when open, or render nothing when closed.
|
I have found a strange behavior on a fresh installation. If I open the settings and click on save, without activating the scheduler, it activates automatically (see the video below). My expectation as a user would be that if I do not change any settings, they remain the same when I save/cancel/close the modal. Screen.Recording.2025-12-03.at.13.03.12.mov |
| <EuiSpacer size="m" /> | ||
|
|
||
| <EuiFormRow> | ||
| <EuiSwitch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| const uniqueRuleTypeIds = new Set(updatedRuleTypes.map(({ type }) => type)); | ||
|
|
||
| for (const ruleTypeId of uniqueRuleTypeIds) { | ||
| context.ruleTypeRegistry.get(ruleTypeId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are not doing anything with the result of this line. Is this meant to validate that the rule types are valid?
| map.set(toRuleTypeKey(ruleType), ruleType); | ||
| } | ||
| if (incoming) { | ||
| for (const ruleType of incoming) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having two for blocks doing the same thing, consider merging both arrays at the beginning.
…bana into gap-auto-fill-task-api-ui
|
@denar50 thanks for those comments, UI suggestion very good. I addressed them all |
| if (!gapAutoFillScheduler) { | ||
| await createMutation.mutateAsync(); | ||
| } else { | ||
| await updateMutation.mutateAsync({ ...gapAutoFillScheduler, enabled }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're always updating even when the user doesn't change anything. I took a look at the request itself and it is a bit expensive, currently taking almost 2 seconds when the rule is enabled (with the call to taskManager.ensureScheduled taking most of the time). Consider either preventing that call here or, even better, short circuiting in the backend to avoid doing unnecessary work.
|
|
||
| await taskManager.removeIfExists(scheduledTaskId); | ||
|
|
||
| await soClient.delete(GAP_AUTO_FILL_SCHEDULER_SAVED_OBJECT_TYPE, params.id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that removing the saved object is the point of no return here. It might be worth to do it after deleting the backfills as the last step.
|
|
||
| beforeEach(() => { | ||
| jest.resetAllMocks(); | ||
| rulesClient = new RulesClient({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bunch of code repeats across the test files of the CRUD logic. Consider moving this initialization logic to one place within the ... gap_auto_fill_scheduler/methods folder. You can have a function that builds the rules client along with all the other clients you initialize in lines 30-37.
|
|
||
| // Authorization check - we need to check if user has permission to get | ||
| // For gap fill auto scheduler, we check against the rule types it manages | ||
| const ruleTypes = result.attributes.ruleTypes || []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this be falsy?
|
|
||
| // Authorization check - we need to check if user has permission to get logs | ||
| // For gap fill auto scheduler, we check against the rule types it manages | ||
| const ruleTypes = schedulerSO.attributes.ruleTypes || []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, why would this be falsy?
| import type { IValidatedEventInternalDocInfo } from '@kbn/event-log-plugin/server'; | ||
| import type { GapAutoFillSchedulerLogEntry } from './types'; | ||
|
|
||
| export const formatGapAutoFillSchedulerLogEntry = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding a unit test for this utils function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to add a test where you have an autofill schedule with 3 rule types, and the update request has 1 of those rule types. The functionality still makes sure that all the rule types in the original SO, and not just the one in the parameters, gets updated by making a union of the types.
💔 Build Failed
Failed CI StepsTest Failures
Metrics [docs]Module Count
Public APIs missing comments
Async chunks
Public APIs missing exports
Page load bundle
History
cc @nkhristinin |
| } | ||
| } | ||
|
|
||
| public ensureLicenseForGapAutoFillScheduler() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update unit tests
| }); | ||
| }); | ||
|
|
||
| test('ensures the license allows for getting gap fill auto scheduler', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should update tests to include the custom license check you added
| name: schema.string(), | ||
| enabled: schema.boolean(), | ||
| gap_fill_range: schema.string(), | ||
| max_backfills: schema.number({ min: 1, max: 5000 }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be
| max_backfills: schema.number({ min: 1, max: 5000 }), | |
| max_backfills: schema.number(maxBackfills), |
| enabled: schema.boolean(), | ||
| gap_fill_range: schema.string(), | ||
| max_backfills: schema.number({ min: 1, max: 5000 }), | ||
| num_retries: schema.number({ min: 1 }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be
| num_retries: schema.number({ min: 1 }), | |
| num_retries: schema.number(numRetries), |
| expect(licenseState.ensureLicenseForGapAutoFillScheduler).toHaveBeenCalled(); | ||
| }); | ||
|
|
||
| test('respects license failures', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should update test for ensureLicenseForGapAutoFillScheduler
|
|
||
| try { | ||
| // Get the scheduler saved object to access ruleTypes for authorization | ||
| const schedulerSO = await context.unsecuredSavedObjectsClient.get<GapAutoFillSchedulerSO>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This GET logic is repeated in multiple places. It would be useful to create a library function that encapsulates getting the SO, checking for errors and performing the authorization check and reusing that in all of these functions.
| await taskManager.removeIfExists(updatedSo.id); | ||
|
|
||
| if (updatedEnabled) { | ||
| await taskManager.ensureScheduled( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will be working on a bug fix to update TM tasks with API keys: #244918. When that is done, you should just be able to update the schedule of the existing task and not have to reschedule it.
| id: updatedSo.id, | ||
| taskType: GAP_AUTO_FILL_SCHEDULER_TASK_TYPE, | ||
| schedule: updatedSchedule, | ||
| scope: updatedScope ?? [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the task scope really matter? The scope is stored inside the saved object already. If we don't have to update the task scope, you should be able to just use the existing updateSchedule functions that task manager provides (once the above mentioned bug is fixed).
| internalSavedObjectsRepository, | ||
| eventLogClient, | ||
| eventLogger, | ||
| actionsClient, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new fields are not used for deleteBackfillForRules so should not be part of the method definition because it makes it confusing. They're all optional fields anyway, so you can exclude them when calling deleteAdHocRunsAndTasks from this function.
| if (adHocRuns.length === 0) return; | ||
|
|
||
| // Prepare backfill metadata for gap updates before deleting SOs | ||
| const backfillsForGapUpdate = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest creating a const like const canUpdateGaps = shouldUpdateGaps && actionsClient && internalSavedObjectsRepository && eventLogClient and reusing.

Gap auto fill scheduler UI and API
Overview
This PR introduces the UI and APIs for the Gap auto fill scheduler.
It completes the full CRUD API (Read, Update, Delete) for the gap fill scheduler configuration.
A new Rules Settings modal and Gap scheduler logs flyout are introduced in the rule page to provide users with a purpose to enable/disable scheduler and see the logs.
What is Gap auto fill scheduler?
Rule can have gaps - period of time when rule didn't execute.
To resolve a gap, we can initiate a manual rule run (backfill) for the specific rule and time range. It will execute rule and then update gaps (make them "filled")
The Gap auto fill scheduler automates this. It runs on a defined schedule, finds rules that have gaps, and automatically schedules the backfill runs to fill them.
The feature is gated by a Enterprise+ license.



New APIs
Get Gap Auto Fill Scheduler
GET /internal/alerting/rules/gaps/auto_fill_scheduler/{id}Update Gap Auto Fill Scheduler
PUT /internal/alerting/rules/gaps/auto_fill_scheduler/{id}Delete Gap Auto Fill Scheduler
DELETE /internal/alerting/rules/gaps/auto_fill_scheduler/{id}Get Gap Auto Fill Scheduler Execution Logs
POST /internal/alerting/rules/gaps/auto_fill_scheduler/{id}/logsCleanup
Update (when disable) and deletion API handles cleanup of the associated task manager task and delete any backfills created by this scheduler.
Event Log Statuses
We introduce a new status in the scheduler execution log flow.
statusfield. This status is used when the scheduler successfully runs but finds no eligible rules with gaps to process.UI Changes
New Rules Settings Modal
Gap auto fill schedulerconfiguration (enable/disable now).Gap scheduler logs flyout
How to Test
Feature Flag
The UI and API endpoints are controlled by the following feature flag, which must be enabled in
kibana.dev.yml:There are two ways to create gaps:
Manual method:
Create and enable a security rule with a 1-minute interval and 0-second lookback.
After the first run, disable the rule, wait 5 minutes, and then enable it again you should execution error about gaps, and see the gap in the gaps table in the execution tab.
Using the this tool:
Run the following command to generate multiple rules and gaps (100 rules, 10 gaps each, 30m interval rule, and remove all rules before):