@tomsonpl
Contributor

@tomsonpl tomsonpl commented Aug 4, 2025

Summary

This PR introduces cancel action capabilities for Microsoft Defender Endpoint response actions in Kibana's Security Solution. Previously, there was no mechanism to cancel long-running or stuck response actions, creating operational challenges for security analysts managing Microsoft Defender Endpoint hosts.

Problem Statement

Security analysts working with Microsoft Defender Endpoint needed the ability to cancel pending or long-running response actions that may be stuck or no longer needed. Without this capability, they had to wait for actions to time out or complete naturally, potentially slowing incident response.

Solution

Core Authorization System

Implemented a dynamic, context-aware authorization system for cancel operations that evaluates permissions based on:

  • Base security solution access
  • Agent type compatibility (currently Microsoft Defender Endpoint only)
  • User's permission for the original command being cancelled
  • Feature flag enablement

Key Files:

  • common/endpoint/service/authz/cancel_authz_utils.ts - Authorization utility functions
    • isCancelFeatureAvailable() - Checks if cancel is available for agent type
    • canUserCancelCommand() - Validates command-specific permissions
    • checkCancelPermission() - Complete permission check combining all factors

API Layer

Request Schema & Validation:

  • common/api/endpoint/actions/response_actions/cancel/cancel.ts - Request schema with action_id validation
  • common/api/endpoint/actions/response_actions/cancel/cancel.schema.yaml - OpenAPI specification
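
As an illustration of the kind of validation such a request schema performs, here is a minimal, framework-free sketch. It assumes the ID of the action to cancel travels in parameters.id, as the handler excerpt reviewed later in this thread suggests. The endpoint_ids and agent_type fields and the validateCancelRequest helper are hypothetical stand-ins, not the actual Kibana schema.

```typescript
// Hypothetical request body shape. Only `parameters.id` is taken from this PR;
// the other field names are assumptions made for the sketch.
interface CancelActionRequestBody {
  endpoint_ids: string[];
  agent_type: string;
  parameters: { id: string };
}

type ValidationResult =
  | { ok: true; value: CancelActionRequestBody }
  | { ok: false; error: string };

const isNonEmptyString = (value: unknown): value is string =>
  typeof value === 'string' && value.trim().length > 0;

// Validates an untrusted payload and returns either the typed body or an error.
function validateCancelRequest(body: unknown): ValidationResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'body must be an object' };
  }
  const candidate = body as Record<string, unknown>;

  const endpointIds = candidate['endpoint_ids'];
  if (
    !Array.isArray(endpointIds) ||
    endpointIds.length === 0 ||
    !endpointIds.every(isNonEmptyString)
  ) {
    return { ok: false, error: 'endpoint_ids must be a non-empty array of strings' };
  }

  const agentType = candidate['agent_type'];
  if (!isNonEmptyString(agentType)) {
    return { ok: false, error: 'agent_type is required' };
  }

  const parameters = candidate['parameters'];
  if (typeof parameters !== 'object' || parameters === null) {
    return { ok: false, error: 'parameters is required' };
  }
  const id = (parameters as Record<string, unknown>)['id'];
  if (!isNonEmptyString(id)) {
    return { ok: false, error: 'parameters.id must be a non-empty string' };
  }

  return {
    ok: true,
    value: { endpoint_ids: endpointIds, agent_type: agentType, parameters: { id } },
  };
}
```

Returning a tagged result instead of throwing lets a caller map a failed validation to a 400 response without a try/catch; that choice is part of the sketch, not something this PR states.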

Route Implementation:

  • server/endpoint/routes/actions/response_actions.ts - Cancel route handler that:
    1. Uses standard withEndpointAuthz middleware for base permissions
    2. Fetches original action details to determine command type and agent type
    3. Applies dynamic permission validation using authorization utilities
    4. Delegates to existing response action execution framework
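
The four steps above can be sketched as a plain function with injected dependencies. Every name here (CancelRouteDeps, handleCancelRequest, the status codes and messages) is hypothetical and chosen for illustration; the real handler runs inside Kibana's router, and its lookups would be asynchronous.

```typescript
// All names below are hypothetical; they stand in for Kibana services.
interface OriginalAction {
  id: string;
  command: string;
  agentType: string;
}

interface CancelRouteDeps {
  // Step 1: outcome of the base authorization middleware.
  hasBaseAccess: boolean;
  // Step 2: looks up the action being cancelled (async in a real service).
  getAction: (actionId: string) => OriginalAction | undefined;
  // Step 3: dynamic permission check for this command and agent type.
  canCancel: (action: OriginalAction) => boolean;
  // Step 4: hands off to the response action execution framework and
  // returns the ID of the newly created cancel action.
  submitCancel: (action: OriginalAction) => string;
}

interface RouteResult {
  status: 200 | 403 | 404;
  message: string;
}

function handleCancelRequest(actionId: string, deps: CancelRouteDeps): RouteResult {
  if (!deps.hasBaseAccess) {
    return { status: 403, message: 'Missing base permissions' };
  }

  const original = deps.getAction(actionId);
  if (original === undefined) {
    return { status: 404, message: `Action ${actionId} not found` };
  }

  if (!deps.canCancel(original)) {
    return { status: 403, message: `Not permitted to cancel '${original.command}'` };
  }

  const cancelId = deps.submitCancel(original);
  return { status: 200, message: `Cancel submitted as ${cancelId}` };
}
```

Ordering the checks this way means the command-specific permission is only evaluated once the original action is known, which is what makes the authorization dynamic rather than static.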

UI Components

Console Integration:

  • public/management/components/endpoint_responder/command_render_components/cancel_action.tsx
    • React component for executing cancel operations from response console
    • Uses established console action submitter patterns

Pending Actions Selector:

  • public/management/components/console_argument_selectors/pending_actions_selector/
    • UI component for selecting pending actions to cancel
    • Implements permission-based filtering showing only cancelable actions
    • Provides clear error messages when actions aren't cancelable
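
One way to read the selector behaviour described above is sketched below: each pending action becomes an option, and actions the user may not cancel are kept but disabled with an explanatory message. This is an interpretation, not the component's actual logic; PendingAction, SelectorOption and toSelectorOptions are hypothetical names, and the real selector may hide such entries instead.

```typescript
// Hypothetical types for the sketch; not the actual component props.
interface PendingAction {
  id: string;
  command: string;
}

interface SelectorOption {
  value: string;
  label: string;
  disabled: boolean;
  tooltip?: string;
}

// Maps pending actions to selector options. `cancellableCommands` holds the
// commands the current user is permitted to cancel.
function toSelectorOptions(
  pending: readonly PendingAction[],
  cancellableCommands: ReadonlySet<string>
): SelectorOption[] {
  return pending.map((action) => {
    const allowed = cancellableCommands.has(action.command);
    return {
      value: action.id,
      label: `${action.command} (${action.id})`,
      disabled: !allowed,
      // Only attach a message when the option cannot be selected.
      ...(allowed
        ? {}
        : { tooltip: `You do not have permission to cancel '${action.command}' actions` }),
    };
  });
}
```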

API Communication:

  • public/management/hooks/response_actions/use_send_cancel_request.ts
    • React Query-based hook for cancel API requests
    • Follows established mutation patterns

Feature Flag

Added microsoftDefenderEndpointCancelEnabled experimental feature flag to:

  • Control availability of cancel functionality
  • Enable gradual rollout and testing
  • Located in common/experimental_features.ts

Technical Approach

Permission Model

The implementation uses a dynamic, context-aware authorization model rather than static permissions:

// Permission check combines multiple factors:
const canCancel = checkCancelPermission(
  endpointAuthz,           // User's current permissions
  experimentalFeatures,    // Feature flag status
  agentType,              // Agent type support
  command                 // Original command permission
);
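
For readers who want to see how the four factors might combine, here is a hedged sketch of the three utility functions named earlier. The function names and argument order follow this description; the bodies, the simplified Authz type, the canReadSecuritySolution base check and the 'microsoft_defender_endpoint' agent type string are assumptions. The command-to-privilege entries are the ones visible in the review excerpts in this thread.

```typescript
// Simplified stand-ins for the real Kibana types (assumptions).
type Authz = Readonly<Record<string, boolean>>;

interface ExperimentalFeatures {
  microsoftDefenderEndpointCancelEnabled: boolean;
}

// Assumed agent type identifier for Microsoft Defender Endpoint.
const MDE_AGENT_TYPE = 'microsoft_defender_endpoint';

// Command -> required privilege. Only the entries quoted in this PR's review
// are listed; the real mapping covers more commands.
const COMMAND_TO_AUTHZ = new Map<string, string>([
  ['suspend-process', 'canSuspendProcess'],
  ['scan', 'canWriteScanOperations'],
  ['runscript', 'canWriteExecuteOperations'],
]);

// Feature flag and agent type are evaluated together.
function isCancelFeatureAvailable(features: ExperimentalFeatures, agentType: string): boolean {
  return features.microsoftDefenderEndpointCancelEnabled && agentType === MDE_AGENT_TYPE;
}

// The user may cancel a command only if they could have run it.
function canUserCancelCommand(authz: Authz, command: string): boolean {
  const required = COMMAND_TO_AUTHZ.get(command);
  return required !== undefined && authz[required] === true;
}

function checkCancelPermission(
  authz: Authz,
  features: ExperimentalFeatures,
  agentType: string,
  command: string
): boolean {
  return (
    authz['canReadSecuritySolution'] === true && // assumed base-access privilege
    isCancelFeatureAvailable(features, agentType) &&
    canUserCancelCommand(authz, command)
  );
}
```

Unknown commands fall through to false, so a command missing from the mapping is never cancellable by accident.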

Integration Pattern

The solution reuses existing patterns rather than introducing new ones:

  • Extends BaseActionRequestSchema for consistency
  • Uses standard withEndpointAuthz middleware
  • Leverages existing response action execution framework
  • Follows established UI component patterns

Testing

Comprehensive test coverage added:

  • common/endpoint/service/authz/cancel_authz_utils.test.ts - Authorization logic tests
  • server/endpoint/routes/actions/response_actions.test.ts - Route handler tests
  • public/management/components/endpoint_responder/command_render_components/integration_tests/cancel_action.test.tsx - UI integration tests
  • public/management/components/console_argument_selectors/pending_actions_selector/pending_actions_selector.test.tsx - Selector component tests

Key Changes

Modified Files:

  • Authorization: Added canCancelResponseActions: false to EndpointAuthz interface for framework consistency
  • Constants: Added CANCEL_ROUTE constant to endpoint route definitions
  • Console Commands: Integrated cancel command into Microsoft Defender Endpoint console
  • Response Actions List: Added cancel action support to action history display

New Files:

  • Authorization utilities for cancel permission logic
  • Cancel action API schemas and route handlers
  • UI components for cancel functionality
  • React hooks for API communication
  • Comprehensive test files

Impact & Limitations

Benefits:

✅ Security analysts can now cancel stuck or unnecessary Microsoft Defender Endpoint actions
✅ Dynamic permission system ensures proper authorization
✅ Intuitive UI for selecting and cancelling pending actions
✅ Proper API documentation and validation
✅ Follows established architectural patterns

Current Limitations:

  • Cancel operations currently supported only for Microsoft Defender Endpoint agents

[Screenshots: 2025-09-11 at 11 39 02; 2025-09-01 at 11 34 31; 2025-09-01 at 11 34 20; 2025-09-11 at 11 39 51; 2025-09-15 at 08 26 42]

Closes: https://github.com/elastic/security-team/issues/13443
Closes: https://github.com/elastic/security-team/issues/13444
Closes: https://github.com/elastic/security-team/issues/13445
Closes: https://github.com/elastic/security-team/issues/13464
Closes: https://github.com/elastic/security-team/issues/13766

@tomsonpl tomsonpl self-assigned this Aug 4, 2025
@tomsonpl
Contributor Author

tomsonpl commented Aug 6, 2025

/ci

# Conflicts:
#	x-pack/solutions/security/plugins/security_solution/public/management/components/console_argument_selectors/custom_scripts_selector/custom_script_selector.tsx
#	x-pack/solutions/security/plugins/security_solution/public/management/components/endpoint_responder/lib/console_commands_definition.ts
#	x-pack/solutions/security/plugins/security_solution/server/endpoint/services/actions/clients/lib/types.ts
@tomsonpl
Contributor Author

@paul-tavares Thanks for the feedback! I applied some changes to the picker's options and fixed the tooltip in a69afe1.
[Screenshot 2025-09-17 at 12 31 18]

However, regarding

❌ When selecting a previously entered (and successful) cancel command from the Console's input history, the script picker shows the UUID of the action, but the picker does not show it as a selectable option (because it was already completed and no longer pending).

I am not sure what else we should do here. To me, handling this scenario differently would be overkill. Users can still try to run this action, but the picker is not supposed to reflect that; it shows only currently pending actions.
What would you expect to happen in this case?

kibanamachine and others added 5 commits September 17, 2025 11:00
…el-action

# Conflicts:
#	x-pack/solutions/security/plugins/security_solution/public/management/components/endpoint_responder/command_render_components/integration_tests/cancel_action.test.tsx
Member

@ashokaditya ashokaditya left a comment

Thanks for the changes here and for your patience with the continued change requests. I tested it locally and found some issues, listed below. So far I have only reviewed the backend code; I will do another round of review for the frontend code shortly.

  1. Cancel action is available even if there was no action taken or pending. I think it should be disabled when there is no action to cancel. See error shown at the start state when no action has ever been taken.
     [Screenshot 2025-09-17 at 11 54 10]
  2. I see considerable lag between action taken and it showing up on the cancel action dropdown list, and sometimes the action completes before the cancel action can be taken. See screenshot.
     [Screenshot 2025-09-17 at 12 09 46]
  3. Is another user supposed to be able to cancel a cancel action taken by another user? I think not. This should be restricted to the user who took the original action. I see potential issues with allowing cross-user cancel action for a cancel action.
  4. Cancelled cancel actions show up as "successful" actions in the action history, instead of failed like we show for cancelled response actions. This should be consistent. So if we are allowing cancelling cancel action then all cancelled actions should show as failed.

Note: My test clips are larger than 10 MB, too big to attach here, so I'll send them to you offline.

  ) {
    const responseActionAuthzNames = uniq(
-     Object.values(RESPONSE_CONSOLE_ACTION_COMMANDS_TO_REQUIRED_AUTHZ)
+     Object.values(RESPONSE_CONSOLE_ACTION_COMMANDS_TO_REQUIRED_AUTHZ) as Array<keyof EndpointAuthz>
Member

This is a redundant cast. TS correctly infers the type of responseActionAuthzNames.

Comment on lines +235 to +239
const legacyResponseData = responseActionsWithLegacyActionProperty.includes(command)
? {
action: actionId ?? data.id ?? '',
}
: {};
Member

Do we want to continue using the deprecated action?

Contributor Author

For now I'd say let's keep it, and if we decide to remove this code - let's do it in a separate PR. Is that ok?

error?: EcsError;
/** Host info that might have been stored along with the Action Request (ex. 3rd party EDR actions) */
hosts: ActionDetails['hosts'];
/** Additional metadata that might be stored with the action (ex. 3rd party EDR action IDs) */
Member

I don't see this meta being used anywhere. Do we need this?

  if (additionalChecks) {
    try {
-     await additionalChecks(context, request);
+     await additionalChecks(context, request, logger);
Member

Yes, agreed. I think it is best not to change this file at all, since endpointContext can be used to get a logger. This change is unnecessary here.

): Promise<void> => {
const { parameters } = request.body as CancelActionRequestBody;
const actionId = parameters.id;

Member

You can create the logger here, like so: const logger = endpointContext.logFactory.get('cancelActionAdditionalChecks'); instead of passing a logger as a parameter of this anonymous async function.

}

// Check if Microsoft Defender Endpoint cancel feature is enabled
if (!featureFlags.microsoftDefenderEndpointCancelEnabled) {
Member

@ashokaditya ashokaditya Sep 17, 2025

Agree ➕. The feature flag check and the MDE agent type check should be combined into a single condition.

'suspend-process': 'canSuspendProcess',
scan: 'canWriteScanOperations',
runscript: 'canWriteExecuteOperations',
cancel: 'canAccessResponseConsole', // Cancel uses base console permission
Member

@ashokaditya ashokaditya Sep 17, 2025

So does this mean that if you can't access the response console, you can't access the API? For example, suppose a user created a release action and wants to cancel it on a Platinum license. They won't be able to do this via the API, because the current privilege is tied to the Enterprise license. Cancel should work via the API for all allowable response actions in a downgrade scenario, for instance for the release action, which is available to non-Enterprise users.

Contributor Author

Changing to a new privilege that does not require an Enterprise license 👍 That's a very good catch @ashokaditya, thanks!


export interface ResponseActionCancelOutputContent {
code: string;
actionId: string;
Member

Is this the cancelled action's id or the id for the cancel action? Maybe name it so it's clear. Also where is this used? I don't see it being populated in the response of history API or /api/endpoint/action or /api/endpoint/action/cancel

Contributor Author

Removed 👍

  size: 1,
- _source: ['agent', 'device.id', 'event.created'],
- sort: [{ 'event.created': 'desc' }],
+ _source: ['agent', 'device.id', '@timestamp'],
Member

I don't think this bug fix should go in this PR!

Comment on lines 149 to 150
_source: ['agent', 'device.id', '@timestamp'],
sort: [{ '@timestamp': 'desc' }],
Member

Let's not merge this CS bug fix in this PR

Contributor Author

It was merged in another PR, so the change is not included here - but thanks for double checking 👍

@tomsonpl
Contributor Author

@ashokaditya Thanks! Good comments, I applied most of them in 3a2d495, fixed the bug where the pending list wasn't refetched too 👍 For the UI discussion we had, I'll create follow up issues so we can make a decision and apply after FF 👍 Big thanks!

@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #72 / Dataset Quality Dataset quality handles user privileges User can read logs-* User can monitor some data streams "before all" hook for "shows underprivileged warning when size cannot be accessed for some data streams"

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
securitySolution 8026 8034 +8

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
securitySolution 10.5MB 10.6MB +11.5KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
securitySolution 97.2KB 97.2KB +42.0B

Unknown metric groups

ESLint disabled line counts

id before after diff
securitySolution 678 684 +6

References to deprecated APIs

id before after diff
securitySolution 388 389 +1

Total ESLint disabled count

id before after diff
securitySolution 782 788 +6

cc @tomsonpl

Member

@ashokaditya ashokaditya left a comment

Thanks for all the hard work on this, subsequent changes and patience with this @tomsonpl. 🙇🏻 I have not been able to test it again after privilege changes that I thought should be considered and that you promptly added. I'll test it after you merge this and make improvements as we begin to test cancel action.

@tomsonpl tomsonpl merged commit ec047c7 into elastic:main Sep 23, 2025
12 checks passed
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this pull request Sep 24, 2025
rylnd pushed a commit to rylnd/kibana that referenced this pull request Oct 17, 2025
Labels

  • backport:skip - This PR does not require backporting
  • release_note:feature - Makes this part of the condensed release notes
  • Team:Defend Workflows - “EDR Workflows” sub-team of Security Solution
  • v9.2.0

6 participants