Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ForceMove option on vacate parameters #489

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

riccardomarinelli
Copy link

@riccardomarinelli riccardomarinelli commented Oct 10, 2024

Description

Adds ForceMove parameter to vacate parametes, in order to choose primitive vacate or standard vacate

Related Issues

Rel: https://elasticco.atlassian.net/browse/INFRA-3057
Rel: https://elasticco.atlassian.net/browse/INFRA-2823

Motivation and Context

When a vacate plan fails due to Primitive vacate failure ( with error InfrastructureFailure:CopyDataFailure), we want to be able to retry with Standad vacate instead of Primitive vacate from automation tools like Fleetctl in order to try to reduce the number of SDH issues created.

How Has This Been Tested?

This has been tested locally from Fleetctl https://github.com/elastic/fleetctl/pull/195, testing the cloud-sdk-go changes with this line in go.mod replace github.com/elastic/cloud-sdk-go => ../cloud-sdk-go

image

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (improves code quality but has no user-facing effect)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation

Readiness Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@riccardomarinelli riccardomarinelli added the enhancement New feature or request label Oct 14, 2024
@riccardomarinelli riccardomarinelli marked this pull request as ready for review October 14, 2024 04:56
@riccardomarinelli riccardomarinelli requested a review from a team as a code owner October 14, 2024 04:56
Copy link

@gheorghepucea gheorghepucea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this. I have two questions.

@@ -391,6 +393,7 @@ func newMoveClusterParams(params *VacateClusterParams) (*platform_infrastructure
platform_infrastructure.NewMoveClustersParams().
WithAllocatorDown(params.AllocatorDown).
WithMoveOnly(params.MoveOnly).
WithForceMove(params.ForceMove).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed here? I would remove it because I think it alters the soteria behaviour i.e. currently we:

  1. Issue a request to validate the move request validate_only=true
  2. Use the params returned to issue the actual vacate

Now, if we set the force_move flag in the validation API, the admin console will respect what u asked.
So in case of soteria, what would be the behaviour? which value will the force_move param have?

Copy link
Author

@riccardomarinelli riccardomarinelli Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove this one, This is the result ( see logs https://platform-logging.kb.eu-west-1.aws.qa.cld.elstc.co/app/r/s/1ElD8 )

Testing with Force_move: true in Fleetctl code:
Our of 3 log entries,

The first one will have the correct value forceMove=[true]:

Request to move clusters: (allocatorId=[gi-5926589073801201158], authz=[Some(1)], moveRequest=[None], maybeClusterType=[None], forceUpdate=[false], allocatorDown=[None], validateOnly=[true], moveOnly=[true], forceMove=[true])

The second and the third one will have the wrong value forceMove=[false]:

Request to move clusters: (allocatorId=[gi-5926589073801201158], authz=[Some(1)], moveRequest=[Some(MoveClustersRequest(Some(List(MoveElasticsearchClusterConfiguration(List(44e610693f534648a7ccacedcb16c65b),None))),None,None,None,None))], maybeClusterType=[None], forceUpdate=[false], allocatorDown=[Some(false)], validateOnly=[true], moveOnly=[true], forceMove=[false])
Request to move clusters: (allocatorId=[gi-5926589073801201158], authz=[Some(1)], moveRequest=[Some(MoveClustersRequest(Some(List(MoveElasticsearchClusterConfiguration(List(44e610693f534648a7ccacedcb16c65b),Some(TransientElasticsearchPlanConfiguration(Some(PlanStrategy(None,Some(GrowShrinkStrategyConfig()),None,None)),Some(EsPlanControlConfiguration(Some(32768),Some(5),None,Some(List(AllocatorMoveRequest(gi-5926589073801201158,None,Some(false)))),Some(true),None,None,Some(false),None,None,Some(false),Some(3),Some(300),Some(false),None,Some(false),Some(false),Some(false),Some(false),Some(false),None)),None,None,None))))),None,None,None,None))], maybeClusterType=[Some(ElasticsearchClusterType)], forceUpdate=[false], allocatorDown=[Some(false)], validateOnly=[false], moveOnly=[true], forceMove=[false])

image

Testing with Force_move: true in Fleetctl code:

all the logs entries will have forceMove=[false] see https://platform-logging.kb.eu-west-1.aws.qa.cld.elstc.co/app/r/s/NtaT0

@@ -127,6 +127,7 @@ func moveNodes(id string, params *VacateParams, p *pool.Pool) ([]pool.Validator,
WithAllocatorID(id).
WithMoveOnly(params.MoveOnly).
WithContext(api.WithRegion(context.Background(), params.Region)).
WithForceMove(params.ForceMove).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, won't this alter the behaviour of the soteria flow? for example to always set force_move to true?

Copy link
Author

@riccardomarinelli riccardomarinelli Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove this one, This is the result ( see logs https://platform-logging.kb.eu-west-1.aws.qa.cld.elstc.co/app/r/s/g99SA )

  • Testing with Force_move: true in Fleetctl code:
    Our of 3 log entries,

The first one will have the wong value forceMove=[false]:

Request to move clusters: (allocatorId=[gi-5926589073801201158], authz=[Some(1)], moveRequest=[None], maybeClusterType=[None], forceUpdate=[false], allocatorDown=[None], validateOnly=[true], moveOnly=[true], forceMove=[false])

The second and the third one will have the correct value forceMove=[true]:

Request to move clusters: (allocatorId=[gi-5926589073801201158], authz=[Some(1)], moveRequest=[Some(MoveClustersRequest(Some(List(MoveElasticsearchClusterConfiguration(List(44e610693f534648a7ccacedcb16c65b),None))),None,None,None,None))], maybeClusterType=[None], forceUpdate=[false], allocatorDown=[Some(false)], validateOnly=[true], moveOnly=[true], forceMove=[true])
Request to move clusters: (allocatorId=[gi-5926589073801201158], authz=[Some(1)], moveRequest=[Some(MoveClustersRequest(Some(List(MoveElasticsearchClusterConfiguration(List(44e610693f534648a7ccacedcb16c65b),Some(TransientElasticsearchPlanConfiguration(Some(PlanStrategy(None,Some(GrowShrinkStrategyConfig()),None,None)),Some(EsPlanControlConfiguration(Some(32768),Some(5),None,Some(List(AllocatorMoveRequest(gi-5926589073801201158,None,Some(false)))),Some(true),None,None,Some(false),None,None,Some(false),Some(3),Some(300),Some(false),None,Some(false),Some(false),Some(false),Some(false),Some(false),None)),None,None,None))))),None,None,None,None))], maybeClusterType=[Some(ElasticsearchClusterType)], forceUpdate=[false], allocatorDown=[Some(false)], validateOnly=[false], moveOnly=[true], forceMove=[false])

image

  • Testing with Force_move: true in Fleetctl code:

all the logs entries will have forceMove=[false] see see logs https://platform-logging.kb.eu-west-1.aws.qa.cld.elstc.co/app/r/s/se28H

@riccardomarinelli
Copy link
Author

As a last attempt added ForceMove as a Plan Override in the last commit and tested but also was unsuccessful

@riccardomarinelli
Copy link
Author

Also tested directly with curl, but the plans in adminconsole still had force_move: true

 ~ curl -X POST -H "Authorization: ApiKey $QA_API" -H "Content-type: application/json" -d @payload.json "https://admin.qa.cld.elstc.co/api/v1/regions/gcp-us-central1/platform/infrastructure/allocators/gi-4896651367850012401/clusters/elasticsearch/_move?allocator_down=false&force_move=false"
{
  "failures" : {
    "elasticsearch_clusters" : [
    ],
    "kibana_clusters" : [
    ],
    "apm_clusters" : [
    ],
    "appsearch_clusters" : [
    ],
    "enterprise_search_clusters" : [
    ]
  },
  "moves" : {
    "elasticsearch_clusters" : [
      {
        "cluster_id" : "44e610693f534648a7ccacedcb16c65b"
      }
    ],
    "kibana_clusters" : [
    ],
    "apm_clusters" : [
    ],
    "appsearch_clusters" : [
    ],
    "enterprise_search_clusters" : [
    ]
  }
}%
➜  ~ curl -X POST -H "Authorization: ApiKey $QA_API" -H "Content-type: application/json" -d @payload.json "https://admin.qa.cld.elstc.co/api/v1/regions/gcp-us-central1/platform/infrastructure/allocators/gi-4896651367850012401/clusters/elasticsearch/_move?allocator_down=false"
{
  "failures" : {
    "elasticsearch_clusters" : [
    ],
    "kibana_clusters" : [
    ],
    "apm_clusters" : [
    ],
    "appsearch_clusters" : [
    ],
    "enterprise_search_clusters" : [
    ]
  },
  "moves" : {
    "elasticsearch_clusters" : [
      {
        "cluster_id" : "44e610693f534648a7ccacedcb16c65b"
      }
    ],
    "kibana_clusters" : [
    ],
    "apm_clusters" : [
    ],
    "appsearch_clusters" : [
    ],
    "enterprise_search_clusters" : [
    ]
  }
}%

@riccardomarinelli
Copy link
Author

Moving this back as a Draft and will create a Bug Jira as commented with Gheorge Phucea

@riccardomarinelli riccardomarinelli marked this pull request as draft October 14, 2024 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants