Convert Compactor/Coordinator into generic distributed job execution service #3658

@dlmarion

Is your feature request related to a problem? Please describe.
As tablet management functions move from the TabletServer to the Manager in the elasticity branch, there is a concern that the Manager could become overloaded by tasks such as examining RFile indices for splitting and selecting files for compaction (#3526).

Describe the solution you'd like
Accumulo already contains a distributed job execution service, but it is hardcoded to perform only compactions. If you look at the API, you will see that the Compactor's API is:

  tabletserver.TExternalCompactionJob getRunningCompaction
  string getRunningCompactionId
  list<tabletserver.ActiveCompaction> getActiveCompactions
  void cancel

and the Coordinator's API is:

  tabletserver.TExternalCompactionJob getCompactionJob
  void compactionCompleted
  void compactionFailed
  void updateCompactionStatus
  TExternalCompactionList getRunningCompactions
  TExternalCompactionList getCompletedCompactions  
  void cancel  

Adding more job types to this API would produce a set of near-identical methods that differ only in their object types, because Thrift does not support inheritance, only composition (FWIW, gRPC with Protobuf has the same limitation). One alternative is to pass the job details, tabletserver.TExternalCompactionJob in this case, as a serialized Java object over the Thrift API. I think this work could be done in 3 steps:

  1. Make the Thrift API generic
    a. Rename the methods to remove "compaction" from the name
    b. Modify the data structures to make them more generic
  2. Modify the Coordinator and Compactor to pass serialized Java objects over the Thrift API. The Java objects will be deserialized on the receiving end and the appropriate action taken. For example, the Compactor would call Coordinator.getJob, deserialize the response and execute the logic for the job type.
  3. Rename the components with more generic names.
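
To make step 2 more concrete, here is a minimal sketch of the round trip, assuming plain Java serialization for the payload. Everything in it is hypothetical and for illustration only: the `Job` interface, the two job classes, `JobCodec`, and the `getJob`/`runOnWorker` stand-ins are not existing Accumulo classes or Thrift methods, and the real Thrift call would simply carry the resulting bytes in a binary field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical contract shared by all job types (not an Accumulo interface).
interface Job extends Serializable {
  String execute();
}

// Hypothetical job type standing in for today's compaction work.
final class CompactionJob implements Job {
  private static final long serialVersionUID = 1L;
  private final String extent;

  CompactionJob(String extent) {
    this.extent = extent;
  }

  @Override
  public String execute() {
    return "compacted " + extent;
  }
}

// Hypothetical second job type, e.g. examining RFile indices for a split.
final class SplitInspectionJob implements Job {
  private static final long serialVersionUID = 1L;
  private final String extent;

  SplitInspectionJob(String extent) {
    this.extent = extent;
  }

  @Override
  public String execute() {
    return "inspected " + extent;
  }
}

// Converts jobs to and from the opaque bytes a generic RPC method would carry.
final class JobCodec {
  static byte[] serialize(Job job) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(job);
    }
    return bytes.toByteArray();
  }

  static Job deserialize(byte[] payload) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
      return (Job) in.readObject();
    }
  }
}

public class GenericJobSketch {
  // Stand-in for a generic Coordinator.getJob: the reply is just bytes.
  static byte[] getJob(Job next) throws IOException {
    return JobCodec.serialize(next);
  }

  // Stand-in for the worker side: deserialize, then run whatever job arrived.
  static String runOnWorker(byte[] payload) throws IOException, ClassNotFoundException {
    return JobCodec.deserialize(payload).execute();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runOnWorker(getJob(new CompactionJob("tablet-1"))));
    System.out.println(runOnWorker(getJob(new SplitInspectionJob("tablet-2"))));
  }
}
```

With this shape, adding a job type means adding a class rather than a new set of RPC methods. One trade-off to keep in mind is that Java deserialization requires both ends to have compatible classes on the classpath and should only be used between trusted processes.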

Describe alternatives you've considered
I surveyed some of the solutions listed at https://github.com/meirwah/awesome-workflow-engines and while I think some of them could work, they seem like overkill.
