Description
Is your feature request related to a problem? Please describe.
As tablet management functions move from the TabletServer to the Manager in the elasticity branch, there is a concern that the Manager could become overloaded by tasks such as examining RFile indices for splitting and selecting files for compaction (#3526).
Describe the solution you'd like
Accumulo already contains a distributed job execution service, but it is hard-coded to perform only compactions. The Compactor's API is:
tabletserver.TExternalCompactionJob getRunningCompaction
string getRunningCompactionId
list<tabletserver.ActiveCompaction> getActiveCompactions
void cancel
and the Coordinator's API is:
tabletserver.TExternalCompactionJob getCompactionJob
void compactionCompleted
void compactionFailed
void updateCompactionStatus
TExternalCompactionList getRunningCompactions
TExternalCompactionList getCompletedCompactions
void cancel
Adding more job types to this API would produce many near-duplicate methods that differ only in the object types they use, because Thrift does not support inheritance, only composition (gRPC with Protobuf has the same limitation). One approach that avoids this is to pass the job details (tabletserver.TExternalCompactionJob in this case) as a serialized Java object over the Thrift API. I think this work could be done in three steps:
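A minimal sketch of the serialized-object idea, assuming Java's built-in object serialization and a Thrift `binary` field to carry the bytes. `SplitJob` and `JobCodec` are hypothetical names used only for illustration; they are not existing Accumulo classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical job details, standing in for what
// tabletserver.TExternalCompactionJob carries today. Not an Accumulo class.
final class SplitJob implements Serializable {
  private static final long serialVersionUID = 1L;
  final String tableId;
  final String endRow;

  SplitJob(String tableId, String endRow) {
    this.tableId = tableId;
    this.endRow = endRow;
  }
}

// Hypothetical helper that turns any job into opaque bytes and back.
final class JobCodec {
  private JobCodec() {}

  // Serialize a job into a ByteBuffer suitable for a generic binary field.
  static ByteBuffer serialize(Serializable job) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(job);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return ByteBuffer.wrap(bytes.toByteArray());
  }

  // Rebuild the job on the receiving end; the caller inspects its type.
  static Object deserialize(ByteBuffer payload) {
    byte[] data = new byte[payload.remaining()];
    payload.duplicate().get(data); // leave the caller's buffer position untouched
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException("Unknown job class", e);
    }
  }
}
```

One trade-off of this design: Java serialization requires both ends to have compatible versions of the job classes on their classpaths, and deserializing bytes from an untrusted peer is a known security risk, so restricting the accepted classes (for example with `java.io.ObjectInputFilter`, available since Java 9) may be worth considering.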
1. Make the Thrift API generic:
  a. Rename the methods to remove "compaction" from their names.
  b. Modify the data structures to make them more generic.
2. Modify the Coordinator and Compactor to pass serialized Java objects over the Thrift API. The Java objects will be deserialized on the receiving end and the appropriate action taken. For example, the Compactor would call Coordinator.getJob, deserialize the response, and execute the logic for the job type.
3. Rename the components with more generic names.
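The steps above can be sketched end to end as follows. This is a hypothetical illustration, not Accumulo code: `Job`, `Coordinator`, `InMemoryCoordinator`, `Worker`, `CompactJob`, and `FindSplitsJob` are invented names, the in-memory coordinator stands in for the Thrift service, and the method names (`getJob`, `jobCompleted`, `jobFailed`) assume the renaming described in the first step. The worker fetches a serialized job, deserializes it, and runs the logic for that job type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical generic job contract: each job type carries its own logic.
interface Job extends Serializable {
  String execute();
}

// Hypothetical job types; illustrative only.
final class CompactJob implements Job {
  private static final long serialVersionUID = 1L;
  private final String extent;
  CompactJob(String extent) { this.extent = extent; }
  public String execute() { return "compacted " + extent; }
}

final class FindSplitsJob implements Job {
  private static final long serialVersionUID = 1L;
  private final String extent;
  FindSplitsJob(String extent) { this.extent = extent; }
  public String execute() { return "found splits for " + extent; }
}

// Hypothetical generic coordinator API: no job-specific method names,
// job details travel as opaque bytes.
interface Coordinator {
  byte[] getJob(String workerId); // returns null when no work is queued
  void jobCompleted(String workerId, String result);
  void jobFailed(String workerId, String reason);
}

// In-memory stand-in for the Thrift service, for demonstration only.
final class InMemoryCoordinator implements Coordinator {
  private final Queue<byte[]> queue = new ArrayDeque<>();
  final List<String> completed = new ArrayList<>();
  final List<String> failed = new ArrayList<>();

  void submit(Job job) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(job);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    queue.add(bytes.toByteArray());
  }

  public byte[] getJob(String workerId) { return queue.poll(); }
  public void jobCompleted(String workerId, String result) { completed.add(result); }
  public void jobFailed(String workerId, String reason) { failed.add(reason); }
}

// Generic worker (a renamed Compactor): fetch, deserialize, execute, report.
final class Worker {
  private final String id;
  private final Coordinator coordinator;

  Worker(String id, Coordinator coordinator) {
    this.id = id;
    this.coordinator = coordinator;
  }

  // Processes jobs until the coordinator has none left; returns how many were handled.
  int drain() {
    int handled = 0;
    byte[] payload;
    while ((payload = coordinator.getJob(id)) != null) {
      handled++;
      try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
        Job job = (Job) in.readObject();
        coordinator.jobCompleted(id, job.execute());
      } catch (IOException | ClassNotFoundException | ClassCastException e) {
        coordinator.jobFailed(id, e.toString());
      }
    }
    return handled;
  }
}
```

Putting `execute()` on the job objects makes dispatch polymorphic, so adding a job type requires no new Thrift methods; a `switch` on the deserialized object's class inside the worker would be an equally reasonable choice if job logic should live only on the worker side.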
Describe alternatives you've considered
I surveyed some of the solutions listed at https://github.com/meirwah/awesome-workflow-engines and, while some of them could work, they seem like overkill.