Modifies the Compactor to be a more generic job execution process #3801
base: main
Conversation
…er thrift with strict typing
There is still more work to be done here, but I wanted to get some feedback on:
The basic gist of the design here is:
If I had to guess, I'm probably about 80-85% done at this point.
  public void start() {
    long interval = this.context.getConfiguration()
-       .getTimeInMillis(Property.COMPACTION_COORDINATOR_DEAD_COMPACTOR_CHECK_INTERVAL);
+       .getTimeInMillis(Property.TASK_MANAGER_DEAD_COMPACTOR_CHECK_INTERVAL);
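For context, a periodic check driven by an interval property like the one above could be sketched as follows. This is a hedged illustration, not the actual Accumulo code: the class name, the `start`/`stop` methods, and passing the check in as a `Runnable` are all assumptions for the example.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a generic periodic liveness check. In the real code the interval
// would come from a property such as the dead-compactor check interval above.
public class DeadWorkerCheck {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  // Schedule the supplied check to run repeatedly at the configured interval.
  public void start(long intervalMillis, Runnable check) {
    scheduler.scheduleWithFixedDelay(check, intervalMillis, intervalMillis,
        TimeUnit.MILLISECONDS);
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
```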
Nothing to do in this PR, but I think in general we may need to reconsider some of the property prefixes. Opened #3803 for this. For this property, maybe it should have another prefix that is not related to where it's running but is more related to the compaction functionality. Not sure though; it would be best to consider all properties at once for this instead of looking at this one in isolation.
/*
 * Called by the Monitor to get progress information
 */
Task getRunningTasks(
This comment is solely for discussion, not recommending any changes. Thinking it would be best to push as much serialization into thrift as possible for things that are common. Took a stab at this below, but not sure if it's workable.
// renamed from WorkerType
enum TaskType {
  COMPACTION,
  LOG_SORTING,
  SPLIT_POINT_CALCULATION
}

enum TaskState {
  RUNNING,
  COMPLETE,
  FAILED
}

// used to report the status of a task
struct TaskStatus {
  1: string taskId
  2: i64 startTime // time a worker started running the task
  3: TaskType taskType
  4: TaskState taskState
  5: binary data // status data specific to the taskType and taskState
}

list<TaskStatus> getRunningTasks(
I have no issue renaming WorkerType to something like TaskRunnerType. I'm open to discussing any changes.
@dlmarion if you have some time to chat sometime I would like to discuss some questions I have. Wondering about the following.
Where do things run? Maybe instead of having a task runner executable, the task runner is more of an internal code library that user-facing executable components instantiate. For example, thinking through what the user-facing accumulo commands could look like and what they could do.
What do tasks return? Maybe nothing; could we structure all tasks such that there is no return of data to the manager? A task runner gets a task from the manager, runs it, and when done gets another task. It never reports completion or status to the manager.
If tasks do not return anything, that simplifies the thrift API and possibly the manager. We would not need to worry about keeping info in memory in the manager, keeping that info consistent, and avoiding using too much memory.
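The pull model described above could be sketched like this. It is illustrative only: `TaskSource` and the `runUntilIdle` helper are hypothetical stand-ins for whatever thrift client the manager would actually expose, and a real runner would back off and retry rather than exit when idle.

```java
import java.util.Optional;

// Sketch of the pull model: the worker loops asking the manager for work,
// runs the task, and asks again. It never pushes completion or status back.
public class PullTaskRunner {

  // Hypothetical stand-in for the thrift client used to fetch work.
  public interface TaskSource {
    Optional<Runnable> nextTask();
  }

  // Run tasks until the source has none left; returns how many were executed.
  public static int runUntilIdle(TaskSource source) {
    int executed = 0;
    while (true) {
      Optional<Runnable> task = source.nextTask();
      if (task.isEmpty()) {
        return executed; // no work available; a real runner would wait and retry
      }
      task.get().run(); // run the task; no result is reported to the manager
      executed++;
    }
  }
}
```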
Yeah, let's find a time.
With on-demand tablets it's possible that a user only has enough TabletServers running to host the root and metadata tables - that would be my only concern with doing log sorting in the TabletServer. I do like the idea of non-primary managers doing some of this work (which BTW I had working a while back in #3262 ). If we could leverage the non-primary managers, then this PR could likely be closed leaving the compactors the way that they are today. I think we should explore that idea sooner rather than later. There was another item mentioned in #3796 - compaction selection functionality. If we performed compaction selection and split calculations in the non-primary manager, left compactors the way they are today, then we just need a server component to run log sorting. IIRC from #3262 you can have multiple non-primary managers.
I think this PR is still useful, but maybe scoped down. Thinking of this at a really high level and seeing the following. What I am trying to puzzle out is where actual code links into these high-level concepts. Also, is the following the complete high-level picture? Generalizing distributed work in Accumulo
There is also the issue of checking for dead workers so that the task can be queued up again. We have this today in the Manager with the DeadCompactionDetector.
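At a high level, the re-queue behavior that the DeadCompactionDetector provides for compactions could look something like the sketch below. This is an assumption-laden illustration: the real detector works against ZooKeeper and metadata, not an in-memory map, and `DeadWorkerRequeuer` and its method are invented names for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Predicate;

// Sketch: tasks assigned to workers that are no longer alive are put back on
// the queue so another worker can pick them up, mirroring what the
// DeadCompactionDetector does for compactions today.
public class DeadWorkerRequeuer {

  // Returns the ids of dead workers whose tasks were re-queued.
  public static <T> List<String> requeueDead(Map<String, T> assignments,
      Predicate<String> isAlive, Queue<T> taskQueue) {
    List<String> dead = new ArrayList<>();
    for (Map.Entry<String, T> e : assignments.entrySet()) {
      if (!isAlive.test(e.getKey())) {
        taskQueue.add(e.getValue()); // make the task available again
        dead.add(e.getKey());
      }
    }
    dead.forEach(assignments::remove); // drop the stale assignments
    return dead;
  }
}
```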