Added preliminary support for Tez #278
Open
abhishekdas99 wants to merge 287 commits into linkedin:master from abhishekdas99:tez_support
Conversation
…plying it to the query instead of asking the query to perform the case-insensitive comparison.
Changed HadoopJobData to include finishTime since that is needed for metrics. Changed the signature of getJobCounter to include jobConf and jobData so that it can publish metrics. Updated README.md. Tested locally on my box and on spades. RB=406817 BUGS=HADOOP-7814 R=fli,mwagner A=fli
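As a hedged illustration of the signature change this commit describes (the types, the original signature, and the class names below are assumptions, not taken from the Dr. Elephant source):

```java
// Hypothetical before/after sketch of the getJobCounter change described in the
// commit above; names and types are illustrative, not the actual Dr. Elephant API.
interface JobCounterFetcher {
  // Before (roughly): only a job identifier was needed to read counters.
  // CounterHolder getJobCounter(String jobId);

  // After: the job conf and job data are passed in as well, so the fetcher can
  // also publish metrics, which need the finish time now carried by the job data.
  CounterHolder getJobCounter(java.util.Properties jobConf, JobData jobData);
}

class CounterHolder { }

class JobData {
  private long finishTime;            // newly tracked, needed for metrics
  long getFinishTime() { return finishTime; }
  void setFinishTime(long finishTime) { this.finishTime = finishTime; }
}
```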
The java file DaliMetricsAPI.java has a flavor of the APIs that we will be exposing from the dali library. We can split these classes into individual files when we move this functionality to the dali library.

Changed the start script to look for a config file that configures a publisher. If the file is present, then dr-elephant is started with an option that has the file name. If the file is not present, then the behavior is unchanged (i.e. no metrics are published). If the file is parsed correctly, then dr-elephant publishes metrics to HDFS (one Avro file per job) for jobs that are configured to publish the metrics. The job needs to set something like mapreduce.job.publish-counters='org.apache.hadoop.examples.WordCount$AppCounter:*' to publish all counters in the given group. The format is 'groupName:counterName', where counterName can be an asterisk to indicate all counters in the group; see the class DaliMetricsAPI.CountersToPublish.

The HDFSPublisher is configured with a base path under which metrics are published; the date/hour hierarchy is added to the base path. The XML file for configuring dr-elephant is checked in as a template. A config file needs to be added to the 'conf' path of dr-elephant (manually, as per meeting with hadoop-admin) on clusters where we want dr-elephant to publish metrics.

RB=409443 BUGS=HADOOP-7814 R=fli,csteinba,mwagner,cbotev,ahsu A=fli,ahsu
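For illustration, here is a minimal sketch of how a job might opt in to counter publishing using the property and 'groupName:counterName' format quoted above; the property name and wildcard come from the commit message, while the small parsing helper is hypothetical and not part of Dr. Elephant.

```java
import org.apache.hadoop.conf.Configuration;

public class PublishCountersExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Publish every counter in the WordCount AppCounter group ('*' is a wildcard).
    conf.set("mapreduce.job.publish-counters",
        "org.apache.hadoop.examples.WordCount$AppCounter:*");

    // The format is groupName:counterName, where counterName may be '*'.
    String spec = conf.get("mapreduce.job.publish-counters");
    String[] parts = spec.split(":", 2);
    String group = parts[0];
    String counter = parts.length > 1 ? parts[1] : "*";
    boolean publishAll = "*".equals(counter);
    System.out.println("group=" + group + " publishAll=" + publishAll);
  }
}
```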
hadoop-1 does not have JobStatus.getFinishTime(), which causes dr-elephant to hang. Set the start time to be the same as the finish time for h1 jobs. For consistency, reverted to the old method of scraping the job tracker URL so that we get only the start time, and set the finish time to be equal to the start time for retired jobs as well. RB=417975 BUGS=HADOOP-8640 R=fli,mwagner A=fli
RB=417448 BUGS=HADOOP-8648 R=fli A=fli
…ff 51 reducers instead of 50
…increasing mapred.min.split.size for too many mappers, NOT mapred.max.split.size
…n Help topics page
…name RB=468832 BUGS=HADOOP-10405 R=fli A=fli,ahsu
Jobs which put large files (> 500 MB) in the distributed cache are flagged. Files referenced by the following are considered: mapreduce.job.cache.files and mapreduce.job.cache.archives.
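A minimal sketch of the kind of check this describes, assuming a hypothetical class and method name; the real heuristic in the PR may look different.

```java
import java.util.Arrays;
import java.util.List;

public class DistributedCacheSizeCheck {
  private static final long LIMIT_BYTES = 500L * 1024 * 1024; // ~500 MB threshold

  /** sizes are byte counts for entries in mapreduce.job.cache.files / .archives */
  public static boolean shouldFlag(List<Long> cachedFileSizes) {
    return cachedFileSizes.stream().anyMatch(size -> size > LIMIT_BYTES);
  }

  public static void main(String[] args) {
    List<Long> sizes = Arrays.asList(10L * 1024 * 1024, 600L * 1024 * 1024);
    System.out.println(shouldFlag(sizes)); // true: one cached file exceeds the limit
  }
}
```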
* Removes pattern matching
…p2 (linkedin#203) (1) Use ArrayList instead (2) Add unit test for this
…dd missing workflow links (linkedin#207)
…a when sampling is enabled (linkedin#222)
This commit allows Dr. Elephant to fetch Spark logs without universal read access to eventLog.dir on HDFS. SparkFetcher would use SparkRestClient instead of SparkLogClient if configured with <params> <use_rest_for_eventlogs>true</use_rest_for_eventlogs> </params>. The default behaviour is to fetch the logs via SparkLogClient/WebHDFS.
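For context, such a setting would plausibly live in a fetcher definition along these lines; the <params> element comes from the commit message, while the surrounding FetcherConf.xml structure and the fetcher class name are assumptions.

```xml
<!-- Sketch of a fetcher entry enabling the REST event-log path (structure and
     class name are assumptions; only <params> is taken from the commit). -->
<fetchers>
  <fetcher>
    <applicationtype>spark</applicationtype>
    <classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
    <params>
      <use_rest_for_eventlogs>true</use_rest_for_eventlogs>
    </params>
  </fetcher>
</fetchers>
```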
…alone fetcher (linkedin#232) Remove the backup for the REST fetcher and make the legacy FSFetcher the top-level fetcher. Change the default fetcher in the config.
* Fix SparkMetricsAggregator to not produce negative ResourceUsage
…hese objects when we implement our own parser (linkedin#248)
* We have been ignoring failed tasks in the calculation of resource usage. This handles that. * Fixes the Exception heuristic, which was supposed to give the stack trace.
Support Spark Streaming monitoring jobs?
This is a preliminary version of Tez support. The following changes are required for full support.
Here is the summary:
Done:
Added basic support for Tez jobs. Added a couple of heuristics to make sure they appear in the UI. The current implementation simply reuses the same code as the MR heuristics.
To Be Done:
The main problem with Tez support is that one YARN application can have multiple DAGs (or jobs). The current implementation assumes that for each YARN application there is one MR job, so some design changes are needed.
UI support is also needed to show multiple DAGs under one YARN application.
We need to come up with a class structure for heuristics, since some heuristics will be exactly the same for Tez and MR; a class hierarchy will help us avoid writing redundant code (see the sketch below).
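As a rough illustration of that idea (hypothetical names only, not the actual Dr. Elephant classes), shared scoring logic could live in an abstract base heuristic, with MR and Tez subclasses supplying framework-specific data, including the multiple DAGs per YARN application mentioned above:

```java
import java.util.List;

// Hypothetical sketch only; class and field names are illustrative.
abstract class AbstractMapperTimeHeuristic<T> {
  /** Shared scoring logic used by both the MR and Tez variants. */
  protected String evaluate(List<Long> taskRuntimesMs) {
    double avg = taskRuntimesMs.stream().mapToLong(Long::longValue).average().orElse(0);
    return avg < 60_000 ? "Tasks are too short; consider fewer, larger splits" : "OK";
  }

  /** Each framework extracts task runtimes from its own data model. */
  abstract String apply(T data);
}

class MapReduceMapperTimeHeuristic extends AbstractMapperTimeHeuristic<MapReduceJobData> {
  @Override
  String apply(MapReduceJobData job) {
    return evaluate(job.mapperRuntimesMs);
  }
}

class TezMapperTimeHeuristic extends AbstractMapperTimeHeuristic<TezApplicationData> {
  @Override
  String apply(TezApplicationData app) {
    // One YARN application can contain several DAGs, so evaluate each DAG's tasks.
    return app.dags.stream()
        .map(dag -> evaluate(dag.taskRuntimesMs))
        .reduce("OK", (a, b) -> b.startsWith("Tasks") ? b : a);
  }
}

class MapReduceJobData { List<Long> mapperRuntimesMs; }
class TezDagData { List<Long> taskRuntimesMs; }
class TezApplicationData { List<TezDagData> dags; }
```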