Adding autocompletion + scripts + standard deviation in 'mapper time' heuristic #275

Open · wants to merge 288 commits into master

Conversation

alexandre32

Hello,
I added some features:
Autocompletion in the search field of the 'job history' tab (populated from the DB).
Two scripts for automatically adding a new heuristic.
Calculation of the standard deviation in the 'mapper time' heuristic.
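The standard-deviation feature can be sketched as follows, assuming the heuristic receives the runtimes of all map tasks in milliseconds (the class and method names here are hypothetical, not the PR's actual code):

```java
public class MapperTimeStats {
    // Arithmetic mean of the mapper runtimes.
    public static double mean(long[] runtimes) {
        double sum = 0;
        for (long t : runtimes) {
            sum += t;
        }
        return sum / runtimes.length;
    }

    // Population standard deviation: sqrt of the mean squared distance from the mean.
    // A large value relative to the mean suggests skewed mapper runtimes.
    public static double stdDev(long[] runtimes) {
        double avg = mean(runtimes);
        double sumSq = 0;
        for (long t : runtimes) {
            sumSq += (t - avg) * (t - avg);
        }
        return Math.sqrt(sumSq / runtimes.length);
    }
}
```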

Akshay Rai and others added 30 commits December 13, 2014 10:10
…plying it to the query instead of asking the query to perform the case-insensitive comparison.
Changed HadoopJobData to include finishTime since that is needed for
metrics.
Changed the signature of getJobCounter to include jobConf and jobData
so that it can publish metrics
Updated README.md

Tested locally on my box and on spades

RB=406817
BUGS=HADOOP-7814
R=fli,mwagner
A=fli
The java file DaliMetricsAPI.java has a flavor of the APIs that we will be exposing from the dali library.
We can split these classes into individual files when we move this functionality to the dali library.

Changed start script to look for a config file that configures a publisher. If the file is present,
then dr-elephant is started with an option that has the file name. If the file is not present,
then the behavior is unchanged (i.e. no metrics are published).

If the file is parsed correctly then dr-elephant publishes metrics in HDFS (one avro file per job)
for jobs that are configured to publish the metrics.

The job needs to set something like mapreduce.job.publish-counters='org.apache.hadoop.examples.WordCount$AppCounter:*'
to publish all counters in the given group. The format is 'groupName:counterName', where counterName can be an
asterisk to indicate all counters in the group. See the class DaliMetricsAPI.CountersToPublish.
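The 'groupName:counterName' format could be parsed along these lines (a sketch only; DaliMetricsAPI.CountersToPublish is the real class, and this is not its code):

```java
public class CounterSpec {
    private final String groupName;
    private final String counterName; // "*" means every counter in the group

    // Split on the last ':' so group names containing '$' or '.' are unaffected.
    public CounterSpec(String spec) {
        int sep = spec.lastIndexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException("expected groupName:counterName, got " + spec);
        }
        this.groupName = spec.substring(0, sep);
        this.counterName = spec.substring(sep + 1);
    }

    public boolean matches(String group, String counter) {
        return groupName.equals(group)
            && ("*".equals(counterName) || counterName.equals(counter));
    }
}
```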

The HDFSPublisher is configured with a base path under which metrics are published. The date/hour hierarchy is added
to the base path.
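The date/hour hierarchy could be built along these lines (a sketch; the PR does not show the HDFSPublisher's actual layout, so the UTC yyyy/MM/dd/HH pattern here is an assumption):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class MetricsPathBuilder {
    // Appends a UTC yyyy/MM/dd/HH hierarchy (assumed layout) to the configured base path.
    public static String build(String basePath, long timestampMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MM/dd/HH");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return basePath + "/" + fmt.format(new Date(timestampMillis));
    }
}
```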

The XML file for configuring dr-elephant is checked in as a template. A config file needs to be added to the
'conf' path of dr-elephant (manually, as per meeting with hadoop-admin) on clusters where we want dr-elephant
to publish metrics.

RB=409443
BUGS=HADOOP-7814
R=fli,csteinba,mwagner,cbotev,ahsu
A=fli,ahsu
hadoop-1 does not have JobStatus.getFinishTime(). This causes dr-elephant to hang.

Set the start time to be the same as the finish time for h1 jobs.

For consistency, reverted to the old method of scraping the job tracker url so that we get only
start time, and set the finish time to be equal to start time for retired jobs as well.

RB=417975
BUGS=HADOOP-8640
R=fli,mwagner
A=fli
RB=417448
BUGS=HADOOP-8648
R=fli
A=fli
…increasing mapred.min.split.size for too many mappers, NOT mapred.max.split.size
…name

RB=468832
BUGS=HADOOP-10405
R=fli
A=fli,ahsu
nntnag17 and others added 28 commits January 24, 2017 16:16
Jobs that put large files (> 500 MB) in the distributed cache are flagged.
Files from the following settings are considered:
  mapreduce.job.cache.files
  mapreduce.job.cache.archives
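The flagging rule above can be sketched as follows (a hypothetical helper, not the PR's code; in the real heuristic the sizes would come from the file statuses behind mapreduce.job.cache.files and mapreduce.job.cache.archives):

```java
public class DistributedCacheChecker {
    // 500 MB threshold from the commit message above.
    static final long LIMIT_BYTES = 500L * 1024 * 1024;

    // Flags the job if any distributed-cache file or archive exceeds the limit.
    public static boolean shouldFlag(long[] fileSizesBytes) {
        for (long size : fileSizesBytes) {
            if (size > LIMIT_BYTES) {
                return true;
            }
        }
        return false;
    }
}
```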
…p2 (linkedin#203)

(1) Use ArrayList instead
(2) Add unit test for this

This commit allows Dr. Elephant to fetch Spark logs without universal
read access to eventLog.dir on HDFS. SparkFetcher would use SparkRestClient
instead of SparkLogClient if configured as

    <params>
      <use_rest_for_eventlogs>true</use_rest_for_eventlogs>
    </params>

The default behaviour is to fetch the logs via SparkLogClient/WebHDFS.
…alone fetcher (linkedin#232)

Remove the backup for the Rest Fetcher and make the legacy FSFetcher the top-level fetcher. Change the default fetcher in the config.
* Fix SparkMetricsAggregator so it does not produce negative ResourceUsage.
* We had been ignoring failed tasks when calculating resource usage. This handles that.
* Fix the Exception heuristic, which was supposed to give the stack trace.
…autocompletion search field in the 'job history' tab
Contributor

@akshayrai akshayrai left a comment

Sorry for the delayed review. I added the comments but forgot to publish them.

Can you clarify the motivation behind the scripts for automatically adding heuristics?

Also, please ensure that you avoid addressing multiple issues in one PR.

db_user=root
db_password=""
db_user=drelephant
db_password="Dr-elephant123"
Contributor

Change this back to root and "".

compile.sh Outdated

# Echo the value of pwd in the script so that it is clear what is being removed.
rm -rf ${project_root}/dist
mkdir dist

play_command $OPTS clean test compile dist
play_command $OPTS clean compile dist
Contributor

Can you add the test back?

[repositories]
local
maven-central
cloudera:https://repository.cloudera.com/cloudera/cloudera-repos/
Contributor

Are these required? Please remove them.

@@ -682,6 +687,13 @@ private static Result getJobHistory(Version version) {
boolean hasSparkJob = false;
// get the graph type
String graphType = form.get("select-graph-type");

SqlQuery q = Ebean.createSqlQuery("select distinct job_def_id from yarn_app_result order by job_def_id;");
Contributor

I am not in favor of merging this. In prod environments there will be millions of entries, and this can become overkill.
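One way to address this concern (a sketch, not part of the PR) is to filter on the typed prefix and cap the result size, so the autocomplete query stays bounded even with millions of rows:

```java
public class AutocompleteQuery {
    // Builds a bounded autocomplete query; the prefix is meant to be passed as a
    // bind parameter (:prefix) rather than concatenated, to avoid SQL injection.
    public static String build(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        return "select distinct job_def_id from yarn_app_result"
             + " where job_def_id like :prefix"
             + " order by job_def_id limit " + limit;
    }
}
```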
