Updating Executor GC heuristic recommendation for high executor GC

ShubhamGupta29 · ShubhamGupta29 · commit 73795ce4fc96 · 2019-10-18T15:51:41.000+05:30
diff --git a/app/com/linkedin/drelephant/spark/heuristics/ExecutorGcHeuristic.scala b/app/com/linkedin/drelephant/spark/heuristics/ExecutorGcHeuristic.scala
@@ -55,11 +55,13 @@ class ExecutorGcHeuristic(private val heuristicConfigurationData: HeuristicConfi
 
     //adding recommendations to the result, severityTimeA corresponds to the ascending severity calculation
     if (evaluator.severityTimeA.getValue > Severity.LOW.getValue) {
-      resultDetails = resultDetails :+ new HeuristicResultDetails("Gc ratio high", "The job is spending too much time on GC. We recommend increasing the executor memory.")
+      resultDetails = resultDetails :+ new HeuristicResultDetails("Gc ratio high",
+        Seq("The job is spending too much time on GC. We recommend increasing the executor memory.", evaluator.gcRecommendation, "You can also try reducing the number of UDF calls.").filter(_.nonEmpty).mkString(" "))
     }
     //severityTimeD corresponds to the descending severity calculation
     if (evaluator.severityTimeD.getValue > Severity.LOW.getValue) {
-      resultDetails = resultDetails :+ new HeuristicResultDetails("Gc ratio low", "The job is spending too little time in GC. Please check if you have asked for more executor memory than required.")
+      resultDetails = resultDetails :+ new HeuristicResultDetails("Gc ratio low",
+        "The job is spending too little time in GC. Please check if you have asked for more executor memory than required.")
     }
 
     val result = new HeuristicResult(
@@ -103,6 +105,11 @@ object ExecutorGcHeuristic {
       throw new Exception("No executor information available.")
     }
 
+    val sparkExecutorExtraJavaOptions: String = appConfigurationProperties.getOrElse("spark.executor.extraJavaOptions", "")
+    val isParallelGCEnabled: Boolean = sparkExecutorExtraJavaOptions.contains("XX:+UseParallelGC")
+    val isG1GCEnabled: Boolean = sparkExecutorExtraJavaOptions.contains("XX:+UseG1GC")
+    val gcRecommendation: String = if (isParallelGCEnabled || isG1GCEnabled) "" else "Enable ParallelGC or G1GC using spark.executor.extraJavaOptions."
+
     lazy val appConfigurationProperties: Map[String, String] =
       data.appConfigurationProperties
     var (jvmTime, executorRunTimeTotal) = getTimeValues(executorSummaries)
diff --git a/app/views/help/spark/helpExecutorGcHeuristic.scala.html b/app/views/help/spark/helpExecutorGcHeuristic.scala.html
@@ -17,4 +17,12 @@
 <p>This analysis shows how much time a job is spending in GC. To normalise the results across all jobs, the ratio of the time a job spends in Gc to the total run time of the job is calculated. </p>
 <p>A job is flagged if the ratio is too high, meaning the job spends too much time in GC.</p>
 <h3>Suggestions</h3>
-<p>We recommend increasing the executor memory.</p>
+<ul>
+<li>We recommend increasing the executor memory.</li>
+<li>Enabling G1GC or ParallelGC using spark.executor.extraJavaOptions could help.
+  <ul>
+    <li>To enable G1GC or ParallelGC, add <b>-XX:+UseG1GC</b> or <b>-XX:+UseParallelGC</b>, respectively, to the Spark configuration property spark.executor.extraJavaOptions.</li>
+  </ul></li>
+<li>GC time can be high when the job makes many UDF calls, especially if the UDFs are inefficient or use a lot of memory.</li>
+</ul>
+<p>For general guidelines on tuning GC for your Spark application, refer to the <a href="https://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning" target="_blank">Spark tuning guide</a>.</p>
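
The flag check and message assembly in the Scala hunk can be tried in isolation. The following is a minimal sketch, not code from this commit: the object name GcRecommendationSketch and the helper highGcMessage are hypothetical, and the property map stands in for data.appConfigurationProperties. Joining the non-empty parts with a single space is one way to keep the sentences from running together when the GC hint is present or absent.

```scala
// Hypothetical standalone sketch; names below are illustrative, not part of Dr. Elephant.
object GcRecommendationSketch {
  // JVM flags that the heuristic treats as "a suitable collector is already enabled".
  private val GcFlags = Seq("XX:+UseParallelGC", "XX:+UseG1GC")

  /** Returns the extra hint, or "" when ParallelGC or G1GC is already configured. */
  def gcRecommendation(props: Map[String, String]): String = {
    val javaOptions = props.getOrElse("spark.executor.extraJavaOptions", "")
    if (GcFlags.exists(flag => javaOptions.contains(flag))) ""
    else "Enable ParallelGC or G1GC using spark.executor.extraJavaOptions."
  }

  /** Builds the "Gc ratio high" text, skipping empty parts and separating the rest with one space. */
  def highGcMessage(props: Map[String, String]): String =
    Seq(
      "The job is spending too much time on GC. We recommend increasing the executor memory.",
      gcRecommendation(props),
      "You can also try reducing the number of UDF calls."
    ).filter(_.nonEmpty).mkString(" ")
}
```

Calling GcRecommendationSketch.highGcMessage with an empty map includes the collector hint; passing a map whose spark.executor.extraJavaOptions value contains -XX:+UseG1GC omits it.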