Using Apache NetBeans OQL to Analyze a Heap Dump
As part of my Charles University course Practical Dynamic Compilation, I want to demonstrate how to access raw data structures effectively. Let's analyze a heap dump!
Using VisualVM
The easiest way to look inside a heap dump is to use VisualVM. Take your .hprof file, open it and browse its content. In some sense a heap dump is an object database connecting the classes, objects and other entities in the dump with each other. As such, we can use a query language to inspect the heap!
VisualVM comes with one such language called OQL. Switch to the OQL Console and execute the following query:
var arr = [];
heap.forEachObject(function(o) {
  if (o.length > 255) {
    arr.push(o);
  }
}, 'int[]')
arr
It gives you all integer arrays longer than 255 elements. OQL syntax is a mixture of JavaScript and SQL; however, the above script is pure JavaScript. It iterates the heap using the built-in forEachObject function and collects the large arrays in a callback. Complex heap analysis has just got easy!
However, we can go even further. VisualVM's OQL implementation comes from Apache NetBeans. Why not use the org-netbeans-modules-profiler-oql library in a headless application and query the .hprof files (possibly in a batch) from the command line?
<dependencies>
  <dependency>
    <groupId>org.netbeans.modules</groupId>
    <artifactId>org-netbeans-modules-profiler-oql</artifactId>
    <version>RELEASE110</version>
  </dependency>
</dependencies>
Only one dependency is needed in your pom.xml, and you can use OQL from your Main.java:
// Load the .hprof file and create an OQL engine over it
Heap heap = HeapFactory.createHeap(file);
final OQLEngine eng = new OQLEngine(heap);
// Same query as in the OQL Console, but printing the count instead of returning the array
eng.executeQuery(
    "var arr = [];\n" +
    "heap.forEachObject(function(o) {\n" +
    "  if (o.length > 255) {\n" +
    "    arr.push(o);\n" +
    "  }\n" +
    "}, 'int[]')\n" +
    "print('Found ' + arr.length + ' long int arrays');",
    OQLEngine.ObjectVisitor.DEFAULT
);
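The batch idea mentioned earlier could be sketched roughly as follows. This is only an illustration: the class HprofBatch and its methods findDumps and forEachDump are hypothetical names that do not come from the heapdump repository, and the sketch uses nothing but the JDK. The place where the HeapFactory and OQLEngine calls would go is marked with a comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the heapdump repository.
class HprofBatch {
    /** Returns the regular files directly inside dir whose name ends with .hprof, sorted by path. */
    static List<Path> findDumps(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".hprof"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    /** Hands every dump found in dir to the action and returns how many dumps were visited. */
    static int forEachDump(Path dir, Consumer<Path> action) throws IOException {
        List<Path> dumps = findDumps(dir);
        dumps.forEach(action);
        return dumps.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        int n = forEachDump(dir, dump -> {
            // Placeholder: this is where the HeapFactory.createHeap(...) and
            // eng.executeQuery(...) calls from the snippet above would go.
            System.out.println("Would query " + dump);
        });
        System.out.println("Processed " + n + " dump(s)");
    }
}
```

Keeping the file discovery separate from the per-dump action means the same loop can be reused with different queries.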
Try it yourself:
$ git clone https://github.com/jaroslavtulach/heapdump
$ mvn -q -f heapdump/ package exec:exec -Dheap=/path/to/your/dump.hprof
Loading dump.hprof
Querying the heap
Found 7797 long int arrays
Round #1 took 6035 ms
Found 7797 long int arrays
Round #2 took 4309 ms
Found 7797 long int arrays
Round #3 took 3900 ms
Found 7797 long int arrays
....
Round #20 took 3444 ms
Heap dump processing automated with a few lines of code!
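The output above reports one timing per round. The repository's Main.java is not reproduced here, so the following is only a sketch of what such a timing loop might look like. RoundTimer and its run method are hypothetical names, and the workload in main is a stand-in for the OQL query.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating a per-round timing loop.
class RoundTimer {
    /** Runs the task the given number of rounds and returns each round's wall-clock time in ms. */
    static long[] run(int rounds, Runnable task) {
        long[] millis = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println("Round #" + (i + 1) + " took " + millis[i] + " ms");
        }
        return millis;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the setting of this article it would be the OQL query.
        run(20, () -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i % 7;
            }
            // Use the result so the JIT compiler cannot simply discard the loop.
            if (sum == 42) {
                System.out.println(sum);
            }
        });
    }
}
```

Repeating the same work many times matters because the first rounds include JIT warm-up, which is visible in the output above where round #1 is clearly slower than round #20.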
Getting Faster with GraalVM
The default Main.java file works as a benchmark. It scans the heap multiple times and reports the time of each round. The speed depends on the JavaScript engine used. Nashorn, the default JavaScript engine in JDK 8 and JDK 11, was able to process my 661 MB heap in 3.5 seconds. Can we do better?
Sure we can! Download GraalVM, which provides its own Graal.js script engine, and run the benchmark again:
$ /graalvm-ee-1.0.0-rc16/bin/java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) GraalVM EE 1.0.0-rc16 (build 25.202-b08-jvmci-0.59, mixed mode)
$ JAVA_HOME=/graalvm-ee-1.0.0-rc16 mvn -q -f heapdump/ package exec:exec -Dheap=dump.hprof
Loading dump.hprof
Querying the heap
Found 7797 long int arrays
Round #1 took 4008 ms
Found 7797 long int arrays
Round #2 took 1631 ms
Found 7797 long int arrays
Round #5 took 640 ms
Found 7797 long int arrays
Round #9 took 300 ms
Found 7797 long int arrays
Round #20 took 230 ms
Fifteen times faster! A good result for simply replacing one JDK with another, right? Apache NetBeans gives you useful libraries. GraalVM makes them run fast!
That is the plot. Now we can focus on the main question of my course: Can we make it even faster?
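Because the result depends on which JavaScript engine the JDK provides, it can help to check what a given java binary exposes. The sketch below assumes the engine is discovered through the standard javax.script API; that is an assumption on my side and is not spelled out in this text. ListScriptEngines is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

// Hypothetical helper: lists the javax.script engines visible to the running JDK.
class ListScriptEngines {
    /** Returns one descriptive line per script engine factory found by ScriptEngineManager. */
    static List<String> describeEngines() {
        List<String> out = new ArrayList<>();
        for (ScriptEngineFactory f : new ScriptEngineManager().getEngineFactories()) {
            out.add(f.getEngineName() + " " + f.getEngineVersion() + " names=" + f.getNames());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> engines = describeEngines();
        if (engines.isEmpty()) {
            System.out.println("No javax.script engines found on this JDK");
        }
        engines.forEach(System.out::println);
    }
}
```

Running it once with each JAVA_HOME shows which engine is in play. Note that Nashorn is no longer bundled from JDK 15 on, so the list may be empty on newer JDKs.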
Important highlights from the video:
- 5:30 - how to take a data structure such as a DB and expose it to polyglot languages effectively
- 6:10 - typically people write their API in C and add bindings to other languages
- 26:35 - how polyglot works in Truffle and its compiler
- 36:45 - “all you need is …. TruffleObject”
- 38:10 - exploring the effective compilation via IGV
There is a HeapLanguage branch in this repository holding all the code used by the Designing APIs for Polyglot World presentation.