Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GraalVM Node OutOfMemory Truffle's DynamicObjectBasic not released on GC #268

Open
yacota opened this issue Apr 11, 2020 · 23 comments
Open
Assignees
Labels
bug Something isn't working memory Memory footprint, memory leaks

Comments

@yacota
Copy link

yacota commented Apr 11, 2020

I am using GraalVM's node to launch a Java application, the primary js is loading the Java application and both share a java.util.concurrent.LinkedBlockingDeque which is used to "send/share" tasks to execute from Java to the worker node thread.

The idea is somewhat similar to what is described in "Sharing Objects between workers" https://medium.com/graalvm/multi-threaded-java-javascript-language-interoperability-in-graalvm-2f19c1f9c37b

The js worker thread is using npm modules like jsdom, all of them are supported by GraalVM's npm

The peak performance is reached with a few request and the performance is even better than a "real" NodeJS application which does exactly the same, but the problem happens because a lot of DynamicObjectBasic are allocated, these can not be garbage collected and at the end an out of memory happens

The GraalVM version is 20.0

JVM: OpenJDK 64-Bit Server VM (11.0.6+9-jvmci-20.0-b02, mixed mode, sharing)
Java: version 11.0.6, vendor Oracle Corporation
Java Home: /Library/Java/JavaVirtualMachines/graalvm-ce-java11-20.0.0/Contents/Home
JVM Flags: <none>

I've tested the application with 2 configurations

  • default/community one (snapshots taken from this test)
  • compiler in economy (I attach the the diagnostics of the economy test)
    graal_diagnostics_26432.zip

With the default configuration I've analysed with Eclipse MAT the heapdump and it highlights the following

mat_culprit1

and

mat_culprits

I can share the memory dump if you wish, just ping me and I'll share it through dropbox

During the execution one can see that the allocation of DynamicObjectBasic increases, but it hardly decrease although invoking JVM's GC explicitly

In t

memory_overview_t1

In t+1

memory_overview_t2

The memory usage trends more or less looks like this

memory_trend

And the heap histogram like this

object_alloaction

I can try to create a simple project that will allow you to reproduce the problem if needed

@wirthi wirthi added bug Something isn't working memory Memory footprint, memory leaks labels Apr 14, 2020
@wirthi
Copy link
Member

wirthi commented Apr 14, 2020

Hi @yacota

thanks for sharing your insight and the extensive data you provided. Can you by any chance also provide the core of your code, i.e. where you create JavaScript and how you store them?

Without any source code, my first guess would be you create lots of differently shaped larger objects (i.e., objects with different properties, of different types). We create "Shape" Objects for that in the background (think of that as a Class in Java). You either retain links to all those types somehow, or there is a bug in our strategy to garbage collect them once unsed.

A similar problem we had seen before was when a JavaScript object was used like a map, with GUID values created and set as properties on those objects. That created a unique shape for each such GUID.

@woess can you have a look and that and assess whether this is a problem on our side, or in @yacota 's code?

Thanks,
Christian

@jordillachmrf
Copy link

Hi,
thanks @wirthi , I will create/provide a sample project but for the sake of time I can tell you that I am using NodeJVM https://github.com/mikehearn/nodejvm from @mikehearn

The execution path that I am testing is ALWAYS the same, the same html(600k) that it gets processed with jsdom

@woess , as said before I can share the heapdump if needed

Thanks

@mikehearn
Copy link
Contributor

Thanks for the bug report @jordillachmrf - couple of questions:

  1. Are you using the modified version of NodeJVM I proposed to you previously (w.r.t. multi-threaded Node)? If so can you post the code for it?
  2. Are you using the Kotlin support or plain Java? The Kotlin code in NodeJVM adds some more codepaths and type complexity on top.

@jordillachmrf
Copy link

Hi @mikehearn ,

  1. nope, I was just trying the POC in staging prior to creating a PR to NodeJVM and stumbled upon this problem. So I am using NodeJVM as it is, single threaded worker

  2. just Java, launching a minimalist spring boot app that receives the request and uses https://github.com/mikehearn/nodejvm/blob/master/src/main/java/net/plan99/nodejs/java/NodeJS.java#L166 to process some html payload with jsdom

@jordillachmrf
Copy link

Sample project https://github.com/Marfeel/graalvmjs-issue

@mikehearn
Copy link
Contributor

Just taking a quick look, when are you expecting it to GC? If you get rid of the futures and just use NodeJS.run (sync version) does it improve?

The program runs a large loop that runs a bunch of tasks asynchronously, but with a 5 second delay between each iteration (and NodeJS/NodeJVM doesn't do parallelism so it is effectively synchronous). It uses the async version and adds all the returned futures i.e. object graphs to a Java list, then does a CompletableFuture.allOf to join all the futures.

So by the time the loop is reaching the end there will be up to 1000 JS object graphs in memory. Additionally each loop runs require on the modules independently. Is it possible this results in every loop using different types and corresponding representation?

I'd suggest tweaking the benchmark a bit first to see if letting the GC reclaim results after each iteration helps.

@jordillachmrf
Copy link

jordillachmrf commented Apr 15, 2020

Hi @mikehearn , thanks for taking a look at it.

  • About concurrency, yes you are right and that's why I opened this Question : is it possible support multiple NodeJS main threads ? mikehearn/nodejvm#10 a while ago, so yes maybe the test it's too "hard"
  • For, when are you expecting it to GC? I am expecting it to deallocate memory every time an execution is finished, specially if I am forcing it to GC manually (with jvisualvm)
  • If I run it with NodeJS.run, thus only allocating in memory the stuff that will be immediately executed -> it ends up with more than 1 GB of DynamicObjectBasic by half of the execution
  • about each loop doing its own require ... well this project it's just a dummy demo trying to highlight an issue(maybe it's not an issue) but in real production scenario we're willing to load different js depending on request parameters, so we need to load them every time/per request

Screenshot 2020-04-16 at 00 26 43

dariol83 added a commit to dariol83/reatmetric that referenced this issue Aug 9, 2020
@dariol83
Copy link

dariol83 commented Aug 12, 2020

Hi everybody, I would like to report my experience, which seems pretty similar to the issue described here.

Context information: in my application (a monitoring and control framework), I use GraalVM JS to evaluate ca 1000 expressions, which combine values from some sampled values. The expressions are evaluated by 4 threads (clearly avoiding race conditions), but I also run tests with 1 thread only with the same results. Also, the expressions are evaluated several times per second (i.e. when a new sample of any contributing value is collected). Each expression has an associated and dedicated Engine. I was experiencing some high memory consumption up to OOM errors and, after investigation, I got the following results.

If I disable the expression processing, I have no memory issues.

If I enable the expression processing, using the code snipped below (creation of Engine, Source and Context from scratch every time), I have no memory issues but the performance are bad.
GraalVM-JS-New-Engine-Context

If I enable the expression processing, using the code snipped below (I create the Engine and the Source once per expression, and at every evaluation I use a different Context), then the heap explodes.
GraalVM-JS-Context

It seems that everything is hold up probably by the Engine or Source objects. I have the heapdump, from which I extracted the screens below.
GraalVM-heapdump

GraalVM-heapdump2

Perhaps I am doing something wrong, but I followed what the Graal VM JS documentation is reporting and the affected code seems quite simple to write and understand.

To conclude, I have this problem with versions 19.3.0, 19.3.3, 20.1.0.

I hope it helps to fix the issue.

Kind regards,
Dario

@mikehearn
Copy link
Contributor

I think the expensive object is the Context, not the Source. You're re-creating the context every single time and then returning an object out of it, which will pin the entire context in memory.

Try making Context cached as well and see what happens.

@dariol83
Copy link

Hi @mikehearn
thanks a lot for the quick reply. I run the several tests:

  1. I cached the Context object, without creating it every time: same behaviour (OOM), see below.
    GraalVM-JS-Context-Cached

GraalVM-heapdump3

  1. I cached the Value object used for the bindings, without creating it every time: same behaviour. I set the members of the bindings before the evaluation and remove them afterwards. I tried also to avoid removing them, but the result is the same.
    GraalVM-JS-Context-Cached-Bindings-Cached

GraalVM-heapdump5-context-cached-bindings-cached

  1. I avoided calling as(Object.class) on the returned Value from the evaluation.
    Same behaviour (I did not produce a VisualVM screenshot this time, my computer was dying due to memory usage).

GraalVM-JS-Context-Cached-Bindings-Cached-Value-Not-Used

  1. I actually tried to call setMember on the bindings only once at initialisation time, and avoid calling it at every re-evaluation. This leads clearly to wrong results (the binding values should be in fact updated), but I wanted to see if this usage pattern was the cause of the issue. I got the same behaviour unfortunately.

GraalVM-JS-Context-Cached-Bindings-Cached-And-Not-Updated

  1. As final attempt, I tried to not get the bindings out of the Context, to set them. This will clearly lead to an exception, which indeed is thrown, but still, the memory goes nuts till the OOM, with exactly the same behaviour observed for the previous cases.
    GraalVM-JS-Context-Cached-Bindings-Not-Used

GraalVM-heapdump5-context-cached-bindings-not used

All these tests and the related results make me think that each evaluation creates new objects that are finally linked to the engine and not properly GCed.

For your information, these are the types of expressions I typically evaluate (basically, a function declaration and a function call):

function eval() {
if(AQVP0004 == "PRESENT") return 0; else return 1;
}
eval();

function eval() {
return AQVP0009 + "_TEST";
}
eval();

Kind regards,
Dario

@nksaini83
Copy link

I am facing same problem@dariol83 , have you got any solution? , I cached the Engine and context
Context context = context from cache
Value result=null;
try{

			   Value func = context.getBindings("js").getMember(DEFAULT_RENDER_FUNCTION_NAME);
			 
			  result = func.execute( );
			 
            }catch (Exception e) {
            	
	}

@wirthi
Copy link
Member

wirthi commented Mar 4, 2022

@nksaini83 when you say "context from cache", does that mean you don't ever .close() the Context? That might easily explain why no memory is freed. Please make sure that the Context is closed.

@nksaini83
Copy link

@wirthi thanks for quick reply , for .close() i have to create context for every call , and it will give bad performance, that means we cannot cached the context .

@caoccao
Copy link

caoccao commented Mar 5, 2022

I wonder if you encapsulated your business logic in some functions. That would keep the number of global objects fixed (which is the number of the functions) so that GC could easily collect the orphan objects.

I'm not sure how GraalJS handles this approach, but am sure V8 is awesome in it.

@wirthi
Copy link
Member

wirthi commented Mar 7, 2022

@wirthi thanks for quick reply , for .close() i have to create context for every call , and it will give bad performance, that means we cannot cached the context .

Is that an assumption, or did you actually try that? Yes, creating a Context DOES cost some performance - but it is our strong believe that if you use Code Caching Across Multiple Contexts properly, the benefits by far outweigh the costs, as the Context creation will only be a tiny fraction of your overall runtime.

@nksaini83
Copy link

nksaini83 commented Mar 7, 2022

@wirthi thanks for quick reply , for .close() i have to create context for every call , and it will give bad performance, that means we cannot cached the context .

Is that an assumption, or did you actually try that? Yes, creating a Context DOES cost some performance - but it is our strong believe that if you use Code Caching Across Multiple Contexts properly, the benefits by far outweigh the costs, as the Context creation will only be a tiny fraction of your overall runtime.

Yes i did that in starting , that is why we cached context. Today i clear Engine after X iteration and every thing work fine.
image

I have also use Code Caching Across Multiple Contexts properly

private final Engine sharedEngine = Engine.newBuilder().build();

public Context create() throws Exception{

	Context context = Context.newBuilder("js")
		    .allowHostAccess(HostAccess.ALL)
		    .allowAllAccess(true)
		    .allowHostClassLookup(className -> true).engine(sharedEngine).build();
	
	return context; 
}

@wirthi pls let us know if any thing wrong in above code or what we can do solve this problem ?.

@nksaini83
Copy link

@wirthi Should we use this?
image

@wirthi
Copy link
Member

wirthi commented Mar 16, 2022

Hi,

no, unless you are on GraalVM 19. You should be on GraalVM 21.x., if not GraalVM 22.x. The comment was a workaround for GraalVM 19 specifically.

Today i clear Engine after X iteration and every thing work fine.

That sounds like a reasonable compromise. Note, however, that this way you might have user-observable exchange of data between contexts. So don't run multiple customers on that system. Only a fresh Context guarantees isolation of the global state of your application.

In addition, you use a very insecure setup (HostAccess.ALL, allowAllAccess, allow access to all classes) - that does not sound resonable for an application in production - particularly not if you execute untrusted JavaScript code.

Best,
Christian

@nksaini83
Copy link

nksaini83 commented Mar 17, 2022

Only a fresh Context guarantees isolation of the global state of your application.

As per doc we can use same context for multiple threads. And we ensure only one thread access same context at given time.
one more we are using 21.3.0 of Graal.
https://docs.oracle.com/en/graalvm/enterprise/21/sdk/org/graalvm/polyglot/Context.html
image

Thanks
Narendra Saini

@wirthi
Copy link
Member

wirthi commented Mar 17, 2022

Hi,

yes, that is possible. You need to make sure you correctly https://www.graalvm.org/truffle/javadoc/org/graalvm/polyglot/Context.html#enter--
and
https://www.graalvm.org/truffle/javadoc/org/graalvm/polyglot/Context.html#leave--
when switching threads.

-- Christian

@nksaini83
Copy link

nksaini83 commented Mar 17, 2022

Hi,

yes, that is possible. You need to make sure you correctly https://www.graalvm.org/truffle/javadoc/org/graalvm/polyglot/Context.html#enter-- and https://www.graalvm.org/truffle/javadoc/org/graalvm/polyglot/Context.html#leave-- when switching threads.

-- Christian

That means it will resolve memory related problem?

Narendra

@wirthi
Copy link
Member

wirthi commented Mar 17, 2022

Hi @nksaini83

I don't know anything on the details of the memory-related problem on your side. My comment was that if you never close a Context, as you stated in your first comment, then you most certainly will have garbage piling up in the Context instance(s).

Our suggested mode of operation is to use a fresh Context for each new query/customer/request/etc. If you prefer to cache Contexts, that might be ok, but of course, it can lead to data leak from one context to the other as I discussed above.

We sometimes find and fix memory leaks. We are not aware of any open memory leak at the moment, given you use the latest version of GraalVM and use a fresh Context every time (or at least, re-create them frequently enough).

To really help you with a specific problem: If you provide a runnable example of the problematic code that reproduces the issue, we are happy to look into it. Then we can run it on our machines and analyze the problem directly.

Best,
Christian

@nksaini83
Copy link

Hi @nksaini83

I don't know anything on the details of the memory-related problem on your side. My comment was that if you never close a Context, as you stated in your first comment, then you most certainly will have garbage piling up in the Context instance(s).

Our suggested mode of operation is to use a fresh Context for each new query/customer/request/etc. If you prefer to cache Contexts, that might be ok, but of course, it can lead to data leak from one context to the other as I discussed above.

We sometimes find and fix memory leaks. We are not aware of any open memory leak at the moment, given you use the latest version of GraalVM and use a fresh Context every time (or at least, re-create them frequently enough).

To really help you with a specific problem: If you provide a runnable example of the problematic code that reproduces the issue, we are happy to look into it. Then we can run it on our machines and analyze the problem directly.

Best, Christian

I tried both clearing context/Engine after X iteration in case of context nothing happened( memory not freed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working memory Memory footprint, memory leaks
Projects
None yet
Development

No branches or pull requests

8 participants