[BUG] Possible memory leak in loop over array containing maps #5375

Open
nverwer opened this issue Jul 22, 2024 · 2 comments

nverwer commented Jul 22, 2024

Description

When extracting data from a large array that contains maps, heap memory in Java keeps growing. This might be caused by a memory leak.
When the same script is run repeatedly with the same data, garbage collection appears to reclaim the used memory. However, each time the code or the data changes, heap usage increases and does not return to its previous level.

The following graph was generated using real data (https://zenodo.org/records/10482057) and comes from jconsole:
image

Using generated data (see below), the graph is similar:
image

Expected behaviour

The heap memory usage should return to a lower level after garbage collection. It should not increase permanently after a change in code or data.

To reproduce

The following script is a much simplified version of a script that gets data out of a (large) array containing maps.
In the original script, the data comes from a JSON file, but I get the same results when generating the data in the script:

let $doc as item() := array
  { for $i in 1 to 500000
    return map
      { 'id' : 'id'||$i
      , 'status' : if ($i mod 100 = 0) then 'inactive' else if ($i mod 80 = 1) then 'withdrawn' else 'active'
      , 'relationships' : array{ map
        { 'type' : 'Related'
        , 'id' : 'id'||($i+1)
        }}
      }
  }
let $doc-size as xs:integer := array:size($doc)

let $ids :=
  for $doc-index in 1 to $doc-size
    let $item as map(*) := $doc($doc-index)
    (:let $status := $item?status:)
    (:let $relationships as array(*)? := $item?relationships:)
    where $item?status = ('withdrawn','inactive') and exists($item?relationships)
  return $item?id

return count($ids)

At first, I thought that the memory leak (if that is what this is) was in the loop variables $status and $relationships, but that seems not to be the case, so I commented them out.

The second graph above was generated by running this script a few times, then changing 500000 in for $i in 1 to 500000 into 500001, running it a few times, changing it to 500002, running it a few times, and so on.

Context

eXist-db: eXist-6.2.0
JVM: OpenJDK 64-Bit Server VM version 11.0.14.1+1
OS: Windows 10
eXist is run with the launcher (not as a service, although that appears to have the same problem), with memory.max=8192.

More details

I used VisualVM to analyze a heap dump, to get an idea of what takes up all the space in the heap. This suggests that there is a lot in the cache. However, cache:clear() does not change the used heap space.

image

image

I am not sure if this gives an indication of what is going on.

Contributor

adamretter commented Jul 22, 2024

@nverwer The cache that your traces are showing is that of compiled XQuery Modules (and not the Cache XQuery Extension Module that is available via the cache:* functions). Because compilation is time-intensive, once eXist-db has compiled and executed a Module, it resets (clears) the state of the Module and stores it in a Caffeine Cache. The next time the same query is executed, it is borrowed from the cache instead of being recompiled.

It looks like the reset of the module is perhaps not resetting some expressions that accumulated state. We have seen this several times in the past for complex expressions. I did fix a number of issues previously with Maps and Arrays in this area. Could you check if I already fixed this in main by building a 7.0.0-SNAPSHOT? If not, it is possibly another bug in this area that needs to be addressed.
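
To make the suspected mechanism concrete, here is a minimal Java sketch. It is not eXist-db code: the class names, the `HashMap` standing in for the Caffeine cache, and the choice of the query source text as the cache key are all assumptions made for illustration. It only shows why an incomplete reset would match the reported pattern, where re-running the same query does not grow the heap but every edited query adds a new, permanently retained entry.

```java
import java.util.HashMap;
import java.util.Map;

public class CompiledQueryCacheSketch {

    /** Hypothetical stand-in for a compiled query; not an eXist-db class. */
    static final class CompiledQuery {
        // State left over from the most recent evaluation.
        private int[] lastResult;

        void execute(int items) {
            // Each run replaces the previous state, so repeated runs do not add up.
            lastResult = new int[items];
        }

        // A complete reset drops the evaluation state; an incomplete one keeps it.
        void reset(boolean complete) {
            if (complete) {
                lastResult = null;
            }
        }

        long retainedInts() {
            return lastResult == null ? 0 : lastResult.length;
        }
    }

    // Cache keyed by query source text (an assumption made for this sketch).
    // Unlike a real Caffeine cache, this map never evicts anything.
    private final Map<String, CompiledQuery> cache = new HashMap<>();
    private final boolean completeReset;

    public CompiledQueryCacheSketch(boolean completeReset) {
        this.completeReset = completeReset;
    }

    /** Borrows the compiled query for this source (or creates it), runs it, resets it. */
    public void run(String source, int items) {
        CompiledQuery query = cache.computeIfAbsent(source, s -> new CompiledQuery());
        query.execute(items);
        query.reset(completeReset);
    }

    /** Total number of ints still reachable through cached queries. */
    public long retainedInts() {
        long total = 0;
        for (CompiledQuery query : cache.values()) {
            total += query.retainedInts();
        }
        return total;
    }

    public static void main(String[] args) {
        CompiledQueryCacheSketch leaky = new CompiledQueryCacheSketch(false);
        for (int size = 500000; size <= 500002; size++) {
            // Same source three times, then the source changes.
            for (int runIndex = 0; runIndex < 3; runIndex++) {
                leaky.run("for $i in 1 to " + size, size);
            }
            System.out.println(size + " -> retained ints: " + leaky.retainedInts());
        }
    }
}
```

With the incomplete reset, the retained total stays flat across repeated runs of one source and steps up each time the source text changes. Constructing the sketch with `true` (a complete reset) leaves nothing retained. Whether the real cache behaves this way, and which expression holds on to its state, would have to be confirmed from the heap dump.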

Author

nverwer commented Jul 24, 2024

@adamretter Thank you for your response. I compiled the latest 7.0.0-SNAPSHOT and ran the script as shown above.
Unfortunately, heap space usage keeps increasing as I change 500000 into 500001, 500002, and so on.

image

It looks like this problem is still there. Although I am beginning to understand some of the Java code for eXist, I am afraid I cannot be of much help here.
