materialization cap question #1244

Open
misodle opened this issue Nov 1, 2023 · 1 comment
misodle commented Nov 1, 2023

This code:
let $result :=
  {
    "gtinList":
      for $doc in json-file("LinesFile.json")
      for $row in $doc
      for $i in $row.items[]
      return
        {
          "gtin" : $i.item.gtin
        }
  }
return $result

Gives this message:
Code: [RBDY0005]
Message: Cannot materialize a sequence of 2000 items because the limit is set to 1000. This value can be configured with the --materialization-cap parameter at startup

We can always increase the materialization-cap value, but it might get prohibitively large at some point. This is a file in the JSON Lines format (one JSON document per line), and each line can have up to 1000 nested values, so with only 2 lines we already hit 2000 here.
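
For context, each line of LinesFile.json has roughly this shape (values are made up for illustration; each items array can hold up to 1000 entries):

{"items": [ {"item": {"gtin": "00012345678905"}}, {"item": {"gtin": "00098765432109"}} ]}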

In the documentation there is this explanation for the error code.

[RBDY0005] - Materialization Error: the sequence is too big to be materialized. Use --materialization-cap to increase the maximum materialization size, or add an output path to write to.

Is there some way to specify an output path here so that we do not have to worry about the materialization-cap?


ghislainfourny commented Sep 23, 2024

Thank you @misodle for your message.

The issue has to do with the fact that the large sequence is nested inside an object. Generally, in the NoSQL paradigm, objects should not grow too large, because that does not scale.

The way to achieve scalability is to make sure that large sequences do not get nested.

In this example, this is the way to execute the query and save its output across many different files in a way that scales:

for $row in json-file("LinesFile.json")
for $i in $row.items[]
return
{
  "gtin" : $i.item.gtin
}

(Note that the second for in the original query is redundant, as $doc is already a single item.)
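
To answer the question about an output path: instead of materializing the sequence, you can have RumbleDB write the results out directly by passing an output path on the command line. As a rough sketch (the jar name, query file, and paths below are placeholders, and the exact invocation depends on your setup), it would look along these lines:

spark-submit rumbledb-1.21.0.jar run --query-path ./gtin-query.jq --output-path ./gtin-output

The output directory will then contain many part files, one per partition, which is what makes this approach scale, and the materialization cap no longer applies because nothing needs to be gathered into memory.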
