
Zstd compression with dictionary based on schema #15348


Open
2 tasks done
billouboq opened this issue Apr 9, 2025 · 2 comments
Labels
enhancement This issue is a user-facing general improvement that doesn't fix a bug or add a new feature new feature This change adds new functionality, like a new method or class

Comments

@billouboq
Contributor

Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure the feature has not already been requested

🚀 Feature Proposal

Hello !

I was just quickly thinking: would it be possible to create a zstd dictionary based on the document schema to make compression/decompression way faster?

Motivation

Increase performance

Example

No response

@billouboq billouboq added enhancement This issue is a user-facing general improvement that doesn't fix a bug or add a new feature new feature This change adds new functionality, like a new method or class labels Apr 9, 2025
@vkarpov15
Collaborator

I took a look and, while this is an interesting idea, I don't think Mongoose can support this right now, because the MongoDB Node driver uses the custom zstd implementation in @mongodb-js/zstd, which doesn't support dictionary compression. Its current API is just compress(data, compressionLevel) and decompress(data), with no dictionary support. Do you have any ideas for working around this, @billouboq?

@baileympearson
Contributor

Drivers have considered dictionary support in the past but decided not to implement this feature (https://jira.mongodb.org/browse/DRIVERS-2396). This change would require server changes to support the dictionary used for compression server-side (the server + client must share the same dictionary used for compression), and that breaks the stateless behavior of existing client + server compression.
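To make the shared-state requirement concrete, here is a minimal sketch in TypeScript. It swaps in Node's built-in zlib deflate preset dictionary (the dictionary option of deflateSync/inflateSync) as a stand-in for a zstd dictionary, since the zstd binding discussed above exposes no dictionary option. The dictionary and payload contents are made up for illustration, and JSON stands in for the BSON that actually goes over the wire. Data compressed with a dictionary can only be decompressed by a peer that holds the same dictionary.

```typescript
import { deflateSync, inflateSync } from "node:zlib";

// Stand-in for a dictionary that client and server would both have to hold.
// zlib's deflate preset dictionary is used here in place of zstd.
const sharedDictionary = Buffer.from(
  '{"_id":"","name":"","email":"@example.com","createdAt":"2025-"}'
);

// A serialized document (JSON for readability; the wire format is BSON).
const payload = Buffer.from(
  '{"_id":"67f6","name":"Ada","email":"ada@example.com","createdAt":"2025-04-09"}'
);

// Client side: compress with the shared dictionary.
const compressed = deflateSync(payload, { dictionary: sharedDictionary });

// Server side with the same dictionary: the round trip succeeds.
const restored = inflateSync(compressed, { dictionary: sharedDictionary });
console.log(restored.equals(payload)); // true

// Server side without the dictionary: decompression throws, so the
// dictionary becomes shared state that both ends must agree on.
let failedWithoutDictionary = false;
try {
  inflateSync(compressed);
} catch (err) {
  failedWithoutDictionary = true;
  console.log((err as Error).message);
}
console.log(failedWithoutDictionary); // true
```

This is exactly the statefulness problem: every connection would need to negotiate (or be configured with) the same dictionary bytes, whereas today's compressors need nothing beyond the algorithm name.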

Also, I'm open to suggestions about what it might look like to create a dictionary based on a schema, but all the underlying zstd APIs for creating dictionaries train the dictionary from sample documents. I'm not sure what that would look like in Mongoose: would example documents be generated from the schema, serialized to bytes, and then fed into the trainer? Or something else?
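One possible reading of the "generate example documents from the schema" idea, sketched in TypeScript under heavy assumptions: SimpleSchema, sampleDocuments and buildRawDictionary are hypothetical names (not Mongoose APIs), documents are serialized as JSON rather than BSON, and Node's deflate preset dictionary again stands in for zstd. Instead of training a dictionary, the sketch simply concatenates the synthetic samples into a raw-content dictionary (zstd can also use raw-content dictionaries, but that path is not exercised here).

```typescript
import { deflateSync, inflateSync } from "node:zlib";

// Hypothetical, simplified schema description: field name -> type name.
// This is NOT Mongoose's Schema class; it only stands in for one.
type FieldType = "string" | "number" | "boolean" | "date";
type SimpleSchema = Record<string, FieldType>;

// Placeholder values used to fill in synthetic sample documents.
const placeholders: Record<FieldType, unknown[]> = {
  string: ["", "example"],
  number: [0, 42],
  boolean: [false, true],
  date: [new Date(0), new Date("2025-04-09T00:00:00Z")],
};

// Generate one synthetic document per placeholder variant.
function sampleDocuments(schema: SimpleSchema): Record<string, unknown>[] {
  const docs: Record<string, unknown>[] = [];
  for (let i = 0; i < 2; i++) {
    const doc: Record<string, unknown> = {};
    for (const [field, type] of Object.entries(schema)) {
      doc[field] = placeholders[type][i];
    }
    docs.push(doc);
  }
  return docs;
}

// Naive raw-content dictionary: the serialized samples concatenated.
// No training happens here; a zstd trainer would instead choose
// substrings from many real sample documents.
function buildRawDictionary(schema: SimpleSchema): Buffer {
  const serialized = sampleDocuments(schema).map((d) => JSON.stringify(d));
  return Buffer.from(serialized.join(""));
}

// Usage: both peers derive the same dictionary from the same schema.
const userSchema: SimpleSchema = {
  name: "string",
  email: "string",
  age: "number",
  createdAt: "date",
};
const dictionary = buildRawDictionary(userSchema);

const userDoc = Buffer.from(
  JSON.stringify({
    name: "Ada",
    email: "ada@example.com",
    age: 36,
    createdAt: new Date("2025-04-09T12:00:00Z"),
  })
);
const packed = deflateSync(userDoc, { dictionary });
const unpacked = inflateSync(packed, { dictionary });
console.log(unpacked.equals(userDoc)); // true
```

A schema-derived raw dictionary like this mostly captures repeated field names; a trained dictionary (for example via ZDICT_trainFromBuffer in the zstd C library) would pick its content from many real samples instead. Either way, the server would still need the identical dictionary, so this does not remove the statefulness concern above.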

Projects
None yet
Development

No branches or pull requests

3 participants