Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NEW] Support for new Module API - ValkeyModule_ShouldDefrag #1210

Open
KarthikSubbarao opened this issue Oct 22, 2024 · 17 comments
Open

[NEW] Support for new Module API - ValkeyModule_ShouldDefrag #1210

KarthikSubbarao opened this issue Oct 22, 2024 · 17 comments

Comments

@KarthikSubbarao
Copy link
Member

KarthikSubbarao commented Oct 22, 2024

The problem/use-case that the feature addresses

Currently, Modules do not have a mechanism to know whether an object pointer "should" be defragmented. We have access to the ValkeyModule_DefragAlloc which performs the check. But in addition to the check, it also moves the pointer allocation to a new allocation and returns the new pointer.

In cases Modules have their own custom defragmentation logic, it will help to expose an API that only returns the result of the je_get_defrag_hint call on the provided pointer.

Description of the feature

A new Module API, ValkeyModule_ShouldDefrag, can be implemented to accept a void pointer and return the result of the je_get_defrag_hint call. Based on this, Modules can handle the defragmentation operation accordingly.

Alternatives you've considered

Two options:

  1. Don't defrag at all
  2. Always re-create / re-allocate the object

Additional information

NA

@KarthikSubbarao
Copy link
Member Author

I can raise a PR for this issue based on thoughts on issue and approach

@zuiderkwast
Copy link
Contributor

Do you have a module that needs this? (Which one?)

Can you explain the module logic a bit more? In which situation do you need it and what do you want to do instead of actually DefragAlloc in this case?

@KarthikSubbarao
Copy link
Member Author

KarthikSubbarao commented Oct 23, 2024

Do you have a module that needs this? (Which one?)

We would want to use this from the ValkeyBloom Module. This Module implements a data type whose structure internally uses a raw bloom implementation from an external Rust crate / library.
If we were to use the DefragAlloc API, a new memory block would be allocated and a new pointer is returned. Updating the internal pointers, may not be easy to do since it is an external library and since this in Rust.

Because of this, we planned to use this new Module API ValkeyModule_ShouldDefrag to delete and re-create the particular BloomFilter within the Bloom object only if necessary based on the return code of the API

@zuiderkwast
Copy link
Contributor

Thanks! Excellent use case.

@zuiderkwast
Copy link
Contributor

Alternative: Always re-create it? Or not defrag at all?

Rust external crates don't use jemalloc, do they?

When we defrag in valkey, we want to move all the allocations from a fragmented jemalloc area to a fresh area (bin? arena? not sure about the terminology). How do we make sure we get allocations from the fresh bins in Rust?

If they're in separate memory, not mixed with the fragmented memory, then maybe it's fine to skip defragmenting these..?

@zvi-code I think you know this area.

@KarthikSubbarao
Copy link
Member Author

Alternative: Always re-create it? Or not defrag at all?

Thanks - I added this to the alternatives.

Rust external crates don't use jemalloc, do they?

When developed as a Module, we can choose to override the global allocator and use Valkey core as the allocator (as opposed to using the System default allocator). The ValkeyBloom Module does this. Every allocation and free goes through Valkey

@zvi-code
Copy link

we planned to use this new Module API ValkeyModule_ShouldDefrag to delete and re-create the particular BloomFilter within the Bloom object only if necessary based on the return code of the API

@KarthikSubbarao, The problem with this approach is that it is very likely you will reallocate the exact same pointer you just freed, because the memory will be freed to tcache and so will the next allocation. The way the defrag avoids that is that it uses API to explicitly tell the allocator to skip the tcache when alloc&free for defrag.

@zvi-code
Copy link

zvi-code commented Oct 28, 2024

Rust external crates don't use jemalloc, do they?

I think jemalloc is the default allocator in rust and anyway if i recall correctly, unless you do something explicitly, all calls to alloc/free will go to jemalloc. I am assuming the explicit usage of zmalloc by a module is to maintain the memory usage stats correct

@KarthikSubbarao
Copy link
Member Author

Thank you. We are trying to implement an efficient defrag system for Modules which are using external libraries for their data types.

Mainly we want to avoid realloc / free over multiple defrag cycles when it is not needed.

@zvi-code Is there any alternative you would recommend for such Module data types?

The problem with this approach is that it is very likely you will reallocate the exact same pointer you just freed, because the memory will be freed to tcache and so will the next allocation.

Yes, during activeDefragAlloc, valkey uses zmalloc_no_tcache to not use thread cache to ensure that we do not get back the same pointer. I am not sure how likely it is for that to happen when using thread cache, but having that guarantee is important since it is followed by a free.

@zvi-code
Copy link

When are you initiating the defrag? is this as part of the defrag key callback of the module (during the defrag cron)? If so, we might consider disabling the tcache for the entire defrag time (for main thread only) or provide API for the module to disable it and re-enable tcache

@KarthikSubbarao
Copy link
Member Author

is this as part of the defrag key callback of the module (during the defrag cron)?

Yes, it is only from the Defrag callback of the Module data type.

@zvi-code
Copy link

Another option, I'm considering for defrag in general and could help here is to use the user defined tcache (jemalloc supports the ability to create tcache and than perform free\alloc from those tcache's, see here). The thinking is that we would create a "free_while_defrag" tcache.
They way it will be used is that during the defrag cron running period, any zfree will be directed to that tcache and thus it will be guarantee we don't re-allocate just freed memory (unless it was allocated from the arena).
This would solve this issue IFF the module is using zmalloc api for allocations (if it just uses the malloc, than it would not help :( )

@zuiderkwast
Copy link
Contributor

@zvi-code This sounds great! It is possible to override malloc and free, like this: https://stackoverflow.com/questions/262439/create-a-wrapper-function-for-malloc-and-free-in-c#262481

Do you think this idea can be used and would be possible in Rust too, to redirect the allocation functions to zmalloc?

@zvi-code
Copy link

Yea, I think its possible

@zvi-code
Copy link

zvi-code commented Nov 16, 2024

@zuiderkwast I just took another look, to my understanding, if not explicitly set otherwise, than rust module are already calling the zmalloc api for alloc\free as implemented here.

For some reason, ValkeyBloom module explicitly uses the system-allocator and not the ValkeyAlloc. I opened an issue to clarify\ask if it's intentional.

@KarthikSubbarao
Copy link
Member Author

KarthikSubbarao commented Nov 16, 2024

Hi Zvi, we are using ValkeyAlloc by default which goes through ValkeyModule_Malloc and ValkeyModule_Free, etc.

I replied to your question on the issue as well. We do not use system alloc for the release build. It is only used in unit testing where the valkey server is not running

@zuiderkwast
Copy link
Contributor

zuiderkwast commented Nov 17, 2024

So, during defrag, you recreate the bloom filter and then we want to make sure that the crate doesn't allocate memory from the same tcache again. That's the problem, right?

I can see two possibilities:

  • Set some flag that causes ValkeyModule_Alloc to behave like ValkeyModule_DefragAlloc when defrag is running.
  • Set some flag in the Rust allocator ValkeyAlloc that makes it use ValkeyModule_DefragAlloc instead of ValkeyModule_Alloc when defrag is running.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants