Add freeze/thaw and pickle capabilities #40

Open: copelaje wants to merge 8 commits into base: master

@copelaje copelaje commented Jun 9, 2025

NOTE: this builds on the existing MR #39, which inlines prefix storage within a node and simplifies these updates. The changes in this MR alone are much more localized.

This MR adds the ability to freeze/thaw as well as pickle pyt objects. I tend to have rather large static pyt objects, and ingesting them fresh from sources at startup takes several minutes. If this data is stored in pickled format, however, startup can be nearly instantaneous.

The new freeze() method changes the underlying memory representation of the pyt object. Rather than a node graph in which each node is individually dynamically allocated, it allocates one contiguous chunk of memory and stores all nodes within that chunk, rewriting the linkages so they are self-consistent in the new location. The data pointers within each node remain unchanged, since they refer to dynamically allocated Python objects. This reorganization makes pickling easier: the entire node array can be bulk-copied with memcpy, and on restore the pointer linkage can be rewritten relative to the memory address of the first node. It also gives a more compact representation of the structure, allowing for better memory cache performance. The frozen state is stored as a flag within the top-level tree type. This matters because in the frozen/compact representation any modification of the pyt contents (insert/delete/etc.) is disallowed.

The new thaw() method is the exact reverse of freeze() and restores the per-node dynamic allocation. In this form the structure is less compact, but it can be modified much more efficiently.
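To illustrate the idea behind freeze() and thaw(), here is a minimal pure-Python sketch, not the C implementation in this MR. It assumes a toy binary tree; `Node`, `FrozenNode`, and the module-level `freeze`/`thaw` functions are hypothetical names chosen for the example. List indices stand in for the pointers that the real code rewrites relative to the first node.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical per-node allocation: children are object references."""
    value: Any = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# A frozen node is (value, left_index, right_index); -1 means "no child".
FrozenNode = Tuple[Any, int, int]


def freeze(root: Node) -> List[FrozenNode]:
    """Flatten a tree into one list, replacing child references with indices."""
    nodes: List[Node] = []
    index = {}  # id(node) -> position in the flat list
    stack = [root]
    while stack:
        node = stack.pop()
        index[id(node)] = len(nodes)
        nodes.append(node)
        # Push right first so the left subtree is laid out next to its parent.
        for child in (node.right, node.left):
            if child is not None:
                stack.append(child)

    def ref(child: Optional[Node]) -> int:
        return -1 if child is None else index[id(child)]

    # Values are carried over untouched; only the linkage is rewritten.
    return [(n.value, ref(n.left), ref(n.right)) for n in nodes]


def thaw(frozen: List[FrozenNode]) -> Node:
    """Rebuild individually allocated nodes from the flat representation."""
    nodes = [Node(value=value) for value, _, _ in frozen]
    for node, (_, left, right) in zip(nodes, frozen):
        node.left = nodes[left] if left >= 0 else None
        node.right = nodes[right] if right >= 0 else None
    return nodes[0]  # the root is always stored first


root = Node("root", Node("l", Node("ll")), Node("r"))
flat = freeze(root)
restored = thaw(flat)
```

Because the flat list holds only values and integer offsets, it can be copied or serialized as a single unit, which is the property the frozen representation relies on.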

I also implemented the __reduce__ method (for pickle support) and the __setstate__ method (for unpickle). If pickling is attempted on a non-frozen object, the user is warned that they must freeze it first. An unpickled object is restored in the frozen state. If a user intends to modify it, they can invoke thaw() and then do so.
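The pickling contract described above can be sketched with a small pure-Python stand-in. This is an illustration under assumptions, not the MR's code: `FrozenOnlyTable` is a hypothetical class backed by a dict, and raising `pickle.PicklingError` is only one possible way to signal that the object must be frozen first.

```python
import pickle


class FrozenOnlyTable:
    """Hypothetical stand-in for a pyt object that pickles only when frozen."""

    def __init__(self):
        self._entries = {}
        self._frozen = False

    def insert(self, key, value):
        if self._frozen:
            raise RuntimeError("object is frozen; call thaw() before modifying")
        self._entries[key] = value

    def get(self, key):
        return self._entries[key]

    def freeze(self):
        self._frozen = True

    def thaw(self):
        self._frozen = False

    def __reduce__(self):
        # Assumption: refuse to pickle unless the object has been frozen.
        if not self._frozen:
            raise pickle.PicklingError("call freeze() before pickling")
        # (callable, args, state): pickle rebuilds an empty object,
        # then hands the state to __setstate__.
        return (self.__class__, (), dict(self._entries))

    def __setstate__(self, state):
        self._entries = dict(state)
        self._frozen = True  # restored objects come back frozen


table = FrozenOnlyTable()
table.insert("10.0.0.0/8", "a")
table.freeze()

restored = pickle.loads(pickle.dumps(table))
restored.thaw()                      # required before any modification
restored.insert("192.168.0.0/16", "b")
```

Keeping freeze() and thaw() as explicit calls, rather than invoking them inside `__reduce__` and `__setstate__`, means a caller who only reads the restored object never pays for the thaw.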

While one could freeze/thaw automatically during the pickle process, I have intentionally kept these functions separate. In my primary use case I never need to thaw(), so forcing a thaw on unpickle would add extra computation and lose the memory cache benefit. I suspect pickling these objects is not a common use case, so a more advanced user can invoke these extra methods as needed.

Several tests were added to verify proper operation, and README.md was updated to document and demonstrate the new capabilities.
