Skip to content

Incomplete blob and writes cleanup for shallow classes #39

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
noelpenne opened this issue May 7, 2025 · 0 comments
Open

Incomplete blob and writes cleanup for shallow classes #39

noelpenne opened this issue May 7, 2025 · 0 comments

Comments

@noelpenne
Copy link

Bug Description

According to the documentation, the ShallowRedisSaver and AsyncShallowRedisSaver are intended to retain only the latest complete checkpoint, including associated blobs and write entries. However, this behavior does not hold due to two related issues:

  1. Key Formatting Inconsistency:
    The shallow saver classes do not use the same key formatting utilities (to_storage_safe_id, to_storage_safe_str) as the base classes, especially when handling empty checkpoint_ns values. As a result, when searching for outdated keys, orphaned checkpoint_blob and checkpoint_write entries are left undeleted because the key patterns do not match what was written (e.g., empty strings are not encoded to __empty__).

  2. Unsafe Channel Value Parsing:
    Current logic does not encode channel values. If a channel value contains the delimiter character :, the split/parse logic (see shallow.py#L182-L186) misinterprets keys, causing it to miss the correct entries for cleanup.

Minimal Example

Given:

thread_id = "1234"
checkpoint_ns = ""
channel_values = {"messages": [], "branch:to:agent": None}

The following key would be produced:

checkpoint_blob:1234:__empty__:branch:to:agent:00000000000000000000000000000007.0.35094819908709574

Current behavior:

  1. The key pattern generated by _make_shallow_redis_checkpoint_blob_key_pattern (ref) misses this blob because it doesn't encode empty strings the same way as _dump_blobs, so old blobs aren't deleted.
  2. Even with key pattern issue fixed, channel values containing : cause parsing errors. E.g., the parsing would incorrectly interpret channel="branch" and version="to" instead of the intended values.

Suggested Solutions

  • Consistently apply encoding (e.g., base64) to all interpolated values (including thread_id, checkpoint_ns, and channel) to avoid delimiter collisions.
  • Ensure that key formatting (including handling empty strings) is unified across both base and shallow classes to avoid orphaned entries. An initial draft PR fix: cleanup blobs and writes for shallow classes #37 for this was created (not addressing the delimiter issue though)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant