@detecti1

Summary

Replace @lru_cache on instance methods with instance-level caching to fix a memory leak. The global LRU cache keys on self, so it holds strong references to parser instances and their DataFrames indefinitely.
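The leak can be reproduced in a few lines. The sketch below is illustrative only: `LeakyParser` and its `payload` attribute are hypothetical stand-ins for a real parser and its DataFrame, not code from this repository.

```python
import gc
import weakref
from functools import lru_cache


class LeakyParser:
    """Hypothetical stand-in for a parser that holds a large DataFrame."""

    def __init__(self, payload):
        self.payload = payload

    @lru_cache()
    def field_metas(self):
        # The cache lives on the function object (shared by the class) and
        # its key includes `self`, so it keeps a strong reference to the
        # instance.
        return [len(self.payload)]


parser = LeakyParser(list(range(1000)))
parser.field_metas()
ref = weakref.ref(parser)

# Drop the only user-visible reference to the instance.
del parser
gc.collect()
still_alive = ref() is not None  # the LRU cache still references it

# Only clearing (or evicting from) the shared cache releases the instance.
LeakyParser.field_metas.cache_clear()
gc.collect()
freed_after_clear = ref() is None
```

With a bounded `maxsize` the entry would eventually be evicted, but until then the instance and everything it references stay in memory.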

What's Changed

  • BaseDataFrameDataParser: Add _field_metas_cache, _raw_fields_cache, _cache_lock for thread-safe instance-level caching with double-checked locking.
  • SparkDataFrameDataParser / DatabaseDataParser / CloudDatasetParser: Apply the same pattern.
  • Keep @lru_cache only for the pure function get_timezone_base_offset.
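A minimal sketch of the instance-level pattern described above, assuming a simplified parser. The attribute names `_field_metas_cache` and `_cache_lock` come from this PR; `ParserSketch`, `_columns`, `compute_count`, and the shape of the returned metadata are hypothetical and exist only for illustration.

```python
import threading


class ParserSketch:
    """Hypothetical parser illustrating per-instance caching."""

    def __init__(self, columns):
        self._columns = list(columns)
        self._field_metas_cache = None      # None means "not computed yet"
        self._cache_lock = threading.Lock()
        self.compute_count = 0              # illustration only

    @property
    def field_metas(self):
        # First check without the lock: the fast path once the value exists.
        cached = self._field_metas_cache
        if cached is not None:
            return cached
        with self._cache_lock:
            # Second check under the lock: another thread may have filled
            # the cache while this one was waiting.
            if self._field_metas_cache is None:
                self.compute_count += 1
                self._field_metas_cache = [{"name": c} for c in self._columns]
            return self._field_metas_cache


parser = ParserSketch(["a", "b"])
threads = [
    threading.Thread(target=lambda: parser.field_metas) for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the cached value is stored on the instance, it is garbage-collected together with the parser, and no class-level structure keeps the instance alive.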

Behavior Compatibility

  1. Public API unchanged (same properties and return structures).
  2. Semantics preserved: once-per-instance computation, reuse for subsequent accesses.
  3. @lru_cache automatically adds cache_info() and cache_clear() methods to decorated functions. This project does not use these methods internally, but external code that explicitly calls them will raise AttributeError after this change (e.g., BaseDataFrameDataParser.field_metas.fget.cache_clear()).
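External code that relied on `cache_clear()` could guard the call rather than assume the attribute exists. The following is a sketch under that assumption; `Before`, `After`, and `clear_field_metas_cache` are hypothetical names, not part of this project.

```python
from functools import lru_cache


class Before:
    """Hypothetical class using the old decorator stacking."""

    @property
    @lru_cache()
    def field_metas(self):
        return []


class After:
    """Hypothetical class using a plain property, as after this change."""

    @property
    def field_metas(self):
        return []


# Accessing the property on the class returns the property object itself,
# whose `fget` is the underlying (possibly wrapped) function.
has_before = hasattr(Before.field_metas.fget, "cache_clear")
has_after = hasattr(After.field_metas.fget, "cache_clear")


def clear_field_metas_cache(cls):
    """Clear the LRU cache if present; return whether anything was cleared."""
    clear = getattr(cls.field_metas.fget, "cache_clear", None)
    if clear is None:
        return False
    clear()
    return True
```

Calling `clear_field_metas_cache` works against both shapes, so callers do not need to know which version of the class they are running against.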

Breaking Changes

None for normal usage. The change only affects code that introspects or manipulates the LRU cache internals.

Related Issues

Fixes #723

@ObservedObserver
Member

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Nice work!


Development

Successfully merging this pull request may close these issues.

[BUG] Memory Leak in Data Parsers: Instance Methods Decorated with @lru_cache Retain Large DataFrames