
Optimize memory usage #6



Merged: 9 commits, Jun 27, 2024


@cre-os (Collaborator) commented Jun 5, 2024

Currently, the XML file is parsed in memory: the document tree is extracted, then hashes are computed, then the data is extracted into a flat data model. This PR proposes switching to iterative, event-based parsing as the default in order to reduce memory use, while keeping the original approach available, since it can be faster even though it uses more memory.
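A minimal sketch of the two modes, assuming Python and the standard library's `xml.etree.ElementTree` (the branch name points to lxml, whose `lxml.etree.iterparse` takes a similar `events` argument). The function names, the `iterative` flag and the record layout are hypothetical, not the PR's actual API:

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Union

Source = Union[str, IO[bytes]]  # file path or binary file object


def _to_record(elem: ET.Element) -> dict:
    # Hypothetical flat record: tag, attributes and stripped text of one element.
    return {"tag": elem.tag, "attrib": dict(elem.attrib), "text": (elem.text or "").strip()}


def records_in_memory(source: Source, tag: str) -> Iterator[dict]:
    """Original-style approach: build the whole tree first, then walk it."""
    root = ET.parse(source).getroot()
    for elem in root.iter(tag):
        yield _to_record(elem)


def records_iterative(source: Source, tag: str) -> Iterator[dict]:
    """Event-based approach: handle each element as soon as its end tag is read."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield _to_record(elem)
            elem.clear()  # release the processed element's text, attributes and children


def records(source: Source, tag: str, iterative: bool = True) -> Iterator[dict]:
    """Iterative parsing by default; the in-memory path remains available."""
    return records_iterative(source, tag) if iterative else records_in_memory(source, tag)


sample = b"<root><item id='1'>a</item><item id='2'>b</item></root>"
streamed = list(records(io.BytesIO(sample), "item"))
loaded = list(records(io.BytesIO(sample), "item", iterative=False))
```

In the iterative path, each matching element is cleared once its record has been extracted, so the content of processed records is released as parsing proceeds instead of the whole tree staying in memory. The emptied element shells stay attached to their parent; with lxml a common extra step is to delete preceding siblings (`elem.getprevious()` / `elem.getparent()`) to reclaim them too.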

  • parsing is done iteratively
  • the document tree is built during parsing
  • hashes are computed as soon as a node has been parsed
  • hash-based deduplication is done on the fly
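The four steps above can be combined into a single pass. The following is a hedged sketch, again using the standard library's `iterparse`: each element's hash is computed on its `end` event from its tag, sorted attributes, text and the already-computed hashes of its children (a Merkle-style scheme assumed here; the PR does not specify its hash inputs or algorithm), and each distinct subtree is stored once in a flat dictionary keyed by hash. `parse_dedup` and the record fields are hypothetical names:

```python
import hashlib
import io
import xml.etree.ElementTree as ET
from typing import IO, Dict, List, Tuple, Union


def parse_dedup(source: Union[str, IO[bytes]]) -> Tuple[str, Dict[str, dict]]:
    """Single pass: build, hash and deduplicate subtrees while parsing.

    Returns the root's hash and a flat table mapping each distinct subtree hash
    to a record whose "children" field lists child hashes (hypothetical layout).
    """
    table: Dict[str, dict] = {}
    stack: List[List[str]] = [[]]  # child hashes collected for each open element
    root_hash = ""
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append([])  # a new element opens: start collecting its child hashes
            continue
        children = stack.pop()  # all children have closed, so their hashes are known
        attrib = sorted(elem.attrib.items())
        text = (elem.text or "").strip()  # tail text is ignored in this sketch
        # Assumed Merkle-style hash: repr() of a tuple keeps field boundaries unambiguous.
        digest = hashlib.sha256(repr((elem.tag, attrib, text, children)).encode()).hexdigest()
        if digest not in table:  # deduplicate as we go: keep the first occurrence only
            table[digest] = {"tag": elem.tag, "attrib": dict(attrib), "text": text, "children": children}
        stack[-1].append(digest)  # report this hash to the parent
        root_hash = digest  # the last element to close is the root
        elem.clear()  # its content now lives in the flat table
    return root_hash, table


sample = b"<root><item id='1'><v>x</v></item><item id='1'><v>x</v></item><item id='2'/></root>"
root_hash, table = parse_dedup(io.BytesIO(sample))
```

Because a parent's hash depends only on its own fields and its children's hashes, identical subtrees anywhere in the document collapse to a single record, and every element can be cleared as soon as it closes. In the sample, the two identical `item` subtrees share one entry, leaving four distinct records.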

@cre-os merged commit ec5ef05 into main on Jun 27, 2024
9 checks passed
@cre-os deleted the feature/switch_to_lxml_iterparse branch on June 27, 2024 at 15:39