Support Large Dictionary for OrcWriter
OrcWriter uses dictionary encoding for all columns until the writer's total dictionary memory exceeds dictionaryMaxMemory minus 4 MB; it then starts abandoning dictionary encodings. When running with large dictionary sizes (say 80 MB) and a long dictionary, the dictionary writer can retain hundreds of MB before the dictionary is abandoned. This change introduces new configuration parameters to control this behavior:

1. Make the 4 MB "almost full" threshold configurable. Writers with a large dictionary can set it to something bigger.
2. When a column's dictionary exceeds a certain size, measure whether the dictionary is effective and abandon it if it is not.
3. Because setting 2 could affect existing writers, add a third setting that controls how often the dictionary effectiveness check runs. It is configured to INT_MAX to preserve the existing behavior.
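A minimal Java sketch of how the three settings could interact, for illustration only. Every name here (`DictionaryPolicySketch`, `ColumnStats`, `almostFullRangeBytes`, `usefulCheckColumnSizeBytes`, `usefulCheckFrequency`, `onChunkWritten`) is hypothetical and is not the actual OrcWriter API. "Effective" is assumed to mean that the dictionary plus its per-row ids is smaller than the estimated direct encoding; the real commit may measure it differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the three dictionary settings; not Presto's real classes. */
public class DictionaryPolicySketch {
    /** Hypothetical per-column sizes a dictionary writer might track. */
    public static final class ColumnStats {
        final String name;
        final long dictionaryBytes; // memory retained by the dictionary itself
        final long indexBytes;      // bytes spent on dictionary ids for the rows
        final long directBytes;     // estimated size if written with direct encoding
        boolean dictionaryEnabled = true;

        public ColumnStats(String name, long dictionaryBytes, long indexBytes, long directBytes) {
            this.name = name;
            this.dictionaryBytes = dictionaryBytes;
            this.indexBytes = indexBytes;
            this.directBytes = directBytes;
        }
    }

    private final long dictionaryMaxMemory;
    private final long almostFullRangeBytes;       // setting 1: previously a fixed 4 MB
    private final long usefulCheckColumnSizeBytes; // setting 2: only check large dictionaries
    private final int usefulCheckFrequency;        // setting 3: Integer.MAX_VALUE ~ never check
    private int chunksSinceLastCheck;

    public DictionaryPolicySketch(long dictionaryMaxMemory, long almostFullRangeBytes,
                                  long usefulCheckColumnSizeBytes, int usefulCheckFrequency) {
        this.dictionaryMaxMemory = dictionaryMaxMemory;
        this.almostFullRangeBytes = almostFullRangeBytes;
        this.usefulCheckColumnSizeBytes = usefulCheckColumnSizeBytes;
        this.usefulCheckFrequency = usefulCheckFrequency;
    }

    /** True once total dictionary memory enters the configurable "almost full" range. */
    public boolean isDictionaryAlmostFull(long totalDictionaryBytes) {
        return totalDictionaryBytes > dictionaryMaxMemory - almostFullRangeBytes;
    }

    /** Assumed effectiveness test: dictionary plus ids must beat direct encoding. */
    static boolean isEffective(ColumnStats c) {
        return c.dictionaryBytes + c.indexBytes < c.directBytes;
    }

    /** Hypothetical per-chunk hook; returns the columns whose dictionary was abandoned. */
    public List<String> onChunkWritten(List<ColumnStats> columns) {
        List<String> abandoned = new ArrayList<>();
        chunksSinceLastCheck++;
        if (chunksSinceLastCheck < usefulCheckFrequency) {
            return abandoned; // not yet time for an effectiveness check
        }
        chunksSinceLastCheck = 0;
        for (ColumnStats c : columns) {
            if (c.dictionaryEnabled
                    && c.dictionaryBytes >= usefulCheckColumnSizeBytes
                    && !isEffective(c)) {
                c.dictionaryEnabled = false; // fall back to direct encoding
                abandoned.add(c.name);
            }
        }
        return abandoned;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        DictionaryPolicySketch policy = new DictionaryPolicySketch(80 * mb, 16 * mb, mb, 1);
        System.out.println(policy.isDictionaryAlmostFull(70 * mb)); // true: 70 MB > 80 MB - 16 MB
        List<ColumnStats> columns = List.of(
                new ColumnStats("a", 2 * mb, mb, 10 * mb),  // effective: kept
                new ColumnStats("b", 4 * mb, 2 * mb, 5 * mb), // large and ineffective: abandoned
                new ColumnStats("c", mb / 2, mb, mb));        // ineffective but below size threshold
        System.out.println(policy.onChunkWritten(columns)); // [b]
    }
}
```

Setting `usefulCheckFrequency` to `Integer.MAX_VALUE` makes the check practically never run, which mirrors the commit's choice of INT_MAX to keep existing writers unaffected; only the "almost full" threshold then governs abandonment.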
Arunachalam Thirupathi committed Nov 10, 2021
1 parent fe936d0, commit 699832f
Showing 6 changed files with 348 additions and 57 deletions.