Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] Add a new option to parquet reader to treat binaries as UTF-8 text #9112

Closed
wants to merge 2 commits into from

Conversation

gortiz
Copy link
Contributor

@gortiz gortiz commented Jul 27, 2022

This PR adds a new property called treatBinaryAsString to ParquetRecordReaderConfig. This property is an optional boolean. If it is true, then the import process will treat all binary columns as texts.

See #9081

@gortiz gortiz changed the title Add a new option to parquet reader to treat binaries as UTF-8 text [DRAFT] Add a new option to parquet reader to treat binaries as UTF-8 text Jul 27, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jul 27, 2022

Codecov Report

Merging #9112 (5c4b45b) into master (f473bf6) will decrease coverage by 2.86%.
The diff coverage is n/a.

❗ Current head 5c4b45b differs from pull request most recent head 166b3d8. Consider uploading reports for the commit 166b3d8 to get more accurate results

@@             Coverage Diff              @@
##             master    #9112      +/-   ##
============================================
- Coverage     69.96%   67.09%   -2.87%     
- Complexity     4764     4815      +51     
============================================
  Files          1859     1381     -478     
  Lines         99118    71945   -27173     
  Branches      15074    11519    -3555     
============================================
- Hits          69346    48273   -21073     
+ Misses        24861    20167    -4694     
+ Partials       4911     3505    -1406     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 67.09% <ø> (+0.01%) ⬆️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...va/org/apache/pinot/core/routing/RoutingTable.java 0.00% <0.00%> (-100.00%) ⬇️
...va/org/apache/pinot/common/config/NettyConfig.java 0.00% <0.00%> (-100.00%) ⬇️
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/MinionQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
...ache/pinot/server/access/AccessControlFactory.java 0.00% <0.00%> (-100.00%) ⬇️
...he/pinot/common/messages/SegmentReloadMessage.java 0.00% <0.00%> (-100.00%) ⬇️
...he/pinot/common/messages/TableDeletionMessage.java 0.00% <0.00%> (-100.00%) ⬇️
...pinot/core/data/manager/realtime/TimerService.java 0.00% <0.00%> (-100.00%) ⬇️
... and 714 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Jackie-Jiang
Copy link
Contributor

Should we consider adding the charSetName (default to utf8) in case the string is using a different encoding?

@gortiz gortiz closed this Nov 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants