Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-20917: Support applyQuotesToAll option in OpenCSVSerde. #5513

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

gigem
Copy link

@gigem gigem commented Oct 17, 2024

This is a revised version of #3718 and includes only the applyQuotesToAll changes. The other, OpenCSV changes were committed separately.

OpenCSV version 3.10 added an option called "applyQuotesToAll." When set to false, output columns were not quoted unless they needed it. This commit adds support for that option in OpenCSVSerde. This commit is based on a patch from Sungwoo Park [email protected], which was originally based on a patch from David Engel [email protected].

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

Is the change a dependency upgrade?

How was this patch tested?

OpenCSV version 3.10 added an option called "applyQuotesToAll."  When
set to false, output columns were not quoted unless they needed it.
This commit adds support for that option in OpenCSVSerde.  This commit
is based on a patch from Sungwoo Park <[email protected]>, which was
originally based on a patch from David Engel <[email protected]>.
@deniskuzZ
Copy link
Member

hi @gigem, in that case, could we close the #3718?

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some minor things

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1, pending tests

@gigem
Copy link
Author

gigem commented Oct 24, 2024

@gigem, crtseltbl_serdeprops.q was affected by this change: https://ci.hive.apache.org/job/hive-precommit/job/PR-5513/4/testReport/org.apache.hadoop.hive.cli.split5/TestMiniLlapLocalCliDriver/Testing___split_21___PostProcess___testCliDriver_crtseltbl_serdeprops_/

This link returns a 404 for me. Do you have another one? In the meantime, I'll see if I can recreate the error locally.

@deniskuzZ
Copy link
Member

@gigem, crtseltbl_serdeprops.q was affected by this change: https://ci.hive.apache.org/job/hive-precommit/job/PR-5513/4/testReport/org.apache.hadoop.hive.cli.split5/TestMiniLlapLocalCliDriver/Testing___split_21___PostProcess___testCliDriver_crtseltbl_serdeprops_/

This link returns a 404 for me. Do you have another one? In the meantime, I'll see if I can recreate the error locally.

Client Execution succeeded but contained differences (error code = 1) after executing crtseltbl_serdeprops.q 
29c29
< 100|val_100
---
> "100"|"val_100"

@gigem
Copy link
Author

gigem commented Oct 28, 2024

Client Execution succeeded but contained differences (error code = 1) after executing crtseltbl_serdeprops.q
29c29
< 100|val_100

"100"|"val_100"

It would seem Boolean.parseBoolean() is not parsing the empty/default property correctly. Unfortunately, I'm not able to reproduce the error (and test any fix) because earlier tests fail in my build. I'm a near total, Maven neophyte so could someone please tell me the magic encanation to run just the OpenCSVSerde (or Serde) tests?

@deniskuzZ
Copy link
Member

deniskuzZ commented Oct 28, 2024

Client Execution succeeded but contained differences (error code = 1) after executing crtseltbl_serdeprops.q

29c29
< 100|val_100

"100"|"val_100"

It would seem Boolean.parseBoolean() is not parsing the empty/default property correctly. Unfortunately, I'm not able to reproduce the error (and test any fix) because earlier tests fail in my build. I'm a near total, Maven neophyte so could someone please tell me the magic encanation to run just the OpenCSVSerde (or Serde) tests?

WDYM, there is no issue with parseBoolean. Aren't you changing the defaults to not quote the values?

  public static boolean parseBoolean(String s) {
        return ((s != null) && s.equalsIgnoreCase("true"));
    }

to run the test:

mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=crtseltbl_serdeprops.q -Drat.skip=true

before change

"100"|"val_100"

after change quotes are removed

100|val_100

@gigem, shouldn't we apply the quotes by default if quoteChar is provided??

@gigem
Copy link
Author

gigem commented Oct 28, 2024

before change

"100"|"val_100"

after change quotes are removed

100|val_100

Ah, I see you mean now. It was not my intent to change any original defaults. However, this change was made a long time ago for our private use. It was only ressurrected somewhat recently for our use with Hive/MR3 and I assumed that when working with them we'd kept any old behavior unchanged. I'll fix this as soon as I'm able to test it properly. I have more pressing, immediate needs to attend to first.

@gigem, shouldn't we apply the quotes by default if quoteChar is provided??

I don't think so. The desire to quote or not quote all columns can be independtent of the actual, quote character.

@deniskuzZ
Copy link
Member

before change

"100"|"val_100"

after change quotes are removed

100|val_100

Ah, I see you mean now. It was not my intent to change any original defaults. However, this change was made a long time ago for our private use. It was only ressurrected somewhat recently for our use with Hive/MR3 and I assumed that when working with them we'd kept any old behavior unchanged. I'll fix this as soon as I'm able to test it properly. I have more pressing, immediate needs to attend to first.

@gigem, shouldn't we apply the quotes by default if quoteChar is provided??

I don't think so. The desire to quote or not quote all columns can be independtent of the actual, quote character.

in that case, you could simply add applyQuotesToAll='true' in separator_test table serde properties

@gigem
Copy link
Author

gigem commented Oct 28, 2024

in that case, you could simply add applyQuotesToAll='true' in separator_test table serde properties

I think that would change the default behavior for unsuspecting users.

After a little bit of research, I believe the following should fix both the test and default behavior.

applyQuotesToAll = Boolean.parseBoolean(properties.getProperty(APPLYQUOTESTOALL, "true"));

My attempts to test it keep failing, though, due to unrelated build issues and I'm reluctant to commit it until I'm sure it's finally correct.

@deniskuzZ
Copy link
Member

in that case, you could simply add applyQuotesToAll='true' in separator_test table serde properties

I think that would change the default behavior for unsuspecting users.

After a little bit of research, I believe the following should fix both the test and default behavior.

applyQuotesToAll = Boolean.parseBoolean(properties.getProperty(APPLYQUOTESTOALL, "true"));

My attempts to test it keep failing, though, due to unrelated build issues and I'm reluctant to commit it until I'm sure it's finally correct.

the above change makes sense to me.

This fixes the test regression and maintains compatibility with
previous versions before applyQuotesToAll support was added.
Copy link

sonarcloud bot commented Nov 6, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants