-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-1558347: Fix PARSE_HEADER bug in session.read.csv #2028
Conversation
} | ||
new_options = {**file_format_options} | ||
file_format = self.session.sql( | ||
f"DESCRIBE FILE FORMAT {options['FORMAT_NAME']}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may be careful about eagerly triggering such a query 1) it will be visible to users 2) maybe add a try/catch because it's still possible that it will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I think I can do this better in a way that maintains lazy evaluation. I'll refactor a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you create a jira and add a todo here for doing it lazily in the future?
Seems like your changes contain some Local Testing changes, please request review from @snowflakedb/local-testing |
1 similar comment
Seems like your changes contain some Local Testing changes, please request review from @snowflakedb/local-testing |
1ee6a80
to
ef02131
Compare
ef02131
to
1032856
Compare
@sfc-gh-jrose are there any changes for Snowpark pandas reviewers to review here? |
No. I merged main incorrectly earlier which pulled in a snowpark pandas change. I've since removed that commit, but git still pulled in the group automatically. |
Seems like your changes contain some Local Testing changes, please request review from @snowflakedb/local-testing |
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1558347
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Previously if a user specified
PARSE_HEADER = True
in an externally defined file format it would not be able to be used withsession.read.csv
without throwing an exception. This exception was caused due to how we infer csv schemas in the client. The client separates out csv loading into a schema inference step and a query step. Parsing the header is valid in the inference step, but should be skipped in the query step.This change modifies the logic to the read step so that a temporary file format is always created. The external file format is queried and merged with the local options giving the local options precedence. The local options are then modified as needed such as for the parse header issue above.
I've added a test that provides a scenario that would have previously failed.
These parts may be contentious:
create_file_format_statement
is changed to only set the format if it wasn't specified in the options_merge_file_format_options
does some parsing of the options itself. I didn't see any other logic that currently handles that format. It should work for all currently existing formats, but it could break for future options if their formatting is significantly different.read_file
prefers the client specified options over the file_format options whereas previously the client format options would have been ignored. This is trivial to change, but I think the new style should make more sense to the user.