Support file_size_bytes option #100
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #100      +/-   ##
==========================================
+ Coverage   91.88%   92.45%   +0.56%
==========================================
  Files          77       78       +1
  Lines       10320    10640     +320
==========================================
+ Hits         9483     9837     +354
+ Misses        837      803      -34
@@ -1194,8 +1194,6 @@ mod tests {
         results
     });

     Spi::run("TRUNCATE dog_owners;").unwrap();
This fixes an incorrect test flow that was revealed by this PR.
COPY TO parquet now supports a new option, `file_size_bytes`, which sets a target size for the generated parquet files. When a parquet file exceeds the target size, it is flushed and a new parquet file is created under a parent directory. The parent directory is the target path without the `.parquet` extension.

e.g.

```sql
COPY (select 'hellooooo' || i from generate_series(1, 1000000) i) to '/tmp/test.parquet' with (file_size_bytes 1048576);
```

```bash
> ls -alh /tmp/test/
1.4M data_0.parquet
1.4M data_1.parquet
1.4M data_2.parquet
1.4M data_3.parquet
114K data_4.parquet
```
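
The naming and rollover behavior described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the code from this PR: the helper names `parent_dir_for`, `child_file_path`, and `split_by_target_size` are hypothetical, the path logic assumes a single `.parquet` extension, and the size check is assumed to run after each write, which would be consistent with the example files ending up somewhat larger than the 1048576-byte target.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not the PR's actual code): derive the parent
/// directory by dropping the extension from the COPY target path,
/// e.g. `/tmp/test.parquet` -> `/tmp/test`.
fn parent_dir_for(target: &Path) -> PathBuf {
    target.with_extension("")
}

/// Hypothetical helper: path of the n-th output file, following the
/// `data_<n>.parquet` naming shown in the directory listing.
fn child_file_path(target: &Path, index: usize) -> PathBuf {
    parent_dir_for(target).join(format!("data_{}.parquet", index))
}

/// Simulate the rollover rule: keep appending chunks to the current file
/// and flush it once its size exceeds `target_bytes`. Returns the size of
/// each resulting file. Because the check happens after a chunk is added
/// (an assumption of this sketch), a file can overshoot the target.
fn split_by_target_size(chunk_sizes: &[u64], target_bytes: u64) -> Vec<u64> {
    let mut files = Vec::new();
    let mut current = 0u64;
    for &size in chunk_sizes {
        current += size;
        if current > target_bytes {
            files.push(current); // flush the file that exceeded the target
            current = 0;
        }
    }
    if current > 0 {
        files.push(current); // flush the final, smaller file
    }
    files
}
```

With these helpers, `child_file_path(Path::new("/tmp/test.parquet"), 4)` yields `/tmp/test/data_4.parquet`, and the last file produced by `split_by_target_size` is typically smaller than the others, as with `data_4.parquet` in the listing.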