Make better use of available CPU on larger VMs (#57)
These changes allow the loader to make better use of all the CPU
available on larger instances.
**1. CPU-intensive parsing/transforming is now parallelized**.
Parallelism is configured by a new config parameter
`cpuParallelismFraction`. The actual parallelism is chosen dynamically
based on the number of available CPUs, so the default value should be
appropriate for VMs of all sizes.
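A minimal sketch of how a fraction-based setting could be turned into a worker count and used to bound parallel parsing. It assumes parallelism is `cpuParallelismFraction` × available CPUs, rounded down, with a floor of 1; the rounding rule, `parse_event`, and the other helper names are hypothetical and not taken from the loader. Python threads are used only to keep the sketch short: because of the GIL they would not speed up CPU-bound work, so a real implementation would need processes or a runtime with truly parallel threads.

```python
import math
import os
from concurrent.futures import ThreadPoolExecutor


def cpu_parallelism(cpu_parallelism_fraction: float, available_cpus: int) -> int:
    # Hypothetical rule: scale the fraction by the CPU count, round down,
    # and never go below one worker (small pods still make progress).
    return max(1, math.floor(cpu_parallelism_fraction * available_cpus))


def parse_event(line: str) -> dict:
    # Hypothetical stand-in for the CPU-intensive parse/transform step.
    key, _, value = line.partition("=")
    return {key: value}


def parse_batch(lines, cpu_parallelism_fraction: float, available_cpus=None):
    # Fall back to the machine's CPU count when none is given explicitly.
    cpus = available_cpus or os.cpu_count() or 1
    workers = cpu_parallelism(cpu_parallelism_fraction, cpus)
    # The pool size bounds how many parse tasks run at once;
    # pool.map preserves input order in its results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_event, lines))
```

Because the worker count is derived from the CPU count, a single fraction scales from a 1-CPU pod (one worker) to an 8-CPU VM (for example, six workers at a fraction of 0.75) without retuning.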
**2. We now open a new Snowflake ingest client per channel**. Note that
the Snowflake SDK recommends re-using a single Client per VM and opening
multiple Channels on the same Client, so here we are going against the
recommendation. We justify this because it gives the loader better
visibility of when the client's Future completes, signifying a complete
write to Snowflake.
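The visibility argument can be illustrated with a toy model. In this sketch `FakeIngestClient`, `write`, `flush`, and `open_channels` are all hypothetical and do not reflect the real Snowflake SDK API. The assumption being modelled is that a client hands back a Future that completes once its pending writes are durable. With one client per channel, a completed Future refers to exactly one channel's data.

```python
from concurrent.futures import Future
from dataclasses import dataclass, field


@dataclass
class FakeIngestClient:
    """Hypothetical stand-in for an ingest client (not the real SDK API)."""
    name: str
    pending: list = field(default_factory=list)

    def write(self, rows) -> Future:
        # Each write returns a Future that completes when this client flushes.
        fut = Future()
        self.pending.append((fut, list(rows)))
        return fut

    def flush(self) -> int:
        # Simulate the client confirming every outstanding write.
        for fut, rows in self.pending:
            fut.set_result(len(rows))
        done = len(self.pending)
        self.pending.clear()
        return done


def open_channels(n: int) -> dict:
    # One client per channel, so each channel's completion signal is isolated.
    return {f"channel-{i}": FakeIngestClient(f"client-{i}") for i in range(n)}
```

In this model, flushing `channel-0`'s client completes only that channel's Future. Writes on `channel-1` remain visibly pending. With a single shared client, one completion signal would cover every channel at once.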
**3. Upload parallelism is chosen dynamically**. Larger VMs benefit from
higher upload parallelism, which lets them keep up with the faster rate
of batches produced by the CPU-intensive tasks. Parallelism is
configured by a new parameter `uploadParallelismFactor`, which is
multiplied by the number of available CPUs. The default value should be
appropriate for VMs of all sizes.
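A minimal sketch of a factor-based upload limit. It assumes the limit is `uploadParallelismFactor` × available CPUs, rounded down with a floor of 1, and it enforces that limit with an `asyncio.Semaphore`. The rounding rule and the helpers `upload_parallelism` and `upload_all` are hypothetical. The `asyncio.sleep` stands in for network I/O to Snowflake. The function also reports the peak number of uploads in flight, so the bound can be checked.

```python
import asyncio
import os


def upload_parallelism(upload_parallelism_factor: float, available_cpus: int) -> int:
    # Hypothetical rule: factor x CPUs, rounded down, at least one upload.
    return max(1, int(upload_parallelism_factor * available_cpus))


async def upload_all(batches, upload_parallelism_factor: float, available_cpus=None):
    cpus = available_cpus or os.cpu_count() or 1
    # The semaphore caps how many uploads may be in flight at once.
    limit = asyncio.Semaphore(upload_parallelism(upload_parallelism_factor, cpus))
    in_flight = 0
    peak = 0

    async def upload(batch):
        nonlocal in_flight, peak
        async with limit:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for network I/O
            in_flight -= 1
            return len(batch)

    results = await asyncio.gather(*(upload(b) for b in batches))
    return results, peak
```

Uploads are I/O-bound, so a limit above the CPU count can still pay off. This is why a multiplicative factor is a natural fit here, while the CPU-bound parsing step uses a fraction.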
These new settings have been tested on pods ranging from 0.6 to 8
available CPUs.