SNOW-1703685: High Memory Usage during PUT Query execution for Large GZIP compressed CSV files #922
Labels
enhancement
The issue is a request for improvement or a new feature
status-triage_done
Initial triage done, will be further handled by the driver team
Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!
What version of NodeJS driver are you using?
1.9.3
What operating system and processor architecture are you using?
MacOS arm64
What version of NodeJS are you using? (node --version and npm --version)
node: 18.12.1, npm: 8.19.2
What are the component versions in the environment (npm list)?
NA
Server version:
8.9.1
What did you do?
Issue Summary
While executing a PUT query to stage a large, compressed CSV file from the local file system to a Snowflake stage (S3), the memory usage of the snowflake-sdk grows significantly, especially with large files. During the execution, the Snowflake SDK performs several operations:
While these steps are necessary, the SDK's memory footprint scales with the file size, which appears to be due to the following reasons:
Digest Calculation:
The SDK calculates the SHA-256 digest of the file by reading the entire file into memory (Ref code).
For large files, this leads to high memory consumption, which can cause memory-related issues or crashes.
Suggestion: Instead of loading the entire file into memory, calculate the hash incrementally, updating it with each chunk of data as the file is streamed. This keeps the memory footprint small. (Crypto module Ref - "This can be called many times with new data as it is streamed.")
File Upload:
The file is read into memory in full (readFileSync), which again leads to excessive memory consumption for large files. [Ref Code - S3, GCS, Azure]
Steps to Reproduce:
While executing the query, monitor memory usage using tools like: Node.js process memory logging, clinic doctor or any external memory profiling tool.
Can you set logging to DEBUG and collect the logs?
No
What is your Snowflake account identifier, if any? (Optional)