Description
We previously tried using del and gc.collect() to control memory usage during large file uploads, but the issue persisted. This suggests that something still holds references to the DataFrames, possibly inside the df_to_redshift_spectrum function. The issue was observed primarily with very large files; smaller file batches (a few GB) loaded without problems. There is also a risk that the same issue will recur if we need to reload the large file. The current hypothesis is that memory fills up over repeated chunk rotations because references are not released, either within pandas itself or in our own implementation.
Goal
Identify the root cause of memory leaks during chunked DataFrame processing and implement fixes to ensure stable handling of large file uploads.
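One possible starting point for testing the lingering-reference hypothesis is sketched below. It keeps only a weak reference to each chunk, runs the processing step, and then reports which chunks are still alive after a forced collection. The names `find_leaked_chunks` and `process` are hypothetical and not part of our codebase; `process` stands in for whatever wraps df_to_redshift_spectrum. The sketch assumes `chunks` is a lazy iterator (a list would itself keep every chunk alive).

```python
import gc
import weakref


def find_leaked_chunks(chunks, process):
    """Return the indices of chunks that are still referenced after processing.

    `chunks` must be a lazy iterator or generator of weak-referenceable objects;
    `process` is a hypothetical per-chunk callback.
    """
    refs = []
    chunk = None
    for chunk in chunks:
        # A weak reference does not keep the chunk alive on its own.
        refs.append(weakref.ref(chunk))
        process(chunk)
    # Drop the loop variable so it does not pin the final chunk.
    del chunk
    gc.collect()
    # Any weak reference that still resolves points at a chunk held elsewhere.
    return [i for i, ref in enumerate(refs) if ref() is not None]
```

If this returns a non-empty list for the upload path, `gc.get_referrers()` on one of the surviving chunks may help show which object is holding it. An empty result would point away from plain lingering references and toward other causes, such as memory that is freed but not returned to the operating system.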