When writing to binary, a small chunk size is currently used. In general, the fewer chunks the better, to avoid edge effects. As such, it might be a good idea to use the largest chunk size that memory will allow (e.g. use 70% of available memory).
Scaling to the available memory was not as simple as hoped. First, SI's memory tracking is slated for some improvement and is not trivial to implement, so getting a good estimate of the memory used during preprocessing is not easy.
Nonetheless, a rough guess could be used, taking the maximum itemsize (float64, used in some preprocessing steps) and a 2-3x safety multiplier based on dev feedback.
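As a rough sketch of that idea (not SpikeInterface's actual implementation; the function name, the 70% fraction and the 3x multiplier below are assumptions taken from this thread):

```python
import numpy as np

def estimate_chunk_frames(available_bytes, n_channels,
                          memory_fraction=0.7, safety_multiplier=3):
    """Rough chunk size (in frames) from available memory.

    Assumes the worst case: data cast to float64 during preprocessing,
    with `safety_multiplier` intermediate copies held at once.
    """
    worst_itemsize = np.dtype("float64").itemsize  # 8 bytes
    bytes_per_frame = n_channels * worst_itemsize * safety_multiplier
    budget = available_bytes * memory_fraction
    return max(1, int(budget // bytes_per_frame))

# e.g. 40 GB available, 384 channels -> ~3 million frames per chunk
print(estimate_chunk_frames(40e9, 384))
```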
See #108 for a first implementation. The current blocker is reliably determining the available memory across different settings:
- `psutil.virtual_memory().available` did not give accurate memory on SLURM nodes, reporting a much higher value than requested (e.g. requesting 40 GB showed ~380 GB available).
- `slurmio` always returned 16 GB even when 40 GB was requested.
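One possible workaround (a sketch, not a tested fix) is to prefer SLURM's own job-limit environment variables when they are set and fall back to psutil otherwise. `SLURM_MEM_PER_NODE` and `SLURM_MEM_PER_CPU` are standard SLURM variables reported in MB, but whether they are exported depends on the cluster configuration:

```python
import os
import psutil

def get_available_memory_bytes():
    """Best-effort available memory, preferring SLURM's job limits.

    SLURM_MEM_PER_NODE / SLURM_MEM_PER_CPU are in MB when set;
    the per-CPU limit is scaled by SLURM_CPUS_PER_TASK. Falls back
    to psutil, which reports the whole node's memory rather than
    the job allocation.
    """
    mem_per_node = os.environ.get("SLURM_MEM_PER_NODE")
    if mem_per_node:
        return int(mem_per_node) * 1024**2
    mem_per_cpu = os.environ.get("SLURM_MEM_PER_CPU")
    if mem_per_cpu:
        n_cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", 1))
        return int(mem_per_cpu) * n_cpus * 1024**2
    return psutil.virtual_memory().available
```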
Once this is resolved, it would be possible to expose an argument `fixed_batch_size` that lets the user fix the batch size; otherwise, use 70% or so of available memory.
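A sketch of how that argument might look (the name `fixed_batch_size` comes from the comment above; the surrounding function and defaults are illustrative assumptions, reusing the sketches earlier in this thread):

```python
def resolve_batch_size(n_channels, fixed_batch_size=None,
                       memory_fraction=0.7):
    """Return the batch size (in frames) to use when writing to binary.

    If the user passes `fixed_batch_size`, use it as-is; otherwise
    derive one from ~70% of the memory we believe is available.
    """
    if fixed_batch_size is not None:
        return int(fixed_batch_size)
    available = get_available_memory_bytes()  # sketch defined above
    return estimate_chunk_frames(available, n_channels,
                                 memory_fraction=memory_fraction)
```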