
Increase chunk size based on memory #104

Closed
JoeZiminski opened this issue Sep 11, 2023 · 2 comments · Fixed by #167
Comments

@JoeZiminski
Member

When writing to binary, a small chunk size is currently used. In general, the fewer chunks the better, to avoid edge effects. As such, it might be a good idea to use the largest chunk size that memory will allow (e.g. use 70% of available memory).
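For example, a minimal sketch of the idea (the function name and the 70% fraction are just illustrative, not anything in spikewrap):

import psutil

def target_chunk_bytes(memory_fraction=0.7):
    # Take a fixed fraction of the memory that is currently free;
    # psutil.virtual_memory().available reports system-wide free memory.
    return int(psutil.virtual_memory().available * memory_fraction)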

@JoeZiminski JoeZiminski added the enhancement New feature or request label Sep 11, 2023
@JoeZiminski
Member Author

#108

@JoeZiminski JoeZiminski added this to the 0.0.1 milestone Sep 28, 2023
@JoeZiminski JoeZiminski changed the title [Feature] Increase chunk size based on memory Increase chunk size based on memory Dec 4, 2023
@JoeZiminski
Member Author

Scaling the chunk size to the available memory was not as simple as hoped. First, SI's memory tracking is tagged to undergo some improvement and is not trivial to implement, so getting a good estimate of the memory used during pre-processing is not easy.

Nonetheless, a rough estimate could be used, taking the maximum itemsize (8 bytes for the float64 used in some preprocessing steps) and a 2-3x multiplier based on dev feedback.
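As a rough sketch (the 70% fraction, the 3x overhead, and the function name are placeholders, not anything implemented or measured):

import numpy as np
import psutil

def estimate_chunk_frames(n_channels, memory_fraction=0.7, overhead=3):
    # Worst case: preprocessing casts the recording to float64 (8 bytes per sample),
    # with a further 2-3x cost for intermediate copies during processing.
    worst_case_itemsize = np.dtype("float64").itemsize  # 8 bytes
    budget = psutil.virtual_memory().available * memory_fraction
    bytes_per_frame = n_channels * worst_case_itemsize * overhead
    return int(budget // bytes_per_frame)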

See #108 for a first implementation. The current blocker on this was reliably finding the available memory across different settings:

  1. psutil.virtual_memory().available did not give an accurate value on SLURM nodes, reporting much more memory than was requested (e.g. requesting 40 GB showed ~380 GB available).
  2. slurmio always returned 16 GB even when 40 GB was requested, e.g.:
(spikewrap) jziminski@gpu-380-14:/ceph/neuroinformatics/neuroinformatics/scratch/jziminski/ephys/code/spikewrap$ sacct  --format="MaxRSS, MaxRSSNode"
    MaxRSS MaxRSSNode
---------- ----------
 45941200K gpu-380-14
     7992K gpu-380-14
    17116K enc3-node4

but

>>> SlurmJobParameters().requested_memory
16
>>> SlurmJobParameters().allocated_memory
16000000

Once this is resolved, it would be possible to expose an argument 'fixed_batch_size' that allows the user to fix the batch size explicitly; otherwise, use 70% or so of the available memory.
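Something along these lines (hypothetical argument and function names, not a final API); the unresolved part is getting a trustworthy available-memory value on SLURM, per the psutil/slurmio issues above:

import psutil

def resolve_batch_size(n_channels, itemsize, fixed_batch_size=None, memory_fraction=0.7):
    # If the user pins a batch size, respect it; otherwise size the batch
    # from ~70% of whatever "available memory" we can trust on this system.
    if fixed_batch_size is not None:
        return fixed_batch_size
    available = psutil.virtual_memory().available  # unreliable on SLURM nodes, see above
    return int((available * memory_fraction) // (n_channels * itemsize))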
