Conversation

h-mayorquin (Contributor)

This PR fixes the memory inefficiency issue reported in #1610 where parsing NIDQ files with digital channels would allocate excessive memory during parse_header(). For a 2.8 GiB NIDQ file, initialization would allocate ~3.9 GiB of RSS, making it impossible to work with large recordings without running out of memory.

The fix implements lazy loading of event data by deferring the np.unpackbits() operation from _parse_header() to _get_event_timestamps(). During initialization, we now only create a memmap view of the digital word channel (virtual memory, no RAM allocation). The actual unpacking into individual event bits only happens when events are accessed, which significantly reduces the memory footprint during initialization. Additionally, sample_length is now calculated from the actual file size instead of metadata so that stub/modified files are handled correctly (we did something similar before; I don't know if you remember).
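In rough outline, the new flow looks something like the sketch below. This is a simplified illustration rather than the actual diff; the class name, the channel count, and the digital channel index are placeholders.

```python
import os
import numpy as np

# Simplified sketch of the lazy-loading pattern (not the actual neo code);
# n_channels and digital_channel_index are placeholders.
class LazyDigitalEvents:
    def __init__(self, filename, n_channels, digital_channel_index, dtype="int16"):
        # sample_length is derived from the real file size rather than metadata,
        # so stub/truncated files are handled correctly.
        itemsize = np.dtype(dtype).itemsize
        self.sample_length = os.path.getsize(filename) // (itemsize * n_channels)
        # The memmap view only costs virtual memory, no RSS at init.
        raw = np.memmap(filename, dtype=dtype, mode="r",
                        shape=(self.sample_length, n_channels))
        self._events_memmap_digital_word = raw[:, digital_channel_index]

    def get_event_data(self):
        # The unpacking (and the associated allocation) happens only here,
        # when events are actually requested.
        return np.unpackbits(
            self._events_memmap_digital_word.astype(np.uint8)[:, None], axis=1
        )
```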

I ran some benchmarks with memray on one of my work files (an 844 MB NIDQ file): peak memory dropped from 2.9 GB to 906 MB (a 68.7% reduction), and real RSS at initialization dropped from ~3.9 GB to ~20 MB (a 99.5% reduction).
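For reference, measurements like this can be reproduced with memray's tracker API, roughly as follows (the path is a placeholder and the exact benchmark script may differ):

```python
import memray
from neo.rawio import SpikeGLXRawIO

# Record all allocations made while parsing the header; inspect afterwards
# with `memray flamegraph parse_header.bin`. The dirname is a placeholder.
with memray.Tracker("parse_header.bin"):
    reader = SpikeGLXRawIO(dirname="/path/to/spikeglx_session")
    reader.parse_header()
```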

```diff
 if len(dig_ch) > 0:
-    event_data = self._events_memmap
+    # Unpack bits on-demand - this is when memory allocation happens
+    event_data = np.unpackbits(self._events_memmap_digital_word.astype(np.uint8)[:, None], axis=1)
```
Contributor

Should we cache this?

Contributor

I think I agree with this. We were previously caching this, so if we are doing the unpacking we should cache, right?

Contributor Author

We don't cache any other call to data. Why would this be different?

It's not that we were deliberately caching before; we were just loading the data at init because the original PR came from someone outside the team, I think.

I prefer resources to be freed once the call to get_data is done. How are you guys thinking about it?

Contributor

I think that's a fair point. How expensive is unpackbits? I don't know that well. It wouldn't be too hard to say:

```python
if self.event_data is not None:
    event_data = self.event_data
else:
    self.event_data = event_data = np.unpackbits(...)
```

I guess the ultimate question is: if this is a cheap operation, then it doesn't hurt to do it in the init, it's just contrary to our strategy. If it is expensive (in time or memory) we should move it. If expensive in memory, then caching is expensive. If expensive in time, and possibly something a user would want multiple times, it would be nice to have a cache.

I'm curious to hear Sam on this one. I could be convinced not to cache it.

Contributor

Why the giant allocation? Is the memory actually being used to store the events? I just don't know what events look like for this reader.

Contributor

My question was more:

  1. Is the demultiplexing that occurs with the unpackbits (plus the memory allocation) relatively slow or relatively fast, and how does it scale with the length of the recording?
  2. Does it always demultiplex to the same n_channels, such that one could estimate the memory allocation from the length of the recording, or is this variable based on what the user does?

I think moving this out of parse_header (sorry, I said init earlier) is smart because we just need the memmap. My remaining question about caching is the speed and memory allocation afterwards. I don't use events from the neo API at all, so for me it's hard to imagine use cases. But maybe someone would try to get the events multiple times, in which case it might be nice to have a cache if the process is relatively slow. If it's fast, then absolutely no caching.

h-mayorquin (Contributor Author), Oct 9, 2025

The general question is whether we cache data requests in general. There is a trade-off between memory allocation (the allocation remains after you do the data request, which is bad) and making it faster the next time you request the data. So this is the lens through which to discuss it, I think:

To my surprise, unpackbits turned out to be more expensive than I expected. However, it's not unusually costly compared to other operations we perform, such as demultiplexing signals (see Intan) or scaling them (see SpikeInterface's return_scaled). In those cases we don't use caching, so I think we should be consistent: either apply caching across all of them or not use it at all.

Here are some performance comparisons:

echo "=== np.unpackbits ===" && python3 -m timeit -n 3 -r 3 "import numpy as np; d = np.random.randint(-32768, 32767, 216_000_000, dtype='int16'); np.unpackbits(d.astype(np.uint8)[:, None], axis=1)" && \
echo "=== bitwise_and ===" && python3 -m timeit -n 3 -r 3 "import numpy as np; d = np.random.randint(-32768, 32767, 216_000_000, dtype='int16'); np.bitwise_and(d, 1) > 0" && \
echo "=== stim decode ===" && python3 -m timeit -n 3 -r 3 "import numpy as np; d = np.random.randint(-32768, 32767, 216_000_000, dtype='int16'); mag = np.bitwise_and(d, 0xFF); sign = np.bitwise_and(np.right_shift(d, 8), 0x01); np.where(sign == 1, -mag.astype(np.int16), mag.astype(np.int16))" && \
echo "=== arithmetic ===" && python3 -m timeit -n 3 -r 3 "import numpy as np; d = np.random.randint(-32768, 32767, 216_000_000, dtype='int16'); d * 0.195"
=== np.unpackbits ===
3 loops, best of 3: 2.25 sec per loop
=== bitwise_and ===
3 loops, best of 3: 461 msec per loop
=== stim decode ===
3 loops, best of 3: 2.89 sec per loop
=== arithmetic ===
3 loops, best of 3: 1.27 sec per loop
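As a rough sense of scale (a back-of-the-envelope estimate, not a measured number): with the pattern above, the unpacking allocates an (N, 8) uint8 array plus a temporary uint8 copy of the digital word, i.e. about 9 bytes per sample:

```python
n_samples = 216_000_000              # same length as the timeit runs above
unpacked_bytes = n_samples * 8       # (N, 8) uint8 output of np.unpackbits
temp_cast_bytes = n_samples          # transient uint8 copy from astype(np.uint8)
print(f"~{(unpacked_bytes + temp_cast_bytes) / 1e9:.1f} GB")  # ~1.9 GB per call
```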

I think we should merge this as it is, because it is consistent with the way the library works: there are similarly expensive data requests left uncached in other places. It is OK to change this, but we should adopt a general policy and change it consistently in all the places.

Contributor Author

Actually, I think that rawio should remain simple, without these types of complications. People can still cache the data on the spot if they want.
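For example, something along these lines on the caller side (a sketch; I am going from memory on the rawio signature, so double-check the argument names against the docs):

```python
from neo.rawio import SpikeGLXRawIO

# The dirname is a placeholder; argument names below follow the rawio API
# as I recall it.
reader = SpikeGLXRawIO(dirname="/path/to/spikeglx_session")
reader.parse_header()

# Fetch the events once and keep the result instead of re-reading each time.
timestamps, durations, labels = reader.get_event_timestamps(
    block_index=0, seg_index=0, event_channel_index=0
)
cached_events = (timestamps, durations, labels)
```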

Contributor

I'm fine with this argument in general. Let's wait until tomorrow in case Sam wants to strongly argue for the other direction, and if not, we merge tomorrow afternoon?

Contributor Author

Ok.

samuelgarcia (Contributor)

OK for me.
Maybe we should make whether the unpacked data is cached or not an attribute.
Maybe not, so we can merge.

@zm711 zm711 added this to the 0.14.3 milestone Oct 9, 2025
zm711 (Contributor) left a comment

Let's discuss the cache before merge

@zm711 zm711 merged commit 550dc8d into NeuralEnsemble:master Oct 10, 2025
3 checks passed
@h-mayorquin h-mayorquin deleted the fix_spikeglx_events_memmap branch October 11, 2025 16:44