Fix over-segmentation in BlackRock #1789
Conversation
data_offset = current_offset + header.dtype.itemsize
timestamp = header["timestamp"]

# Create data view into memmap for this block
Here, we use views on a linear memmapped buffer to avoid creating a memmap per data block. This is good because OSes usually limit the number of memmaps you can create.
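A minimal sketch of the idea, with made-up placeholder names (`filename`, `block_headers`, `n_channels` are illustrative, not the actual rawio variables): the whole file gets one memmap, and each data block becomes a view into that buffer via `np.frombuffer`.

```python
import numpy as np

# Illustrative placeholders, not the real parsing results
filename = "data.ns5"
n_channels = 32
block_headers = [(0, 1000), (64_000, 2000)]  # (byte offset, n_samples) per data block

# One memmap for the whole file; the OS sees a single mapping regardless of block count
raw = np.memmap(filename, dtype="uint8", mode="r")

block_views = []
for data_offset, n_samples in block_headers:
    view = np.frombuffer(
        raw, dtype="int16", count=n_samples * n_channels, offset=data_offset
    ).reshape(n_samples, n_channels)
    block_views.append(view)  # each view shares raw's buffer; no extra mapping, no copy
```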
# Remove if raw loading becomes possible
# raise IOError("For loading Blackrock file version 2.1 .nev files are required!")

# This requires nsX to be parsed already
Changed here so that this happens after segmenting.
self.nsx_datas[nsx_nb] = _data_reader_fun(nsx_nb)
data_spec = spec_version

# Parse data blocks (creates memmap, extracts data+timestamps)
This is the core of the PR:
- We parse the data blocks (I improved the docstrings there)
- We segment the file and report if necessary
- We transform back to the previous data structures to keep the diff small.
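A rough sketch of the segmenting step described above (illustrative only; the names and exact error handling are not taken from the real code, which works on the parsed block headers):

```python
def group_blocks_into_segments(timestamps, n_samples_per_block, sampling_rate,
                               gap_tolerance_ms=None):
    """Group data-block indices into segments based on timestamp gaps.

    timestamps: start timestamp of each data block, in units of samples.
    Assumes at least one data block.
    """
    segments = [[0]]  # block 0 always opens the first segment
    for i in range(1, len(timestamps)):
        expected = timestamps[i - 1] + n_samples_per_block[i - 1]
        gap_ms = (timestamps[i] - expected) / sampling_rate * 1000.0
        if gap_ms > 0 and gap_tolerance_ms is None:
            raise IOError(f"Gap of {gap_ms:.1f} ms between data blocks {i - 1} and {i}")
        if gap_tolerance_ms is not None and gap_ms > gap_tolerance_ms:
            segments.append([])  # gap larger than tolerance -> start a new segment
        segments[-1].append(i)
    return segments
```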
filesize = self._get_file_size(filename)
num_samples = int((filesize - bytes_in_headers) / (2 * channels) - 1)
offset = bytes_in_headers
# Create data view into memmap
Here is the same technique to avoid many memmaps: create one array backed by the buffer (a single memmap) and then create views into it.
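As a quick complement to the sketch above, a tiny (illustrative) check that the per-block arrays really are views on the single memmap buffer and not copies; the filename and offsets are hypothetical:

```python
import numpy as np

raw = np.memmap("data.ns5", dtype="uint8", mode="r")      # the single mapping
block = np.frombuffer(raw, dtype="int16", count=1024, offset=512)
print(np.shares_memory(raw, block))                        # True: no extra mapping, no copy
```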
This will fix #1770
Testing data is merged here https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/167 and is used in the tests here.
This PR fixes #1770 using the methodology discussed in #1773. To do this, we do three things:
- Add a gap_tolerance_ms argument to give the user control over the size of the gaps that should create segments. By default it is None, and an error is raised if gaps are found; the user can then opt in to loading the data by specifying a gap size (see the usage sketch at the end of this description).

This is a large PR, but I think that @samuelgarcia prefers them that way rather than sausage-sliced PRs, and he is the person that might end up reviewing it. I am happy to break it apart if whoever is going to review this prefers it another way.
Tagging @cboulay here as requested, in case he has time to check it.
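For illustration, here is how the opt-in gap_tolerance_ms option described above would be used, assuming it is exposed on the BlackrockRawIO constructor (the exact argument placement may differ in the merged version):

```python
from neo.rawio import BlackrockRawIO

# Default behaviour: any gap between data blocks raises an error.
# reader = BlackrockRawIO(filename="myfile.ns5")

# Opt in: gaps larger than 50 ms split the recording into separate segments.
reader = BlackrockRawIO(filename="myfile.ns5", gap_tolerance_ms=50.0)
reader.parse_header()
print(reader.segment_count(block_index=0))
```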