
Conversation

@h-mayorquin (Contributor) commented Oct 8, 2025

Fixes #1770.

The testing data is merged at https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/167 and is used in the tests of this PR.

This PR fixes #1770 using the methodology discussed in #1773. To do this, it makes the following changes:

  1. We separate the parsing of the data blocks from the segmentation of the data.
  2. I also continue the refactor started in Blackrock: improve nev data reading #1772 and Blackrock: improve nev header reading #1771 to remove the dynamically loaded functions per spec. The data types of the headers and the data blocks are now global variables defined separately from the reading code.
  3. The gaps report (see Blackrock add summary of automatic data segmentation #1769) was only working for the PTP format. All the formats with timestamps now report gaps.
  4. Following the discussion in General API for handling sample gaps on rawio #1773, this PR introduces a new gap_tolerance_ms argument to give users control over the size of the gaps that should create segments. By default it is None and an error is raised if gaps are found; users can then opt in to loading the data by specifying a gap size (see the sketch after this list).
  5. I use a buffered version of the memmaps so we don't need to create a memmap per data block. This reduces the number of memmaps created by the reader, since operating systems limit how many memory maps a process can hold.
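
As a rough sketch of how the new argument might be used, assuming it is exposed on the BlackrockRawIO constructor (the exact signature may differ from the merged code):

from neo.rawio import BlackrockRawIO

# Default behaviour: gap_tolerance_ms is None, so parse_header() raises an
# error if gaps are found in the timestamps instead of silently segmenting.
reader = BlackrockRawIO(filename="my_recording")
reader.parse_header()  # raises if the file contains gaps

# Opt-in: gaps larger than 1 ms are allowed and become segment boundaries.
reader = BlackrockRawIO(filename="my_recording", gap_tolerance_ms=1.0)
reader.parse_header()
print(reader.segment_count(block_index=0))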

This is a large PR, but I think @samuelgarcia prefers them that way rather than sausage-sliced PRs, and he is the person who might end up reviewing it. I am happy to break it apart if whoever is going to review this prefers it another way.

Tagging @cboulay here as requested, in case he has time to check it.

@h-mayorquin h-mayorquin self-assigned this Oct 8, 2025
@h-mayorquin h-mayorquin marked this pull request as ready for review October 8, 2025 13:30
data_offset = current_offset + header.dtype.itemsize
timestamp = header["timestamp"]

# Create data view into memmap for this block
@h-mayorquin (Contributor, Author) commented:

Here, we use views on a linear memory-mapped buffer to avoid creating a memmap per data block. This is good because operating systems usually limit the number of memmaps a process can create.
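
As a minimal sketch of this technique, with an illustrative header dtype rather than the actual Blackrock spec dtypes:

import numpy as np

# A single memory map over the whole file instead of one memmap per block.
raw_buffer = np.memmap("data.nev", dtype=np.uint8, mode="r")

# Illustrative packet-header layout; the reader uses the dtypes from the spec.
header_dtype = np.dtype([("timestamp", "uint64"), ("num_data_points", "uint32")])

current_offset = 0
header = np.frombuffer(raw_buffer, dtype=header_dtype, count=1, offset=current_offset)[0]
data_offset = current_offset + header.dtype.itemsize
timestamp = header["timestamp"]

# A view into the shared buffer: no extra memmap is created for this block.
block_data = np.frombuffer(
    raw_buffer, dtype=np.int16, count=int(header["num_data_points"]), offset=data_offset
)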

# Remove if raw loading becomes possible
# raise IOError("For loading Blackrock file version 2.1 .nev files are required!")

# This requires nsX to be parsed already
@h-mayorquin (Contributor, Author) commented Oct 8, 2025:

Changed here so that this is done after segmenting.

self.nsx_datas[nsx_nb] = _data_reader_fun(nsx_nb)
data_spec = spec_version

# Parse data blocks (creates memmap, extracts data+timestamps)
@h-mayorquin (Contributor, Author) commented:

This is the core of the PR:

  1. We parse the data blocks (I improved the docstrings there).
  2. We segment the file and report gaps if necessary (see the sketch below).
  3. We transform the result back into the previous data structures to keep the diff small.
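
A rough sketch of the segmentation step, under the simplifying assumption of a regular sampling step and a tolerance expressed in timestamp units (the real code works on the parsed data blocks):

import numpy as np

def segment_by_gaps(timestamps, expected_step, tolerance):
    # A gap is any jump between consecutive timestamps that exceeds the
    # expected sampling step by more than the tolerance.
    diffs = np.diff(timestamps)
    gap_indices = np.nonzero(diffs > expected_step + tolerance)[0]

    # Segment boundaries: start of file, one after each gap, end of file.
    boundaries = np.concatenate(([0], gap_indices + 1, [len(timestamps)]))
    segments = [slice(start, stop) for start, stop in zip(boundaries[:-1], boundaries[1:])]

    if len(gap_indices) > 0:
        print(f"Found {len(gap_indices)} gaps, splitting data into {len(segments)} segments")
    return segments

timestamps = np.array([0, 1, 2, 3, 100, 101, 102])
print(segment_by_gaps(timestamps, expected_step=1, tolerance=5))
# [slice(0, 4, None), slice(4, 7, None)]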

filesize = self._get_file_size(filename)
num_samples = int((filesize - bytes_in_headers) / (2 * channels) - 1)
offset = bytes_in_headers
# Create data view into memmap
@h-mayorquin (Contributor, Author) commented:

Here is the same technique to avoid many memmaps: create one array over the whole buffer (a single memmap) and then create views into it.
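
A minimal sketch of the same idea for the continuous nsX data, assuming int16, channel-interleaved samples (the variable names here are illustrative, not the exact ones in the reader):

import numpy as np

channels = 32
bytes_in_headers = 1024  # illustrative header size

# One memmap over the raw bytes of the whole file.
raw_buffer = np.memmap("data.ns5", dtype=np.uint8, mode="r")

filesize = raw_buffer.size
num_samples = int((filesize - bytes_in_headers) / (2 * channels) - 1)

# A single view into the shared buffer, reshaped to (samples, channels);
# no additional memmap is created.
data = np.frombuffer(
    raw_buffer, dtype=np.int16, count=num_samples * channels, offset=bytes_in_headers
).reshape(num_samples, channels)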

@zm711 zm711 added this to the 0.15.0 milestone Oct 9, 2025
Development

Successfully merging this pull request may close these issues:

BlackrockRawIO is over-segmenting data