
General API for handling sample gaps on rawio #1773

@h-mayorquin


@zm711, @samuelgarcia, and I had a discussion today about two current Blackrock issues (#1770 and #1755).

The main problem is that the gaps the code detects automatically are much smaller than anything that would reasonably qualify as a new segment in Neo (for example, someone deliberately pausing the recording). These gaps are more likely artifacts of the acquisition system; #1770 looks like a case of dropped samples.

Writing heuristics for each format would be a large maintenance burden. A simpler, more maintainable, and more general solution is to give users (who know their data best) both information and control. We came up with the following proposal:

  1. Default behavior: if gaps are detected, loading should error out. RawIO readers should raise an error when timestamp gaps are found and display a report showing the number, size, and characteristics of the gaps, so that users can make an informed decision. Examples of this approach are "Blackrock add summary of automatic data segmentation" (#1769) and "Improve intan reader error message for discontinuities" (#1484).
  2. Opt-in behavior: RawIO readers should provide an argument (e.g. segmentation_threshold, though we can choose a better name) that lets users explicitly load data with gaps. Gaps smaller than the threshold are ignored; gaps larger than the threshold start a new segment.
  3. Timestamps from the acquisition system: to make data with gaps more useful, we should expose the original timestamps from the acquisition system when available (see "Add utility method to get timestamps on Intan base", #1652).
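To make the first two points concrete, here is a minimal sketch of how a reader could combine the default error with the opt-in threshold. It assumes the acquisition system provides integer sample timestamps that increase by exactly 1 during continuous recording. `TimestampGapError`, `find_gaps`, `gap_report` and `segment_boundaries` are hypothetical names, not existing Neo API, and `segmentation_threshold` is only the placeholder name from the proposal (here in seconds).

```python
import numpy as np


class TimestampGapError(Exception):
    """Hypothetical error raised when gaps are found and no threshold was given."""


def find_gaps(timestamps, sampling_rate):
    """Return (indices, sizes in seconds) of gaps in integer sample timestamps.

    Assumes a continuous recording increments the timestamp by exactly 1,
    so any step larger than 1 means samples are missing.
    """
    steps = np.diff(timestamps)
    gap_idx = np.flatnonzero(steps > 1)
    gap_sizes = (steps[gap_idx] - 1) / sampling_rate  # duration of missing samples
    return gap_idx, gap_sizes


def gap_report(gap_idx, gap_sizes):
    """Build a human-readable summary of the detected gaps."""
    lines = [f"{len(gap_idx)} gap(s) detected:"]
    lines.append(f"  smallest: {gap_sizes.min():.6f} s, largest: {gap_sizes.max():.6f} s")
    for i, size in zip(gap_idx, gap_sizes):
        lines.append(f"  after sample {i}: {size:.6f} s")
    return "\n".join(lines)


def segment_boundaries(timestamps, sampling_rate, segmentation_threshold=None):
    """Return a list of (start, stop) sample index pairs, one per segment."""
    n = len(timestamps)
    gap_idx, gap_sizes = find_gaps(timestamps, sampling_rate)
    if len(gap_idx) == 0:
        return [(0, n)]
    if segmentation_threshold is None:
        # Default behavior: do not guess, report what was found instead.
        raise TimestampGapError(
            gap_report(gap_idx, gap_sizes)
            + "\nPass segmentation_threshold to load the data anyway."
        )
    # Opt-in behavior: only gaps larger than the threshold split the data;
    # smaller gaps (e.g. dropped samples) are ignored.
    split_after = gap_idx[gap_sizes > segmentation_threshold]
    starts = np.concatenate(([0], split_after + 1))
    stops = np.concatenate((split_after + 1, [n]))
    return list(zip(starts.tolist(), stops.tolist()))


# Usage: one dropped sample and one ~1 s pause at 30 kHz.
ts = np.array([0, 1, 2, 3, 5, 6, 7, 30000, 30001])
print(segment_boundaries(ts, 30000.0, segmentation_threshold=0.5))  # [(0, 7), (7, 9)]
```

Because the boundaries are indices into the original timestamp array, a reader could also slice that array per segment, which keeps the acquisition-system timestamps available as suggested in point 3.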

This design would give all RawIO readers a consistent, common interface for handling gaps. The plan is as follows:

  1. Implement the solution first for Blackrock (since there are open issues and users currently cannot read the data). This will also let us discuss the types and naming of the interface.
  2. Extend the interface to other popular readers (e.g. Intan, OpenEphys, SpikeGLX, Plexon) to uncover potential difficulties.
  3. Once the API is stable for the main readers, integrate it into the abstract/parent classes as a general RawIO API.
  4. If possible, deprecate the current gap-handling mechanisms already in place (e.g. in OpenEphys) to simplify the codebase.

Another important point raised by @samuelgarcia is that integrity checks can be computationally expensive. We discussed adding a general flag (e.g. ignore_integrity_checks) that would let users opt out of all checks, including timestamp checks. This would make data access faster in performance-sensitive environments, while the default behavior would remain safe with integrity checks enabled. For an example of this interface in the Intan reader, see #1470. Note that ignore_integrity_checks is not limited to timestamp gaps: it covers any safety mechanism that checks the integrity of the files (see #1740 for one such check).
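A minimal sketch of how a single flag could gate every integrity check, assuming checks run once while parsing the header. `SketchRawIO`, `IntegrityError` and both `_check_*` methods are hypothetical illustrations, not the actual Neo or Intan implementation; the file-size check merely stands in for "any other check".

```python
import numpy as np


class IntegrityError(Exception):
    """Hypothetical error for any failed file-integrity check."""


class SketchRawIO:
    """Hypothetical reader where one flag gates every integrity check."""

    def __init__(self, timestamps, expected_n_samples, ignore_integrity_checks=False):
        self.timestamps = np.asarray(timestamps)
        self.expected_n_samples = expected_n_samples
        self.ignore_integrity_checks = ignore_integrity_checks
        self.checks_run = []  # records which checks actually executed

    def _check_timestamp_gaps(self):
        # Timestamp check: continuous data should increment by exactly 1.
        if np.any(np.diff(self.timestamps) != 1):
            raise IntegrityError("timestamp gaps detected")

    def _check_sample_count(self):
        # Illustrative non-timestamp check: sample count vs. header value.
        if self.timestamps.size != self.expected_n_samples:
            raise IntegrityError("sample count does not match header")

    def parse_header(self):
        if self.ignore_integrity_checks:
            # Fast path: skip all (possibly expensive) checks.
            return
        # Safe default: run every registered check; new checks go in this tuple.
        for check in (self._check_timestamp_gaps, self._check_sample_count):
            check()
            self.checks_run.append(check.__name__)


# Usage: the same broken data fails by default but loads when checks are skipped.
fast = SketchRawIO([0, 1, 5], expected_n_samples=99, ignore_integrity_checks=True)
fast.parse_header()  # no error, no checks run
```

Keeping all checks behind one flag means users have a single switch for performance-sensitive use, while each individual check stays independent and can be added or removed without changing the public interface.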
