Plexon2RawIO's implementation of _get_analogsignal_chunks can be improved #1569

nikhilchandra · 2024-09-25T22:48:10Z

Is your feature request related to a problem? Please describe.
The latest version of the PL2FileReader API includes a function capable of retrieving subsets of data from an analog channel. Switching _get_analogsignal_chunks to this function should (hopefully) remove the need to load entire channels into self._analogsignal_cache (this takes up a LOT of memory) and might also speed things up.

Describe the solution you'd like
I want to do the following:

Add content to the currently empty pypl2lib function "pl2_get_analog_data_subset"
Replace the underlying DLL to the latest version
In plexon2rawio.py, add code to retrieve fragment (i.e., segment) information for each stream to the _parse_header function
Rewrite _get_analogsignal_chunks to use the subset function in conjunction with fragment information in lieu of the present cached solution
Run tests to check whether the new method is faster

Describe alternatives you've considered
There are none that I can think of.

Related Question
What is the proper way to handle requests that exceed the limits of a recording's segments? For example, suppose I have a multi-trial recording in which 2 recording segments run from [0, 10] and [50, 60] (units in seconds). Suppose the user requests data from [5, 15] seconds -- obviously we have data from [5, 10] from the first segment. But for the interval [10, 15] -- do we just return zeros?

zm711 · 2024-09-25T23:55:03Z

Sounds cool to me. I'm sure @h-mayorquin is also interested.

h-mayorquin · 2024-09-26T01:46:48Z

That sounds great. Concerning the question of what to do when asking for out of bound data. That is not clear to me, is this something that you have made a made a decision about @zm711 ?

samuelgarcia · 2024-09-26T14:12:02Z

Hi. On spikeinterface side we decided to raise an error.
On neo side this have never been clear.
I think the most often case is to clip not to zero pad.

zm711 · 2024-09-27T19:10:40Z

I'm looking right now and a bunch of the memmap ones just do slicing [start:stop, :] so that would just follow whatever the numpy memmap convention allows, no. Where is the clipping machinery?

nikhilchandra · 2024-09-30T23:56:33Z

@samuelgarcia Please, could you clarify what you mean by "clipping"?

samuelgarcia · 2024-10-01T07:02:06Z

Hi sorry just reading this.
I am a bit fuzzy.
cliping was for : if stop in [start:stop, :] si greater than the shape[0] then stop is is "clip" to shape[0].
In short very often neo behave like numpy for upper boudary.
but spikeinterface do not, it is more strick.

nikhilchandra · 2024-10-02T20:55:48Z

@h-mayorquin It looks like plexon2rawio.py downloads the relevant DLLs from this location:

https://raw.githubusercontent.com/Neuralensemble/pypl2/master/bin/{file_name}

where "file_name" can take on the value PL2FileReader.dll or PL2FileReader64.dll. How does one go about submitting updated DLLs to this location?

nikhilchandra · 2024-10-02T22:41:02Z

Another problem -- the functions _segment_t_start and _segment_t_stop take block_index and seg_index as arguments. However, for PL2 files the precise start/stop times for a given segment depend on the stream. For example, here are segment start times for 3 wideband and 3 field potential channels from the same recording.

WB001 segment timestamps: [ 67 3609805 7217229 10824175 14430721 18036311 21642619]
WB002 segment timestamps: [ 67 3609805 7217229 10824175 14430721 18036311 21642619]
WB003 segment timestamps: [ 67 3609805 7217229 10824175 14430721 18036311 21642619]

FP001 segment timestamps: [ 67 3609827 7217267 10824187 14430747 18036347 21642627]
FP002 segment timestamps: [ 67 3609827 7217267 10824187 14430747 18036347 21642627]
FP003 segment timestamps: [ 67 3609827 7217267 10824187 14430747 18036347 21642627]

Note that within streams the values are identical, but across streams they differ slightly. I am not sure what the correct approach is to incorporate these observations into the existing framework.

Note: all values are in units of ticks, not seconds.

zm711 · 2024-10-03T10:47:13Z

This is because the data is organized in packets?

How could the time stamps of the Wideband and field potential be different if it is the same data just filtered differently? Unless the headstage or equipment applies a slight delay to do the onboard filtering and then sends that stream with a slight delay?

nikhilchandra · 2024-10-04T16:10:01Z

WB and FP sampling rates are 40 kHz and 1 kHz, respectively. Each recorded sample is assigned an integer value (tick) that corresponds to a clock running at 40 kHz. Therefore, successive samples recorded for WB have tick values that increment by 1, while successive samples recorded for FP have tick values that increment by 40. Each segment timestamp in my previous message equals the tick value of the first sample recorded for that segment. Here are some predictions based on the preceding statements:

(1) For a given segment, the first recorded samples for WB vs FP channels should have tick values that differ by a maximum of 40
(2) For FP, tick values for all recorded samples must differ by a multiple of 40.

The attached calculations from an Excel spreadsheet show that both predictions work out.

In the row "WB-FP", we see that the difference in tick values for FP and WB segments never exceeds 40.

In the row "FP(n) - FP(0) % 40", we see that the difference in tick values for each FP segment compared with the first (67) is always a multiple of 40 .

I suspect that the correct course of action depends on exactly how _segment_t_start and _segment_t_end are used, will keep digging and make recommendations based on what I find.

nikhilchandra · 2024-10-08T18:49:28Z

Could someone please provide insight as to how to incorporate new 32- and 64-bit DLLs into the project (see my previous message from last week)? Thank you! @zm711 @h-mayorquin

zm711 · 2024-10-08T19:38:49Z

Either @h-mayorquin or @JuliaSprenger set this up originally. So to be honest I don't know. Hopefully Heberto can comment on how he set it up. I only helped with the shell commands for it :)

h-mayorquin · 2024-10-09T21:07:39Z

HI, the DLLs home is here:

https://github.com/NeuralEnsemble/pypl2

I guess the route would be to make a PR to this but I don't know the ownership situation of that repo

@samuelgarcia @apdavison?

Maybe I or @nikhilchandra could take ownership over it now that Julia is sadly not as present as she used to be.

zm711 · 2024-10-09T21:12:41Z

Just checked and I definitely don't have any writing permission there. We likely need to get Julia to share with a couple people so we don't lose access to it !

apdavison · 2024-10-09T21:34:08Z

I've given the Neo maintainers plus @h-mayorquin access to the pypl2 repo

h-mayorquin · 2024-10-09T21:41:43Z

@nikhilchandra feel free to make a PR there.

zm711 · 2024-10-11T13:38:42Z

@samuelgarcia and @apdavison could you comment on the timing question as well.

nikhilchandra · 2024-10-14T16:41:11Z

@h-mayorquin workin' on it :)

zm711 · 2024-10-14T20:27:10Z

@samuelgarcia comment if I'm misquoting you please, but I asked about your other question Nikhil and the idea was the t_start is just the minimum of the segment since there can be staggered start times for the exact reasons you mention but the actual segment starts whenever the first stream of data starts. Again Sam explained this to me in the meeting, but now I'm writing this after a few days, so it might not be completely clear.

nikhilchandra · 2024-10-14T21:11:51Z

@zm711 @samuelgarcia After digging through the code some more my understanding is that _segment_t_start and _segment_t_end are stream-independent, while _get_signal_t_start is for streams.

The pypl2lib library includes a stream-independent function called pl2_get_start_stop_channel_data() -- this function is independent of stream, so I have predicated _segment_start and _segment_t_end on that. Meanwhile, _get_signal_t_start does depend on stream, and the relevant information can be gathered from the "fragment" variables returned by pypl2lib's get_analog_channel_data().

zm711 · 2024-10-15T12:31:11Z

Yes the idea is that a segment can contain multiple streams of data so the overall start of the segment is stream independent because it is whenever the first stream starts within that segment. The signals are the streams and so those rely on the actual stream being assessed.

h-mayorquin · 2024-10-15T20:03:03Z

Any place where we could add the documentation of the meaning of these methods? I remember being confused about it as well at some point.

zm711 · 2024-10-15T20:04:46Z

You want to tack your comment to #1508. That is the reminder to make our rawio docs better :)

nikhilchandra · 2024-10-28T22:42:42Z

There is some nuance to this problem that I did not fully appreciate earlier. I need to decide what to do -- some input would be appreciated @zm711 @h-mayorquin @samuelgarcia @alejoe91

The OmniPlex system supports trial-based recordings, which means you can pause/resume recordings as needed over the course of an experiment. While SpikeInterface deals with "segments," the PL2 format uses "fragments." Unfortunately, the concept of PL2 fragments does not map perfectly to SpikeInterface segments, as I will show below.

When you start, stop, pause, or resume a recording, this creates a timestamped event. Information about such events can be retrieved with 2 functions in the PL2FileReader API -- PL2_GetStartStopChannelInfo and PL2_GetStartStopChannelData. Note -- these events are not tied to any stream.

When retrieving analog channel data (what SpikeInterface would refer to as a "stream"), one has the option of using the API's function PL2_GetAnalogChannelData. This returns 5 variables:

num_fragments_returned - the total number of fragments returned
num_data_points_returned - the total number of data points returned (across all fragments)
fragment_timestamps - array containing first integer timestamps for each fragment
fragment_counts - array containing number of data points for each fragment
values - array containing data for all fragments, stored contiguously

For a recording with multiple segments, the function returns all data points for all fragments in a single, contiguous array. We need to use fragment_timestamps and fragment_counts to make sense of things.

HOWEVER

Suppose we conduct a recording in one continuous block, i.e., we do not pause/resume the recording at any time. This means that # of segments == 1. Such a recording can still have multiple fragments if the OmniPlex hardware gets overloaded so there are gaps in the data that actually gets recorded. In effect, there is a mismatch between # of segments and # of fragments. The gap need not be very big. I am currently playing around with a file in which the following values are returned for a supposedly continuous recording:

num_fragments_returned = 2
fragment_timestamps = [61, 8200645]
fragment_counts[8196286, 41286384]

The first fragment occupies the tick-interval [61, 8196347)
The second fragment occupies the tick-interval [8200645, 49487029)

So the gap between the first and second fragments is 4,298 samples. Given the 40 kHz sampling rate, this translates to roughly 107 ms of missing data.

PROPOSED SOLUTION

I think we should use the information from PL2_GetStartStopChannelInfo and PL2_GetStartStopChannelData as the ground truth -- if this information indicates that there is more than 1 segment, then reject the file for processing.

At the same time, if the number of fragments returned exceeds 1 due to one or more gaps in the data, we should do the following:

Issue a warning
Apply a threshold to determine whether the gap is acceptible.
a. If the gap exceeds the threshold, break
b. If the gap is less than the threshold, just fill it in with zeros.

Questions:

If we do go with my proposed solution, then what would be an acceptable gap threshold? 1 second? 500 ms?
Would we want to add this threshold as a parameter that can be set by the user somewhere?

zm711 · 2024-10-29T12:31:39Z

I will read your comment soon. Today I'm swamped but I will try for the next couple days :)

h-mayorquin · 2024-10-29T18:59:12Z

Same here, this is on my radar but this week is super busy.

samuelgarcia · 2024-10-30T14:07:39Z

@nikhilchandra
I totally agree with, the segments concept in reader should be revised it fuuzy used for handling gaps (like in neurlynx) and handling wanted pause during recording.

I would propose a call to discuss that with @zm711 @h-mayorquin and maybe @alejoe91.

I think we should have a clear gap handling API directly in neo.rawio and the user could decide the tolerance of the gaps and optionaly could retrieve the timestamps of samples.
This is a massive task.

nikhilchandra · 2024-10-30T20:40:27Z

@samuelgarcia Agreed

@zm711 @h-mayorquin @alejoe91
Let me know if you'd like to do a call sometime

nikhilchandra · 2024-11-06T22:49:03Z

@samuelgarcia I am not sure if I should just proceed with my changes or if some discussion is necessary first, based on preceding comments. Please advise.

h-mayorquin · 2024-11-11T18:09:26Z

Sorry for the delay on this. We all have been very busy. We will discuss this this week, would you like to join? I will send an invite by email.

nikhilchandra · 2024-11-11T19:01:07Z

I have responded to your email. See you soon!

h-mayorquin · 2024-11-13T18:01:21Z

OK, we discussed this today. Sam proposed a general way of thinking about this but I am leaving here the relevant context to move this forward:

Semantic distinction between segments and gaps:
Segments are international gaps are not.

In the ideal case we have an automatic way of telling this. Intentional jumps in time become segments non-intentional ones becomes gaps. Gaps should not be segment and we should create an API to collect them.

Nikhil action items:

Figure out if the if Plexon2 API allows to distinguish between intentional and non-intentional jumps in time (segments vs gaps). This is probably related to the fragments vs get_events APi.
If this exists, we should modify plexon2 neo raw IO so we don’t over-segment.

I will open another issue on python-neo to discuss the API thing.

nikhilchandra added the enhancement label Sep 25, 2024

zm711 added this to the future milestone Oct 11, 2024

h-mayorquin mentioned this issue Oct 15, 2024

Improve RawIO Documentation #1508

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Plexon2RawIO's implementation of _get_analogsignal_chunks can be improved #1569

Plexon2RawIO's implementation of _get_analogsignal_chunks can be improved #1569

nikhilchandra commented Sep 25, 2024 •

edited

Loading

zm711 commented Sep 25, 2024

h-mayorquin commented Sep 26, 2024

samuelgarcia commented Sep 26, 2024

zm711 commented Sep 27, 2024

nikhilchandra commented Sep 30, 2024

samuelgarcia commented Oct 1, 2024

nikhilchandra commented Oct 2, 2024

nikhilchandra commented Oct 2, 2024 •

edited

Loading

zm711 commented Oct 3, 2024 •

edited

Loading

nikhilchandra commented Oct 4, 2024 •

edited

Loading

nikhilchandra commented Oct 8, 2024 •

edited

Loading

zm711 commented Oct 8, 2024

h-mayorquin commented Oct 9, 2024

zm711 commented Oct 9, 2024

apdavison commented Oct 9, 2024

h-mayorquin commented Oct 9, 2024

zm711 commented Oct 11, 2024

nikhilchandra commented Oct 14, 2024

zm711 commented Oct 14, 2024

nikhilchandra commented Oct 14, 2024

zm711 commented Oct 15, 2024

h-mayorquin commented Oct 15, 2024

zm711 commented Oct 15, 2024

nikhilchandra commented Oct 28, 2024

zm711 commented Oct 29, 2024

h-mayorquin commented Oct 29, 2024

samuelgarcia commented Oct 30, 2024

nikhilchandra commented Oct 30, 2024

nikhilchandra commented Nov 6, 2024 •

edited

Loading

h-mayorquin commented Nov 11, 2024

nikhilchandra commented Nov 11, 2024

h-mayorquin commented Nov 13, 2024

Plexon2RawIO's implementation of _get_analogsignal_chunks can be improved #1569

Plexon2RawIO's implementation of _get_analogsignal_chunks can be improved #1569

Comments

nikhilchandra commented Sep 25, 2024 • edited Loading

zm711 commented Sep 25, 2024

h-mayorquin commented Sep 26, 2024

samuelgarcia commented Sep 26, 2024

zm711 commented Sep 27, 2024

nikhilchandra commented Sep 30, 2024

samuelgarcia commented Oct 1, 2024

nikhilchandra commented Oct 2, 2024

nikhilchandra commented Oct 2, 2024 • edited Loading

zm711 commented Oct 3, 2024 • edited Loading

nikhilchandra commented Oct 4, 2024 • edited Loading

nikhilchandra commented Oct 8, 2024 • edited Loading

zm711 commented Oct 8, 2024

h-mayorquin commented Oct 9, 2024

zm711 commented Oct 9, 2024

apdavison commented Oct 9, 2024

h-mayorquin commented Oct 9, 2024

zm711 commented Oct 11, 2024

nikhilchandra commented Oct 14, 2024

zm711 commented Oct 14, 2024

nikhilchandra commented Oct 14, 2024

zm711 commented Oct 15, 2024

h-mayorquin commented Oct 15, 2024

zm711 commented Oct 15, 2024

nikhilchandra commented Oct 28, 2024

zm711 commented Oct 29, 2024

h-mayorquin commented Oct 29, 2024

samuelgarcia commented Oct 30, 2024

nikhilchandra commented Oct 30, 2024

nikhilchandra commented Nov 6, 2024 • edited Loading

h-mayorquin commented Nov 11, 2024

nikhilchandra commented Nov 11, 2024

h-mayorquin commented Nov 13, 2024

nikhilchandra commented Sep 25, 2024 •

edited

Loading

nikhilchandra commented Oct 2, 2024 •

edited

Loading

zm711 commented Oct 3, 2024 •

edited

Loading

nikhilchandra commented Oct 4, 2024 •

edited

Loading

nikhilchandra commented Oct 8, 2024 •

edited

Loading

nikhilchandra commented Nov 6, 2024 •

edited

Loading