
Inefficiency of creating large numbers of SpikeTrain objects #1730

@antolikjan


Dear Neo developers,

We came across a use case which revealed a major inefficiency in Neo (or rather in the quantities
package that Neo uses, as I will explain).

In our simulations we are recording from hundreds of thousands of neurons, in some cases
in response to thousands of stimuli, creating a very large number of SpikeTrain objects in the process.
We found that loading this data, despite it being only on the order of tens of gigabytes and thus still manageable,
can take tens of hours. We tracked this extremely slow performance down to a significant constant
overhead associated with the creation of each SpikeTrain object. Unfortunately, this overhead does not arise in Neo itself but can be traced to the quantities package.
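To make the overhead concrete, here is a minimal timing sketch (an illustration only, not part of our pipeline) comparing the per-object cost of a full SpikeTrain construction against that of the bare underlying Quantity array:

import timeit

import numpy as np
from quantities import ms
from neo.core.spiketrain import SpikeTrain

times = np.arange(200.0)  # 200 spike times, interpreted in ms
n = 1000

# Full SpikeTrain construction (quantities machinery plus Neo's checks)
t_train = timeit.timeit(lambda: SpikeTrain(times * ms, t_stop=10000.0), number=n) / n
# Bare Quantity construction only
t_quantity = timeit.timeit(lambda: times * ms, number=n) / n

print(f"SpikeTrain: {t_train * 1e6:.1f} us per object")
print(f"Quantity:   {t_quantity * 1e6:.1f} us per object")

Multiplied by hundreds of thousands of trains, even a sub-millisecond constant cost per object adds up to the hours we observe.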

Initially I hoped this could be rectified by using the SpikeTrainList object, but ultimately that object
is backed by individual SpikeTrains anyway.

My questions are: (a) do you have any ideas for how we could improve efficiency in our use case, and (b) is there any plan to rework SpikeTrainList to use a more efficient internal representation, given that it holds a list of spike trains with a common temporal reference? (One hypothetical layout is sketched below.)
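To illustrate what I mean by a more efficient representation, here is a purely hypothetical sketch (the class name and layout are mine, not an existing Neo API): a single flat array of spike times plus per-train offsets, so that no per-train Quantity object exists until a train is actually requested.

import numpy as np
from quantities import ms

class RaggedSpikeTimes:
    """Hypothetical compact layout: train i is times[offsets[i]:offsets[i + 1]]."""

    def __init__(self, times, offsets):
        self.times = np.asarray(times)      # all spike times concatenated, in ms
        self.offsets = np.asarray(offsets)  # length n_trains + 1

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        # Units are attached only for the train that is actually requested
        return self.times[self.offsets[i]:self.offsets[i + 1]] * ms

With such a layout the quantities overhead would be paid per accessed train, not per stored train.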

Below I attach a quick example demonstrating that unpickling a SpikeTrainList can be as much as 100× faster if it is saved with only the multiplexed representation, as opposed to the equivalent SpikeTrainList saved with the SpikeTrain representation. Note that the multiplexed representation does not by itself solve anything for us: the moment we use such a loaded SpikeTrainList for any operation, it triggers the conversion into the list-of-SpikeTrains representation, nullifying the time saved during loading.

Example code:

import cProfile
import pickle
import pstats
from pstats import SortKey

import numpy
from quantities import ms
from neo.core.spiketrain import SpikeTrain
from neo.core.spiketrainlist import SpikeTrainList

# Build 100,000 identical spike trains of 200 spikes each
s = [SpikeTrain(list(range(0, 200)) * ms, t_stop=10000) for _ in range(100000)]
sl = SpikeTrainList(s)

# Save with the standard list-of-SpikeTrains representation
with open("dump_standard.pickle", "wb") as f:
    pickle.dump(sl, f)

# Save an equivalent SpikeTrainList built from the multiplexed
# representation (one channel-id array plus one spike-time array)
channel_ids, spike_times = sl.multiplexed

with open("dump_modified.pickle", "wb") as f:
    pickle.dump(
        SpikeTrainList.from_spike_time_array(
            spike_times,
            channel_ids,
            all_channel_ids=list(range(0, len(sl))),
            units="ms",
            t_start=0 * ms,
            t_stop=10000.0 * ms,
        ),
        f,
    )

def load_standard():
    with open("dump_standard.pickle", "rb") as f:
        sps = pickle.load(f)
    print(numpy.mean(sps))

def load_modified():
    with open("dump_modified.pickle", "rb") as f:
        sps = pickle.load(f)
    # Taking the mean touches the spike data, which in the multiplexed case
    # triggers the conversion back to individual SpikeTrains
    print(numpy.mean(sps))

cProfile.run("load_modified()", "restats_after_modified")
p = pstats.Stats("restats_after_modified")
p.sort_stats(SortKey.CUMULATIVE).print_stats(15)

cProfile.run("load_standard()", "restats_after_standard")
p = pstats.Stats("restats_after_standard")
p.sort_stats(SortKey.CUMULATIVE).print_stats(15)
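For completeness, a rough sketch of the workaround we would prefer not to have to write ourselves: operating directly on the multiplexed arrays with plain numpy, bypassing the SpikeTrain API entirely so the conversion is never triggered. This assumes one only needs array-level statistics.

import numpy

channel_ids, spike_times = sl.multiplexed  # plain id array + Quantity time array

# Per-train spike counts and the overall mean spike time, computed without
# materializing a single SpikeTrain object
counts = numpy.bincount(numpy.asarray(channel_ids, dtype=int), minlength=len(sl))
print(counts[:10], spike_times.mean())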
