Description
Dear Neo developers,
We came across a use case which revealed a major inefficiency in Neo (or rather in the quantities
package Neo uses, as I will explain).
In our simulations we are recording from hundreds of thousands of neurons, in some cases
in response to thousands of stimuli, creating a very large number of SpikeTrain objects in the process.
We found that loading this data, despite it being on the order of a still manageable tens of gigabytes,
can take tens of hours. We tracked this extremely slow performance down to a significant constant
overhead associated with the creation of each SpikeTrain object; this overhead does not originate in Neo itself but can be traced to the quantities package.
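The per-object overhead can be illustrated with a small micro-benchmark (an illustrative sketch, not Neo's exact code path): constructing many small quantities arrays pays the Quantity construction cost once per object, whereas one large array of the same total size pays it once.

```python
import timeit

import numpy as np
import quantities as pq

times = np.arange(200, dtype=float)

# 1000 small Quantity arrays: construction overhead is paid per object.
many_small = timeit.timeit(lambda: [times * pq.ms for _ in range(1000)], number=1)

# One large Quantity array with the same total number of values:
# the construction overhead is paid only once.
one_large = timeit.timeit(lambda: np.tile(times, 1000) * pq.ms, number=1)

print(f"1000 small Quantity arrays: {many_small:.4f} s")
print(f"1 large Quantity array:     {one_large:.4f} s")
```

On our machines the many-small-objects variant is slower by a large factor, which matches the profile we see when unpickling many SpikeTrains.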
Initially I hoped this could be rectified by using the SpikeTrainList object, but ultimately that object uses
individual SpikeTrains anyway.
My questions are: (a) do you have any ideas how we could improve the efficiency in our use case, and (b) is there any plan to rework SpikeTrainList to use a more efficient representation, given that it works with a list of spike trains sharing a common temporal reference?
Below I am attaching quick example code demonstrating that unpickling a SpikeTrainList can be as much as 100× faster if one saves only the multiplexed representation, as opposed to the equivalent SpikeTrainList saved with the SpikeTrain representation. Note that the multiplexed representation does not solve anything for us, because the moment we use such a loaded SpikeTrainList for any operation, it triggers conversion into the list-of-SpikeTrains representation, nullifying the time saved during loading.
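For context, the multiplexed representation can be illustrated in plain NumPy (a toy sketch with made-up data, independent of Neo's internals): all spikes are stored as one flat spike-time array plus a parallel channel-id array, so no per-train objects are needed.

```python
import numpy as np

# Three toy "spike trains" on a common time reference (hypothetical data).
trains = [np.array([1.0, 5.0]), np.array([2.0]), np.array([0.5, 3.0, 9.0])]

# Multiplexed form: one channel-id array plus one flat spike-time array.
channel_ids = np.concatenate([np.full(len(t), i) for i, t in enumerate(trains)])
spike_times = np.concatenate(trains)

print(channel_ids.tolist())  # → [0, 0, 1, 2, 2, 2]

# Recovering an individual train is a boolean mask over the flat arrays.
train_2 = spike_times[channel_ids == 2]
print(train_2.tolist())  # → [0.5, 3.0, 9.0]
```

Loading only these two flat arrays avoids the per-SpikeTrain construction cost, which is why the multiplexed pickle below loads so much faster.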
example code
import cProfile
import pickle
import pstats
from pstats import SortKey

import numpy
from quantities import ms
from neo.core.spiketrain import SpikeTrain
from neo.core.spiketrainlist import SpikeTrainList

# Build 100000 identical spike trains of 200 spikes each.
s = [SpikeTrain(list(range(0, 200)) * ms, t_stop=10000) for i in range(100000)]
sl = SpikeTrainList(s)

# Save the list in the standard (per-SpikeTrain) representation.
with open("dump_standard.pickle", "wb") as f:
    pickle.dump(sl, f)

# Save an equivalent list built from the multiplexed representation
# (one channel-id array plus one spike-time array).
a, b = sl.multiplexed
with open("dump_modified.pickle", "wb") as f:
    pickle.dump(
        SpikeTrainList.from_spike_time_array(
            b, a,
            all_channel_ids=list(range(0, len(sl))),
            units="ms",
            t_start=0 * ms,
            t_stop=10000.0 * ms,
        ),
        f,
    )

def aaa():
    # Load the standard representation and force a computation on it.
    with open("dump_standard.pickle", "rb") as f:
        sps = pickle.load(f)
    print(numpy.mean(sps))

def bbb():
    # Load the multiplexed representation and force a computation on it.
    with open("dump_modified.pickle", "rb") as f:
        sps = pickle.load(f)
    print(numpy.mean(sps))

cProfile.run("bbb()", "restats_after_modified")
p = pstats.Stats("restats_after_modified")
p.sort_stats(SortKey.CUMULATIVE).print_stats(15)

cProfile.run("aaa()", "restats_after_standard")
p = pstats.Stats("restats_after_standard")
p.sort_stats(SortKey.CUMULATIVE).print_stats(15)