Description
Dear Neo developers,
We came across a use case which revealed a major inefficiency in Neo (or rather in the quantities
package Neo uses, as I will explain).
In our simulations we are recording from hundreds of thousands of neurons, in some cases
in response to thousands of stimuli, creating a very large number of SpikeTrain objects in the process.
We found that loading this data, despite it being on the order of a still manageable tens of gigabytes,
can take tens of hours. We tracked this extremely slow performance down to a significant constant
overhead associated with the creation of each SpikeTrain object; this overhead does not originate in Neo itself but can be traced to the quantities package.
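The per-object overhead can be illustrated with a small micro-benchmark (an illustrative sketch, not Neo's exact code path): constructing many small quantities arrays pays the Quantity construction cost once per object, whereas one large array of the same total size pays it once.

```python
import timeit

import numpy as np
import quantities as pq

times = np.arange(200, dtype=float)

# 1000 small Quantity arrays: construction overhead is paid per object.
many_small = timeit.timeit(lambda: [times * pq.ms for _ in range(1000)], number=1)

# One large Quantity array with the same total number of values:
# the construction overhead is paid only once.
one_large = timeit.timeit(lambda: np.tile(times, 1000) * pq.ms, number=1)

print(f"1000 small Quantity arrays: {many_small:.4f} s")
print(f"1 large Quantity array:     {one_large:.4f} s")
```

On our machines the many-small-objects variant is slower by a large factor, which matches the profile we see when unpickling many SpikeTrains.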
Initially I hoped this could be rectified by using the SpikeTrainList object, but ultimately that object uses
individual SpikeTrains anyway.
My questions are: (a) do you have any ideas how we could improve the efficiency in our use case, and (b) is there any plan to rework SpikeTrainList to use a more efficient representation, given that it works with a list of spike trains sharing a common temporal reference?
Below I am attaching quick example code demonstrating that unpickling a SpikeTrainList can be as much as 100× faster if one saves only the multiplexed representation, as opposed to the equivalent SpikeTrainList saved with the SpikeTrain representation. Note that the multiplexed representation does not solve anything for us, because the moment we use such a loaded SpikeTrainList for any operation, it triggers conversion into the list-of-SpikeTrains representation, nullifying the time saved during loading.
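For context, the multiplexed representation can be illustrated in plain NumPy (a toy sketch with made-up data, independent of Neo's internals): all spikes are stored as one flat spike-time array plus a parallel channel-id array, so no per-train objects are needed.

```python
import numpy as np

# Three toy "spike trains" on a common time reference (hypothetical data).
trains = [np.array([1.0, 5.0]), np.array([2.0]), np.array([0.5, 3.0, 9.0])]

# Multiplexed form: one channel-id array plus one flat spike-time array.
channel_ids = np.concatenate([np.full(len(t), i) for i, t in enumerate(trains)])
spike_times = np.concatenate(trains)

print(channel_ids.tolist())  # → [0, 0, 1, 2, 2, 2]

# Recovering an individual train is a boolean mask over the flat arrays.
train_2 = spike_times[channel_ids == 2]
print(train_2.tolist())  # → [0.5, 3.0, 9.0]
```

Loading only these two flat arrays avoids the per-SpikeTrain construction cost, which is why the multiplexed pickle below loads so much faster.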
example code
import cProfile
import pickle
import pstats
from pstats import SortKey

import numpy
from quantities import ms
from neo.core.spiketrain import SpikeTrain
from neo.core.spiketrainlist import SpikeTrainList

# Build 100000 identical spike trains of 200 spikes each.
s = [SpikeTrain(list(range(0, 200)) * ms, t_stop=10000) for i in range(100000)]
sl = SpikeTrainList(s)

# Save the list in the standard (per-SpikeTrain) representation.
with open("dump_standard.pickle", "wb") as f:
    pickle.dump(sl, f)

# Save an equivalent list built from the multiplexed representation
# (one channel-id array plus one spike-time array).
a, b = sl.multiplexed
with open("dump_modified.pickle", "wb") as f:
    pickle.dump(
        SpikeTrainList.from_spike_time_array(
            b, a,
            all_channel_ids=list(range(0, len(sl))),
            units="ms",
            t_start=0 * ms,
            t_stop=10000.0 * ms,
        ),
        f,
    )

def aaa():
    # Load the standard representation and force a computation on it.
    with open("dump_standard.pickle", "rb") as f:
        sps = pickle.load(f)
    print(numpy.mean(sps))

def bbb():
    # Load the multiplexed representation and force a computation on it.
    with open("dump_modified.pickle", "rb") as f:
        sps = pickle.load(f)
    print(numpy.mean(sps))

cProfile.run("bbb()", "restats_after_modified")
p = pstats.Stats("restats_after_modified")
p.sort_stats(SortKey.CUMULATIVE).print_stats(15)

cProfile.run("aaa()", "restats_after_standard")
p = pstats.Stats("restats_after_standard")
p.sort_stats(SortKey.CUMULATIVE).print_stats(15)