Description
phy extract-waveforms saves waveforms with the wrong dtype if the raw data file is encoded as float32.
Steps to reproduce:
- Download and unzip the example dataset: https://drive.google.com/file/d/1mshkvPaxKpHjWK4z67HtfXlUvyuprX9B/view?usp=sharing
- Navigate to the folder you saved it to in your command line
- Run
phy extract-waveforms params.py
- Start python and run the following commands:
import numpy as np
np.load('_phy_spikes_subset.waveforms.npy')
Expected behaviour:
The waveforms are loaded
Actual behaviour:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\jmb9770\Anaconda3\envs\phy\Lib\site-packages\numpy\lib\npyio.py", line 456, in load
return format.read_array(fid, allow_pickle=allow_pickle,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\jmb9770\Anaconda3\envs\phy\Lib\site-packages\numpy\lib\format.py", line 839, in read_array
array.shape = shape
^^^^^^^^^^^
ValueError: cannot reshape array of size 31004592 into shape (63534,61,16)
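A quick sanity check on the numbers in this error, as a minimal sketch using only the values printed in the traceback: the element count NumPy found is exactly half of what the shape requires. That ratio is what one would expect if 4-byte values sit in a file whose header declares an 8-byte dtype.

```python
# Shape reported in the ValueError above.
expected_shape = (63534, 61, 16)
expected_items = expected_shape[0] * expected_shape[1] * expected_shape[2]

# Number of items NumPy actually found in the file, from the same error message.
found_items = 31004592

print(expected_items)                # 62009184
print(expected_items / found_items)  # 2.0
```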
Environment info:
OS: Windows 10 x64
Python version: 3.11.9
Conda version: 23.3.1
phy version: 2.0b5
phylib version: 2.4.3
Additional info:
The culprit appears to be on line 657 of phylib/io/traces.py, where the dtype of the waveforms is inferred if sample2unit is None and is otherwise set to float. The phy command extract-waveforms never sets sample2unit, so it always defaults to 1.0, and hence the written waveforms are declared with dtype float, which NumPy interprets as float64. If the raw data file from which the waveforms are loaded is of integer type, the multiplication by 1.0 coerces them to float64, so they are written correctly. If the raw data file is of type float32, however, no such coercion takes place, and the NpyWriter byte-copies the float32-encoded waveforms into a waveforms .npy file whose header claims dtype float64.
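The suspected mechanism can be imitated without phy at all. The following is a minimal sketch, not the phylib code: it first shows the NumPy promotion behaviour described above, then hand-writes a small .npy file whose header declares float64 while the payload is float32, and finally recovers the data by reading the payload with the real dtype. The file name mismatch.npy and the tiny (4, 3) shape are made up for illustration, and the sketch assumes a version 1.0 .npy header, which may not match what phy writes.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format

# Multiplying by a Python float (like the default sample2unit of 1.0)
# promotes integer arrays to float64 but leaves float32 arrays as float32.
as_int16 = np.zeros((4, 3), dtype=np.int16) * 1.0
as_float32 = np.zeros((4, 3), dtype=np.float32) * 1.0
print(as_int16.dtype)    # float64
print(as_float32.dtype)  # float32

# Illustrative data; the real waveforms array is of course much larger.
shape = (4, 3)
data = np.arange(12, dtype=np.float32).reshape(shape)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mismatch.npy")  # hypothetical file name

    # Imitate the suspected bug: the header says float64 ("<f8"),
    # but the bytes that follow are float32.
    with open(path, "wb") as fp:
        header = {"descr": "<f8", "fortran_order": False, "shape": shape}
        npy_format.write_array_header_1_0(fp, header)
        fp.write(data.tobytes())

    # Loading fails because the payload holds half as many 8-byte items
    # as the declared shape requires.
    load_failed = False
    try:
        np.load(path)
    except ValueError as exc:
        load_failed = True
        print("np.load failed:", exc)

    # Possible workaround: parse the header for the shape, then read the
    # payload as float32 instead of trusting the declared dtype.
    with open(path, "rb") as fp:
        npy_format.read_magic(fp)
        header_shape, _, _ = npy_format.read_array_header_1_0(fp)
        recovered = np.fromfile(fp, dtype=np.float32).reshape(header_shape)

print(np.array_equal(recovered, data))
```

If the same approach is applied to an actual _phy_spikes_subset.waveforms.npy file, the header version returned by read_magic should be checked first, since a version 2.0 header needs read_array_header_2_0 instead.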