Read after write dev #77
Conversation
…ess (documentation missing), but as is passes for dual probe converted with
```diff
@@ -56,18 +56,20 @@
 # sessions_to_run = list(set(brain_wide_sessions) - set(already_written_processed_sessions))

 nwbfile_path = base_path / "nwbfiles" / session / f"{session}.nwb"
-nwbfile_path.parent.mkdir(exist_ok=True)
+os.makedirs(nwbfile_path.parent, exist_ok=True)
```
Suggested change:
```diff
-os.makedirs(nwbfile_path.parent, exist_ok=True)
+nwbfile_path.parent.mkdir(exist_ok=True)
```
Why change this? Pathlib is generally more convenient.
Can add `parents=True` if that is necessary.
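For illustration, a minimal sketch of the pathlib idiom being suggested (the path here is hypothetical):

```python
from pathlib import Path

# Hypothetical output path mirroring the script's layout
nwbfile_path = Path("nwbfiles") / "some_session" / "some_session.nwb"

# pathlib equivalent of os.makedirs(..., exist_ok=True);
# parents=True additionally creates any missing intermediate directories
nwbfile_path.parent.mkdir(parents=True, exist_ok=True)
```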
```python
# Download behavior and spike sorted data for this session
session_path = base_path / "ibl_conversion" / session
cache_folder = base_path / "ibl_conversion" / session / "cache"
os.makedirs(session_path, exist_ok=True)
```
This should already exist at this point from the previous calls.
```python
from typing import Optional
from numpy import testing


def check_arrays(array_a: np.ndarray, array_b: np.ndarray, full_check: bool = False):
```
This style of redirection is not recommended (a helper function just to decide between uses of `numpy.testing`).

TBH any data array in the processed files will be small enough such that a full (non-subset) usage of `numpy.testing.assert_allclose` should always be fast enough to just do that.

The only question for submitting would be if you want to do something like this on the raw data, but that's probably important enough for it to be a separate check altogether.

Also, the NWB write step should not affect floating point precision (and if it did we would actually want to be informed about that), so I recommend using `numpy.testing.assert_array_equal` instead. Or in general, recommend `numpy.testing.assert_array_almost_equal`, since it should check other aspects of the arrays beyond the data values.
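A minimal sketch of the direct, helper-free comparison being recommended; the arrays here are stand-ins, not project data:

```python
import numpy as np
from numpy import testing

# Stand-ins for the ONE source array and the array read back from the NWB file
source_array = np.random.default_rng(seed=0).normal(size=10_000)
roundtrip_array = source_array.copy()

# Full (non-subset) comparison, called directly instead of via check_arrays
testing.assert_array_equal(roundtrip_array, source_array)
```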
> The only question for submitting would be if you want to do something like this on the raw data, but that's probably important enough for it to be a separate check altogether

I had this in mind when I was writing the function, and that small speed improvements are probably going to result in large total time savings. That said, I haven't done any benchmarking yet (besides the `np.searchsorted` way to access the spike times).

> Also, the NWB write step should not affect floating point precision (and if it did we would actually want to be informed about that) so I recommend using numpy.testing.assert_array_equal instead. Or in general, recommend numpy.testing.assert_array_almost_equal since it should check other aspects of the arrays beyond the data values

I wasn't entirely sure if this is true across all architectures, or if we need to consider some other form of rounding-error imprecision, so I thought using `allclose` would allow adjusting for that if necessary; and if true equality is desired, `rtol=0` should do that.
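Two small sketches of the points above, with made-up values: the `np.searchsorted` access pattern, and `assert_allclose` collapsing to an exact check when `rtol=0` (note that `atol` already defaults to 0):

```python
import numpy as np
from numpy import testing

# np.searchsorted on sorted spike times: O(log n) window lookup
spike_times = np.array([0.1, 0.5, 1.2, 3.4, 7.8])   # seconds, sorted
lo, hi = np.searchsorted(spike_times, [0.4, 3.5])
window = spike_times[lo:hi]                          # array([0.5, 1.2, 3.4])

# With rtol=0 (and the default atol=0), assert_allclose degenerates to an
# exact element-wise comparison, matching assert_array_equal
testing.assert_allclose(window, window.copy(), rtol=0)
testing.assert_array_equal(window, window.copy())
```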
One way to think of the NWB conversion process (for bulk data anyway) is that 'bytes in' are still just 'bytes out': if you decoded the original SpikeGLX binaries into their raw bytes (things that look like `\xff\x01r`), those sequences should be exactly the same as when you read them from the data array after it's written to the NWB file.

Floating point errors would mostly come up if you performed operations like scaling (such as by the conversion factor to get values into scientific units), or if you used lossy compression to save more space in the file, though that would often exceed standard 'floating point' error.
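A toy round-trip illustrating the 'bytes in, bytes out' point (the buffer contents and gain factor are made up):

```python
import numpy as np
from numpy import testing

# Hypothetical raw buffer, like a slice of a SpikeGLX int16 binary
raw_bytes = b"\xff\x01r\x00\x02\x10"
samples = np.frombuffer(raw_bytes, dtype=np.int16)

# A lossless write/read cycle preserves the bytes exactly,
# so exact comparison is the right assertion
testing.assert_array_equal(samples, np.frombuffer(samples.tobytes(), dtype=np.int16))

# Floating point tolerance only becomes a question after operations
# like scaling by a conversion factor into scientific units
volts = samples.astype(np.float64) * 2.34375e-6   # made-up gain factor
testing.assert_allclose(volts, samples * 2.34375e-6, rtol=1e-12)
```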
```python
def check_series(series_a: pd.Series, series_b: pd.Series, full_check: bool = False):
```
Again, not really recommended to define a one-line redirection function; just making more work than necessary for ourselves here.
This has somewhat historical reasons, and I left it in as a stub. When I started coding, I wasn't even aware of the existence of `numpy.testing`, so the initial flow of the data was in chained calls where I could go from a complex datatype like a `pd.DataFrame` consecutively down to simple ones, where in the end an `assert()` call would take care of the comparison of the selected numbers.

Things have changed, but I left the general structure up for discussion in case it could still be useful for a case I haven't thought about.
```python
    check_arrays(series_a.values, series_b.values, full_check=full_check)


def check_tables(table_a: pd.DataFrame, table_b: pd.DataFrame, naming_map: dict = None, full_check: bool = False):
```
If this is for checking that two pandas data frames are equivalent, why not use `pandas.testing.assert_frame_equal`? (And resolve any column name mapping by creating a copy of the NWB or ONE frame and renaming the columns on the copy passed to the assertion?)
:) thanks for pointing me towards `pandas.testing`
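A minimal sketch of the suggested approach, with hypothetical column names and data:

```python
import pandas as pd
from pandas import testing

# Hypothetical mapping from ONE column names to NWB column names
naming_map = {"cluster_id": "unit_id"}

table_one = pd.DataFrame({"cluster_id": [0, 1], "depth": [10.0, 20.0]})
table_nwb = pd.DataFrame({"unit_id": [0, 1], "depth": [10.0, 20.0]})

# Rename on a copy so the original frame is untouched, then assert equality
testing.assert_frame_equal(table_one.rename(columns=naming_map), table_nwb)
```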
```python
    table_a (pd.DataFrame): _description_
    table_b (pd.DataFrame): _description_
    naming_map (dict, optional): if naming map is given, it is used to map the names of columns of table_a to those of table_b. Defaults to None, checks if columns are identical.
    full_check (bool, optional): _description_. Defaults to False.
```
An index subset approach should especially not be needed for tables, since any data here is already loaded into RAM (making any non-subset assertion super fast).
```python
from pynwb import NWBHDF5IO, NWBFile
from one.api import ONE
import logging
import read_after_write as raw
```
Highly recommend not aliasing to real words that also have other contextual meaning. Also don't recommend importing it as an improper local module.
Suggested change:
```diff
-import read_after_write as raw
+from ibl_to_nwb.brainwide_map import read_after_write
```
good point
```python
io = NWBHDF5IO(file=h5py_file, load_namespaces=True)
nwbfile = io.read()

raw.test_IblSortingInterface(nwbfile, one, eid)
```
Suggested change:
```diff
-raw.test_IblSortingInterface(nwbfile, one, eid)
+read_after_write.test_IblSortingInterface(nwbfile, one, eid)
```
```python
nwbfile = io.read()

raw.test_IblSortingInterface(nwbfile, one, eid)
raw.test_WheelInterface(nwbfile, one, eid)
raw.test_RoiMotionEnergyInterface(nwbfile, one, eid)
raw.test_BrainwideMapTrialsInterface(nwbfile, one, eid)
raw.test_IblPoseEstimationInterface(nwbfile, one, eid)
raw.test_LickInterface(nwbfile, one, eid)
raw.test_PupilTrackingInterface(nwbfile, one, eid)
```
The step of reading the NWB file and running the tests on it could be its own function, and we could even bake it into the `run_conversion` procedure so that it's checked every time and errors out if there's ever an issue.
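A sketch of what such a function might look like, assuming the module is imported as `read_after_write` per the suggestion above (the function name and argument types are hypothetical):

```python
from pathlib import Path

from pynwb import NWBHDF5IO

# module path taken from the import suggestion above
from ibl_to_nwb.brainwide_map import read_after_write


def run_read_after_write_checks(nwbfile_path: Path, one, eid) -> None:
    """Hypothetical wrapper: re-open the written NWB file and run all
    read-after-write checks, raising on the first failed assertion."""
    with NWBHDF5IO(path=str(nwbfile_path), mode="r", load_namespaces=True) as io:
        nwbfile = io.read()
        read_after_write.test_IblSortingInterface(nwbfile, one, eid)
        read_after_write.test_WheelInterface(nwbfile, one, eid)
        read_after_write.test_RoiMotionEnergyInterface(nwbfile, one, eid)
        read_after_write.test_BrainwideMapTrialsInterface(nwbfile, one, eid)
        read_after_write.test_IblPoseEstimationInterface(nwbfile, one, eid)
        read_after_write.test_LickInterface(nwbfile, one, eid)
        read_after_write.test_PupilTrackingInterface(nwbfile, one, eid)
```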
I imagined that the final format would be something like what you suggested in #88; I had it in here just to verify quickly and easily that it all runs, to be integrated into a more proper structure later on.

Thank you for the code review. Really helpful for me!
replaced by #88
Not everything is implemented yet and documentation is missing, but putting it up here now to keep things moving. Works in conjunction with `convert_brainwide_map_processed_only_local_testing.py`.