-
Notifications
You must be signed in to change notification settings - Fork 261
fix handling of broken file references in brainvisionrawio #1780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix handling of broken file references in brainvisionrawio #1780
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR! This might be related to our integrity checks discussion in #1773.
So if I understand the code correctly you want to go from the assumption that the files are appropriately described in the metadata to that the user renamed in triplet fashion. (ie file was originally called myfile.vhdr, myfile.eeg, myfile.vmrk and then I renamed it to mycoolfile.vhdr, mycoolffile.eeg, mycoolfile.vmrk. But if I did mycoolfile.vhdr, myothercoolfile.eeg, myfinalcoolfile.vmrk this code would fail.
I think at minimum we should provide an else branch to this loop that clearly errors and explains why the error is happening to give the user the opportunity to appropriately name the files.
The other approach would be to treat this a directory approach and just look for any three triplets in the same folder and try to load those. I guess I also like this approach in general if we decide to allow it. I'll probably ping some other people as we have been having debates about flexibility for the user vs letting the user combine inappropriate files. I would tend to vote yes to allow this but let me follow up too!
neo/rawio/brainvisionrawio.py
Outdated
marker_filename = self.filename.replace(bname, vhdr_header["Common Infos"]["MarkerFile"]) | ||
binary_filename = self.filename.replace(bname, vhdr_header["Common Infos"]["DataFile"]) | ||
|
||
binary_filename = self._ensure_filename(binary_filename, "binary") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the bhdr header it calles this the "DataFile" should the "kind" here be "data"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
Thanks for the quick review -- and sorry, just saw your reply. So the code as-is is a best-effort way to do a better job in a common use case that makes the importer work seamlessly in that case. If the user renames their files in a mismatched way, then the new code steps aside and does intentionally nothing (returning the original filename back to the import codepath), and the current behavior is preserved in that case, in that the importer will crash with a Python file-not-found error, referring to whatever the original filename was in the .vhdr file. Since this adds no new regressions and improves parity with other BV importers that have the same feature (including EEGLAB, FieldTrip, brainvisionloader.jl, and BioSig, to name a few) I'd argue it's a self-contained improvement that's orthogonal to better error messages when things fail to resolve for any number of reasons (currently the error in that case will be the same that it's always been). I think Neo could have fantastic error messages for missing sidecar files, and they could be made consistent across importers (there are some 10-15 neuro file formats that would fall under this, not all currently in neo), but it'd be a longer discussion that'd risk stalling immediate improvements like this one for potentially a long time. As for extended heuristics to tie together potentially mismatched files in the same folder, I think that'd be out of scope for this one and would be a broader discussion. For the record, no existing BV importer implements such a heuristic and it's potentially dangerous as one can easily end up with mismatched files in a folder as a result of copying the wrong stretch of 3 files, or as a result of a typo during a rename. Hope we can get this one in meanwhile! We're currently blocked since neo doesn't import all studies cleanly and if it gets stalled we'd have to move to a different importer (or roll our own) that'd be unfortunate and a waste of everyone's time. Thanks for looking into it! |
I'm definitely okay with this. My request for a different error was mostly because merging this gives us two behaviors:
But this behavior is not documented for the user (it will log if it is happening but how will users discover this okay?) If we raise an error with a branch we can make that clear. If you feel strongly about not having an else for now and using the default could you add a note into the doctoring under a notes section to explain this behavior. While you're doing that let me try to figure out why the docs are misbehaving. @chkothe (maybe if I tag you the alert will work better? ) |
Yeah, the alert works -- thanks! So as for documenting, I think we can totally do that. Happy to add a line to the docstring. As for users discovering that the automagic behavior was triggered for their data, we have a log -- currently a warning but could be an info since most (I think all) other BV importers do that entirely silently, and several even ignore the MarkerFile/DataFile entries altogether, and users may be spooked by a warning. What did you mean by error? So an error would/should only happen in case of actually mismatched files. Right now, it'll say these files are missing (as it did before), and yes, users may be puzzled. So we could add an else and note that the referenced file seems to be missing and that may be because of a mismatch of what's in the VHDR file and how their files are named on disk. I suspect there could be cases where the Alternatively there could be a try/catch around the two places in the main import codepath where it actually attempts to open the file (thus unrelated to what this commit does) that intercepts the errors and injects some additional suggestions to the user as to why that may have happened, and re-raises. I think that may be the somewhat better UX since then users don't have to realize that there's a warning before the error. I wasn't planning to, but could add those and come up with some error messages. |
Sorry I was unclear. I think the log approach is fine for doing the new behavior. It warns users that we are relying on the fact they named the files correctly themselves. So I don't think that part needs to be changed.
So I envisioned: Then we add a final branch (not currently in this PR) to say else But you're worried this strategy might be noisy? I'm more worried that the user will say "what do you mean the file is missing I have a file in the folder." So what I would prefer is instead of having a confusing error we try to provide the user with a solution by saying we can do default behavior or same file name behavior, but we will not arbitrarily load any three files. Again if in your experience you think that will lead to a lot of noise then at minimum we can document the behavior of for users.
No try/except that can be expensive so we avoid that for neo, in general. I would maintain your current strategy. Either with better documentation in the docstring to indicate the two possible ways this IO opens files or by having an actionable error message rather than a message that might be confusing. |
Ok, sounds good. What I was referring to are edge cases where
So in those cases the user can get a spurious extra warning or error that they're currently not getting (as in, the load currently succeeds). Normally these scenarios are rare but with large raw-data stores on a shared server that may be more common than one might assume. In the worst case it's just a false warning, so that'll be fairly benign (if the message is worded in tentative terms, as in "It appears that..." or etc), and as long as you're comfortable with having that and favor it over potentially a more complicated error message on failure, that'd be ok with me. So I'll go ahead with the warning unless I hear otherwise (should have that later today). Btw re the try/except, totally get you there that that's a policy. |
Added that error condition and message. There are really two cases; (1) where the fallback name differs from the name referenced in the header (e.g., header file got renamed), and (2) where they're the same, i.e, the heuristic did not suggest a different file name (likely one of the 3 files is simply missing or only that file was renamed). The code prints a shorter error message in the latter case to make it less confusing. (2) shorter message:
(1) longer message:
Hope that addresses your suggestion! |
Seems good to me. I'll re-read it carefully tomorrow and then if I don't find anything merge it then! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me.
@chkothe I see you're already a contributor, but let me know if you want me to update you on our author list! |
This fixes a problem where BrainVision files that had been renamed but were not patched by the user will fail to import.
BrainVision EEG files come in triples, as in
myfile.vhdr
,myfile.eeg
, andmyfile.vmrk
.The problem is that the
.vdhr
file has a textual reference to the sidecar.vmrk
and.eeg
files, as in the following example:File
l3_s02b12c01.vmrk
:... but when users rename all 3 files (eg to fix a mistyped subject/session name or etc.), the resulting files can no longer be imported with Neo, unless the user also manually edits the
.vhdr
file and fixes the reference.This is the same feature also found in some other BV importers, e.g. the one in EEGLAB.
This is unfortunately somewhat common with published datasets (e.g. on OpenNeuro) where users often rename their files for publication but sometimes fail to make the textual edit of their vhdr files.