Skip to content

Conversation

chkothe
Copy link
Contributor

@chkothe chkothe commented Sep 19, 2025

This fixes a problem where BrainVision files that had been renamed but were not patched by the user will fail to import.
BrainVision EEG files come in triples, as in myfile.vhdr, myfile.eeg, and myfile.vmrk.

The problem is that the .vdhr file has a textual reference to the sidecar .vmrk and .eeg files, as in the following example:

File l3_s02b12c01.vmrk:

[Common Infos]
Codepage=UTF-8
DataFile=l3_s02b12c01.eeg
MarkerFile=l3_s02b12c01.vmrk

... but when users rename all 3 files (eg to fix a mistyped subject/session name or etc.), the resulting files can no longer be imported with Neo, unless the user also manually edits the .vhdr file and fixes the reference.

This is the same feature also found in some other BV importers, e.g. the one in EEGLAB.

This is unfortunately somewhat common with published datasets (e.g. on OpenNeuro) where users often rename their files for publication but sometimes fail to make the textual edit of their vhdr files.

Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR! This might be related to our integrity checks discussion in #1773.

So if I understand the code correctly you want to go from the assumption that the files are appropriately described in the metadata to that the user renamed in triplet fashion. (ie file was originally called myfile.vhdr, myfile.eeg, myfile.vmrk and then I renamed it to mycoolfile.vhdr, mycoolffile.eeg, mycoolfile.vmrk. But if I did mycoolfile.vhdr, myothercoolfile.eeg, myfinalcoolfile.vmrk this code would fail.

I think at minimum we should provide an else branch to this loop that clearly errors and explains why the error is happening to give the user the opportunity to appropriately name the files.

The other approach would be to treat this a directory approach and just look for any three triplets in the same folder and try to load those. I guess I also like this approach in general if we decide to allow it. I'll probably ping some other people as we have been having debates about flexibility for the user vs letting the user combine inappropriate files. I would tend to vote yes to allow this but let me follow up too!

marker_filename = self.filename.replace(bname, vhdr_header["Common Infos"]["MarkerFile"])
binary_filename = self.filename.replace(bname, vhdr_header["Common Infos"]["DataFile"])

binary_filename = self._ensure_filename(binary_filename, "binary")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the bhdr header it calles this the "DataFile" should the "kind" here be "data"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@chkothe
Copy link
Contributor Author

chkothe commented Sep 26, 2025

Thanks for the quick review -- and sorry, just saw your reply. So the code as-is is a best-effort way to do a better job in a common use case that makes the importer work seamlessly in that case. If the user renames their files in a mismatched way, then the new code steps aside and does intentionally nothing (returning the original filename back to the import codepath), and the current behavior is preserved in that case, in that the importer will crash with a Python file-not-found error, referring to whatever the original filename was in the .vhdr file.

Since this adds no new regressions and improves parity with other BV importers that have the same feature (including EEGLAB, FieldTrip, brainvisionloader.jl, and BioSig, to name a few) I'd argue it's a self-contained improvement that's orthogonal to better error messages when things fail to resolve for any number of reasons (currently the error in that case will be the same that it's always been). I think Neo could have fantastic error messages for missing sidecar files, and they could be made consistent across importers (there are some 10-15 neuro file formats that would fall under this, not all currently in neo), but it'd be a longer discussion that'd risk stalling immediate improvements like this one for potentially a long time.

As for extended heuristics to tie together potentially mismatched files in the same folder, I think that'd be out of scope for this one and would be a broader discussion. For the record, no existing BV importer implements such a heuristic and it's potentially dangerous as one can easily end up with mismatched files in a folder as a result of copying the wrong stretch of 3 files, or as a result of a typo during a rename.

Hope we can get this one in meanwhile! We're currently blocked since neo doesn't import all studies cleanly and if it gets stalled we'd have to move to a different importer (or roll our own) that'd be unfortunate and a waste of everyone's time. Thanks for looking into it!

@zm711
Copy link
Contributor

zm711 commented Sep 26, 2025

I'm definitely okay with this. My request for a different error was mostly because merging this gives us two behaviors:

  1. default: use file names from metadata
  2. same name files: automagically work

But this behavior is not documented for the user (it will log if it is happening but how will users discover this okay?) If we raise an error with a branch we can make that clear. If you feel strongly about not having an else for now and using the default could you add a note into the doctoring under a notes section to explain this behavior. While you're doing that let me try to figure out why the docs are misbehaving.

@chkothe (maybe if I tag you the alert will work better? )

@chkothe
Copy link
Contributor Author

chkothe commented Sep 26, 2025

Yeah, the alert works -- thanks!

So as for documenting, I think we can totally do that. Happy to add a line to the docstring. As for users discovering that the automagic behavior was triggered for their data, we have a log -- currently a warning but could be an info since most (I think all) other BV importers do that entirely silently, and several even ignore the MarkerFile/DataFile entries altogether, and users may be spooked by a warning.

What did you mean by error? So an error would/should only happen in case of actually mismatched files. Right now, it'll say these files are missing (as it did before), and yes, users may be puzzled. So we could add an else and note that the referenced file seems to be missing and that may be because of a mismatch of what's in the VHDR file and how their files are named on disk. I suspect there could be cases where the exists() will return a False but the open() will actually work, e.g. perhaps some network file systems (not uncommon in labs), which is why I didn't want to add a potentially noisy/spurious warning (or worse, error!) triggered of an exists() check. So we could have a warning there, which will be followed by an error if the subsequent open also fails. Then users then can inspect the logs to figure out why the open might have failed.

Alternatively there could be a try/catch around the two places in the main import codepath where it actually attempts to open the file (thus unrelated to what this commit does) that intercepts the errors and injects some additional suggestions to the user as to why that may have happened, and re-raises. I think that may be the somewhat better UX since then users don't have to realize that there's a warning before the error. I wasn't planning to, but could add those and come up with some error messages.

@zm711
Copy link
Contributor

zm711 commented Sep 26, 2025

@chkothe,

So as for documenting, I think we can totally do that. Happy to add a line to the docstring. As for users discovering that the automagic behavior was triggered for their data, we have a log -- currently a warning but could be an info since most (I think all) other BV importers do that entirely silently, and several even ignore the MarkerFile/DataFile entries altogether, and users may be spooked by a warning.

Sorry I was unclear. I think the log approach is fine for doing the new behavior. It warns users that we are relying on the fact they named the files correctly themselves. So I don't think that part needs to be changed.

What did you mean by error? So an error would/should only happen in case of actually mismatched files. Right now, it'll say these files are missing (as it did before), and yes, users may be puzzled. So we could add an else and note that the referenced file seems to be missing and that may be because of a mismatch of what's in the VHDR file and how their files are named on disk. I suspect there could be cases where the exists() will return a False but the open() will actually work, e.g. perhaps some network file systems (not uncommon in labs), which is why I didn't want to add a potentially noisy/spurious warning (or worse, error!) triggered of an exists() check. So we could have a warning there, which will be followed by an error if the subsequent open also fails. Then users then can inspect the logs to figure out why the open might have failed.

So I envisioned:
if we find the files from the vhdr do the normal process
elif we find the files with the same name we log the fact that you load those files even though they don't have the name found in the vhdr

Then we add a final branch (not currently in this PR) to say

else
The files in this folder do not match the files indicated in the vhdr nor do they have the same name. Are you sure these are the correct combination of files. If you want to load them make sure their naming is either as indicated in the vhdr or that all three files have the name name before their suffix.

But you're worried this strategy might be noisy? I'm more worried that the user will say "what do you mean the file is missing I have a file in the folder." So what I would prefer is instead of having a confusing error we try to provide the user with a solution by saying we can do default behavior or same file name behavior, but we will not arbitrarily load any three files. Again if in your experience you think that will lead to a lot of noise then at minimum we can document the behavior of for users.

Alternatively there could be a try/catch around the two places in the main import codepath where it actually attempts to open the file (thus unrelated to what this commit does) that intercepts the errors and injects some additional suggestions to the user as to why that may have happened, and re-raises. I think that may be the somewhat better UX since then users don't have to realize that there's a warning before the error. I wasn't planning to, but could add those and come up with some error messages.

No try/except that can be expensive so we avoid that for neo, in general. I would maintain your current strategy. Either with better documentation in the docstring to indicate the two possible ways this IO opens files or by having an actionable error message rather than a message that might be confusing.

@chkothe
Copy link
Contributor Author

chkothe commented Sep 26, 2025

Ok, sounds good. What I was referring to are edge cases where exist() can fail while open() succeeds. Here are a few:

  • on NFS, cloud-mounted file systems or other virtual file systems, exists() can fail since what it accesses (stat/getattr) can come from a lazy cache or stale metadata attributes, while the open() usually forces a check or fetch from the server since that's where the rubber hits the road.
  • Some of these file systems also materialize (fetch) files on demand on open, in which case also exists() will fail and open() will succeed.
  • There are a couple of crazy circumstances on Windows when using special paths (those starting with \\), where the underlying stat call can fail but the open can still succeed.

So in those cases the user can get a spurious extra warning or error that they're currently not getting (as in, the load currently succeeds). Normally these scenarios are rare but with large raw-data stores on a shared server that may be more common than one might assume. In the worst case it's just a false warning, so that'll be fairly benign (if the message is worded in tentative terms, as in "It appears that..." or etc), and as long as you're comfortable with having that and favor it over potentially a more complicated error message on failure, that'd be ok with me. So I'll go ahead with the warning unless I hear otherwise (should have that later today).

Btw re the try/except, totally get you there that that's a policy.

@chkothe
Copy link
Contributor Author

chkothe commented Sep 27, 2025

Added that error condition and message. There are really two cases; (1) where the fallback name differs from the name referenced in the header (e.g., header file got renamed), and (2) where they're the same, i.e, the heuristic did not suggest a different file name (likely one of the 3 files is simply missing or only that file was renamed). The code prints a shorter error message in the latter case to make it less confusing.

(2) shorter message:

ERROR (neo.rawio.brainvisionrawio.BrainVisionRawIO) Did not find the marker file associated with .vhdr (header) file 'sub-01_ses-1_task-appleseedexample_eeg.vhdr' in folder '/home/christian/data/OpenNeuro/ds003710-download/sub-01/ses-1/eeg'.
  Please make sure the file is named the same way as the .vhdr file, but ending in .vmrk (i.e. sub-01_ses-1_task-appleseedexample_eeg.vmrk).
  The import will likely fail, but if it goes through, you can ignore this message (the check can misfire on networked file systems).

(1) longer message:

ERROR (neo.rawio.brainvisionrawio.BrainVisionRawIO) Did not find the marker file associated with .vhdr (header) file 'sub-01_ses-1_task-appleseedexample_eeg.vhdr' in folder '/home/christian/data/OpenNeuro/ds003710-download/sub-01/ses-1/eeg'.
  Please make sure the file is named either as per the MarkerFile=subjAKtask2.vmrk line in the .vhdr file, or is named the same way as the .vhdr file, but ending in .vmrk (i.e. sub-01_ses-1_task-appleseedexample_eeg.vmrk).
  The import will likely fail, but if it goes through, you can ignore this message (the check can misfire on networked file systems).

Hope that addresses your suggestion!

@zm711
Copy link
Contributor

zm711 commented Sep 27, 2025

Seems good to me. I'll re-read it carefully tomorrow and then if I don't find anything merge it then!

Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me.

@zm711 zm711 merged commit d9f891e into NeuralEnsemble:master Sep 28, 2025
3 checks passed
@zm711
Copy link
Contributor

zm711 commented Sep 28, 2025

@chkothe I see you're already a contributor, but let me know if you want me to update you on our author list!

@zm711 zm711 added this to the 0.14.3 milestone Sep 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants