Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProbeConfiguration is deserializing ChannelConfiguration to XML #366

Open
bruno-f-cruz opened this issue Nov 5, 2024 · 17 comments
Open
Assignees
Milestone

Comments

@bruno-f-cruz
Copy link

Hi!

Not sure if this is intended, but when adding a NeuropixelsV2eHeadstage node to the workflow, a large amount of data (>1Mb) is added to the .bonsai file.

Is the deserialization of this field really required? Having it this way, makes it extremely difficult to maintain code in GitHub due to its sheer size but also when it comes to go over diff checks.

For reference I am attaching the bonsai.config to reproduce the environment and a simple snippet that reproduces the behavior i am looking at.

ExampleHeadstageOnix.zip

@jonnew
Copy link
Member

jonnew commented Nov 5, 2024

@bparks13 can you remind me why we convert to Base64-encoded JSON object for ProbeGroup properties? Im fairly sure there was at least some real reason for this.

  • In any case, we have 2x 384-channel device with each channel having some properties. Something like 1kB per channel is going to be needed to write this down in text without any type of compression or binary formatting. So, 2 * 400 * 1kB is on the order of 1MB in any case.
  • For diffs, this encoding is going to be bad. That's true.

@glopesdev
Copy link
Collaborator

Yeah, in general we do tend to avoid storing dense configuration data in the workflow itself and usually offload them to external config files, e.g. camera configuration files, projector calibration data, etc.

Probably the easiest here might be to make the property option point to an external config file. The built-in GUI would remain the same, but now all configuration changes would be saved and loaded from the external config file. Advantages of this explicit approach is to make it easier for people to swap config files and inspect where data is coming from.

@jonnew
Copy link
Member

jonnew commented Nov 5, 2024

That make sense from a development standpoint, but from a user perspective, it makes plug and play use of bonsai more difficult. It means that if the config file and workflow become separated, then they will lose track of their configurations. Unless there is some way to tie the files together (e.g. bonsai files become a zip archive) we probably cannot pursue this option.

In fact, the Design library provides functions for saving the probe configuration as a json, so its already there. We can discuss if this is such an issue we want to go down this road, but Im very worried about anything that makes the user experience more difficult.

@glopesdev
Copy link
Collaborator

glopesdev commented Nov 5, 2024

The issue here is a tradeoff between size of config files and workflow files. People using bonsai will be used to relying on external files for different things depending on the context, there simply is no escaping it in almost any software. There are many things which by necessity need to be saved outside of a bonsai workflow (e.g. kernel drivers, firmware versions, NPX probe gain files, camera configuration, etc).

Bundling multiple files into a single zip file is always possible, but not very reusable, and would still not save us from operating system level config files.

I'm not sure why people would immediately lose track of their configurations if they are separate from the workflow, or at least I can't easily see why they wouldn't lose track of them anyway even if it is in the workflow. If the built-in GUI can seamlessly edit these files from within bonsai then they would be able to correct and assign the config as part of their single workflow without having to think too hard about it.

I'm happy to discuss what would make sense here, just bear in mind that a .bonsai file plays an important double role as both code and config.

Not sure what @bruno-f-cruz 's plan is, but as a heavy user of versioning and diff's I guess if I need to use this particular function what I would do is hide this property behind a custom include workflow, never externalize the property outside (to make sure it is not saved in a .bonsai file by accident) and then dynamically generate and populate the property from an external config file anyway via a subject or something else.

@glopesdev
Copy link
Collaborator

@bruno-f-cruz oh wow, sorry I hadn't even opened the workflow yet, I guess I see what you mean now, this basically detonates any information content in the workflow.

screenshot

I mean, not even sure if the VS code XML parser can lint this file, as even syntax highlighting seems to be compromised.

@jonnew
Copy link
Member

jonnew commented Nov 5, 2024

Hey guys, when adding a comment here, make sure you are trying to propose a solution rather than just pointing out issues as its likely that those issues are a result of a real, or at least perceived, motivation.

IMO

  • The point about calibration files needing to travel with the workflow is well taken and maybe we should just automatically serialize the probe configuration to some json next to the calibration files without even asking.
  • The point about drivers etc. is a bit of a false equivalence. These files are generally going to form a fairly static context workflows to run in. A single configuration file in in this context is a worthy goal that I feel will reduce issues with meta data storage.
  • I am not sure why we used Base64 encoding rather than just serializing the configuration objects to xml. It might be due to some of the limitations of XML serialization library being used.
  • As with the Icons, there are not many hard rules I've seen for what to include and what not to include in a workflow. Given that these parameters (barring their Base64 encoding) describe properties of software elements (as opposed to e.g. the camera calibration which just goes right to hardware), I don't see a categorical line between these properties and properties that are routinely stored in bonsai workflows other than that it makes the file big.

Solutions:

  • At minimum, find a way around using Base64
  • Consider secondary probe configuration file, perhaps that is autogenerated with the configuration in the .bonsai being replaced with path.

@bruno-f-cruz
Copy link
Author

Thanks y'all for addressing this so fast. Here are some points I am still a bit confused about and I will also expand on my use case.

In any case, we have 2x 384-channel device with each channel having some properties. Something like 1kB per channel is going to be needed to write this down in text without any type of compression or binary formatting. So, 2 * 400 * 1kB is on the order of 1MB in any case.

As @glopesdev mentioned, I would honestly prefer this to be a external config file (more on this later). However, even if this is not possible, I still fail to understand why is a configuration serialized to the XML before even having selected any probe. Wouldnt it be possible to leave the property unset until the user actually selects a configuration file in the GUI? Also, what is configured in this data? Is it likely to ever change?

That make sense from a development standpoint, but from a user perspective, it makes plug and play use of bonsai more difficult. It means that if the config file and workflow become separated, then they will lose track of their configurations. Unless there is some way to tie the files together (e.g. bonsai files become a zip archive) we probably cannot pursue this option.

In my use case I believe this decoupling is a feature not a bug. In broad strokes, the goal is to have a repository in github with a single workflow that is shared across several different rigs (the plan is to eventually scale to a few dozen). The way we have been handling this pattern is by having a domain-specific language defined for the rig configuration. For instance, this includes serial numbers of cameras expected to be found in the rig, COM ports of devices, calibration settings, etc... After looking at the library, I feel this kind of pattern might be really hard to achieve since most properties appear to be hidden behind graphical user interfaces that I cant access from the workflow.

Given the current solution I can see three alternatives:

  • Have a different workflow per box and make the path of that workflow part of the DSL. This would have the huge drawback that the size of the repository scales with the number of boxes, and each change to the workflow would have to be propagated across N workflows that have duplicated code.
  • Remove all operators that are rig specific and instruct users to "add the missing ingredients" operators after cloning the repo to each computer. This is also sub optimal since the deployment would now require a manual step.
  • Fork the repository and maintain a version of the operators we need with externalized properties. I think this is probably the worst solution for an open-project that is full of potential like this one.

. Unless there is some way to tie the files together (e.g. bonsai files become a zip archive) we probably cannot pursue this option.

If this is a problem for users, wouldnt having a sane default that saves the configuration file to the cwd with a default name be sufficient? If no config is provided, load that one, if one is set, override.

As with the Icons, there are not many hard rules I've seen for what to include and what not to include in a workflow. Given that these parameters (barring their Base64 encoding) describe properties of software elements (as opposed to e.g. the camera calibration which just goes right to hardware), I don't see a categorical line between these properties and properties that are routinely stored in bonsai workflows other than that it makes the file big.

I agree this distinction is not clear. However, in most cases (I actually cant come up with a counter-example from the top of my head) large properties (e.g. arrays) are not automatically initialized when an operator is added. Instead, when selected via a GUI, for instance the Crop operator, they dump their values to the property of the operator. This affords skipping serialization when the property is yet to be initialized but also allows users to force the value of the property via an ExternalizedProperty.

@jonnew
Copy link
Member

jonnew commented Nov 5, 2024

I must go but just a quick though if we go the external file route: what do we do about multiple probes in the same workflow (config file for each based on serial number? Large signal config with multiple probe sn's inside?)

@bruno-f-cruz
Copy link
Author

I must go but just a quick though if we go the external file route: what do we do about multiple probes in the same workflow (config file for each based on serial number? Large signal config with multiple probe sn's inside?)

In my mind this is exactly why you would make the property externalizable. People could select the file to use in that operator. If you are trying to make this assignment solely driven by the GUI, I think your solution would work with the externalized property too. Each time you open the UI dump a file with the probe SN to the cwd (foo_calib_.bin) and sets the externalized property to this path. Not sure if this makes complete sense as it is not super clear what the multiple config files are used for, but happy to discuss if you want to brainstorm about it!

@glopesdev
Copy link
Collaborator

glopesdev commented Nov 5, 2024

These files are generally going to form a fairly static context workflows to run in. A single configuration file in in this context is a worthy goal that I feel will reduce issues with meta data storage.

@jonnew I agree with this context, and I guess my feeling is to have a single "probe configuration file" property which could contain everything there is to configure about a probe, and could have its own JSON-based or binary format. The gain calibration file itself could either be referenced by path or name from this file, or have its contents directly embedded in the file, whichever is more convenient.

(One important consideration here: if we want/need to reuse multiple channel maps for different gain calibration files, then it might be useful to expose two properties anyway)

When opening the GUI for the node the dialog could provide options for creating or loading these files if they are not already set. I assume these configuration files must be created before running the experiment. Having the probe SN validated against either the file name or the file contents should prevent against accidentally losing track of files.

what do we do about multiple probes in the same workflow (config file for each based on serial number? Large signal config with multiple probe sn's inside?)

I can think of a few cases here:

  • multiple probes in the same workflow (each with a single config file)
  • multiple config files for a single probe (allow toggle between probe config files at boot time)
  • multiple config files for multiple probes (allow toggle between probe config files for all probes at boot time)

If the headstage provides two properties ProbeConfigurationFileA and ProbeConfigurationFileB, then all these options should be possible.

@jonnew jonnew added this to the 0.5.0 milestone Nov 5, 2024
@bparks13
Copy link
Member

bparks13 commented Nov 6, 2024

RE: Base64 conversion. To maintain compatibility with the ProbeInterface specification, OpenEphys.ProbeInterface.NET uses JSON for serialization/deserialization. By using JSON we could have private setters for properties and point the serializer/deserializer to constructors, but I could not find an easy way to do this via the XML converter Bonsai uses to save config files. Specifically, the XML file would fail to write without public setters for all properties. The workaround we collectively decided on is to use Base64 to convert the JSON string we already have support for into a string that does not have special characters that might conflict with the XML file format.

As @jonnew mentioned, the initial consideration for this is ease of use. When the node is first placed in the workflow, the default channel configuration is automatically generated, so that if the user simply wants to use the defaults without any modification they can place whichever node it is and immediately press play without any issue. if there is a need to change which channels are being recorded from, opening up the GUI allows for that, as well as changing the other properties either through the GUI or via the property pane.

For users with multiple configurations, in the GUI we already have support for loading/saving channel configuration files. The reason being is that it seemed simpler on the user to keep the current configuration embedded in the .config file, and then have external config files for switching contexts if needed.

@bparks13
Copy link
Member

bparks13 commented Nov 6, 2024

@bruno-f-cruz we discussed this proposal in our weekly meeting and we definitely agree this is the right direction to go by externalizing the configuration files. Can you put together an initial implementation for us to review?

Some specifics that we discussed based on the proposed ideas are:

  • We think it should be one file per serial number, and this filename should be saved as a property
  • The naming scheme for the files should be cohesive (i.e., $SN_probe_interface.json as an example).
  • If a file exists for the current SN in cwd, this should be automatically loaded
  • Any changes made in the GUI should be saved to the configuration file, which is then later loaded to apply the specific configurations to the device when the workflow starts

@bruno-f-cruz
Copy link
Author

@bparks13 I am trying to write a proof of concept of how I would like it to look like but i am struggling with debugging without constant access to hardware. I seem to remember that there was a driver that one could use, is this public / are there instructions on how to use it?

@bparks13
Copy link
Member

bparks13 commented Nov 7, 2024

@bruno-f-cruz as far as I am aware the test driver is not currently useable in the Bonsai library, and I don't remember there being any functionality for Neuropixels embedded in it so I don't think it would help in your circumstances. Sorry about that!

@jonnew or @aacuevas please correct me if I am wrong.

@jonnew
Copy link
Member

jonnew commented Nov 7, 2024

@bruno-f-cruz -- why do you need access to hardware?

@bruno-f-cruz
Copy link
Author

Probably a skill issue on my part. I am not familiar with all the code base, so following the call stack as the application runs is very useful for me to understand what is going on. I agree that in theory a better developer could probably do it without access to hardware but that is not something I could pull off right now. Meantime I will keep trying and isolating the hardware-independent functionality. Thanks for checking anyway!

@jonnew
Copy link
Member

jonnew commented Nov 7, 2024

OK. For this, you should be able to do essentially everything (defining configurations, saving them, loading them, etc) without touching hardware. That includes putting in breakout points, running workflows etc.

All of this info is eventually translated into a bunch of i2c interactions that program registers on the neuropixels, but I dont think that part is going to be modified at all from these changes.

Feel free to setup a time with Brandon to go over strategies for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants