Skip to content

Raymondwchang/streaming init cnmfe attrs #52

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
Mar 7, 2025

Conversation

raymondwjang
Copy link
Member

@raymondwjang raymondwjang commented Mar 1, 2025

📝 Description

populate transformers that initialize non-fundamental (i.e. anything besides footprints and fluorescence) matrices - sufficient statistics, residual buffers, etc.

as the way these transformers interact with components are varied - some take in footprints and output traces, some take in footprint AND traces and output pixel correlation, etc., I am adding a decorator class that works as an interface between the transformers and ComponentManager.

a little worried about how to ensure a proper plug-in with ComponentManager, as we keep on adding component measurement traits and different types of transformers.

📌 Related Issue

🔍 Type of Change

New Feature: Introducing new functionality
🧹 Refactor: Code changes that neither fix a bug nor add a feature

🚀 Implementation Details

ManagerInterface works as a decorator interface. it takes care of all interactions with ComponentManager, so that ComponentManager can focus on creating/updating/removing components, while the transformers can be decoupled from how components are structured and solely deal with getting what it needs, transforming it, and returning the results.

The current data structure is:

  1. ComponentManager manages all stored data, and works as the interface for all data manipulations
  2. Actual measurement arrays and their attributes are stored in FootprintsManager and TracesManager. (More to be added as we add noise arrays, residual arrays, etc.)
  3. The actual source (the fluorescing thing) of the measurements (footprint and trace) is either a Neuron or a Background object
  4. The source and measurement array are linked by SourceID-ArrayIndex, where IDs are generated by Python's id function.
  5. These links are dynamically managed by the ComponentManager.

The proposed workflow is:

  1. new_frame comes in
  2. it wants to reach a relevant River Transformer that will turn it into a usable data
  3. before it gets to a Transformer, it hits ManagerInterface, which wraps around the Transformer.
  4. ManagerInterface grabs bothComponentManager and new_frame
  5. ManagerInterface looks at the Transformer and determines what already-collected data it needs from ComponentManager
  6. ManagerInterface relays only the relevant data to the Transformer (i.e. FootprintInitializer)
  7. This allows the River Transformer to only do the standard learn and transform
  8. the transformed result is returned to ManagerInterface
  9. the ManagerInterface knows what kind of data the Transformer returned, and asks ComponentManager to update its components.

🧪 Testing

  • Ran unit tests
  • Ran integration tests
  • Performed manual testing
  • Updated existing tests

🛠️ Dependencies

✅ Checklist

  • My code follows the project's style guidelines
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

🔗 Additional Resources

feat: manager_interface.py now takes care of interfacing with componentmanager.
Copy link

codecov bot commented Mar 1, 2025

Codecov Report

Attention: Patch coverage is 69.16667% with 37 lines in your changes missing coverage. Please review.

Project coverage is 89.93%. Comparing base (486c0dc) to head (95ee5aa).

Files with missing lines Patch % Lines
src/cala/streaming/initialization/pixel_stats.py 0.00% 36 Missing ⚠️
...cala/streaming/initialization/manager_interface.py 98.03% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
- Coverage   91.90%   89.93%   -1.97%     
==========================================
  Files          55       57       +2     
  Lines        1420     1510      +90     
==========================================
+ Hits         1305     1358      +53     
- Misses        115      152      +37     
Flag Coverage Δ
unittests 89.93% <69.16%> (-1.97%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@raymondwjang raymondwjang marked this pull request as ready for review March 1, 2025 00:58
@raymondwjang raymondwjang marked this pull request as draft March 1, 2025 00:59
@sneakers-the-rat
Copy link
Collaborator

just read the PR description, haven't read the code yet, but one initial reaction:

The source and measurement array are linked by SourceID-ArrayIndex, where IDs are generated by Python's id function.

The id function just produces the memory location that identifies that particular python object instance. this would not be durable e.g. between processes or runs. Content addressing is good, and is what you should use here - you want to make a unique identifier that can be used to reference data in a portable way, a hash is the best answer for this. whether you want to hash the array contents or just the metadata is a design question that depends on 'which are the parts that should be used to identify the data chunk'. let me read

@raymondwjang
Copy link
Member Author

ohh that makes sense. i was wondering how it was coming up with the integer numbers

Copy link
Collaborator

@sneakers-the-rat sneakers-the-rat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not exactly sure what i am reviewing for here, this seems fine?



@dataclass
class SpatialInitializationResult:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

give these a parent class - the main thing you want from Results classes is a consistent interface. so it seems like atm these are mutually exclusive complements: spatial initialization is x/y coords (right?) and temporal is timeseries. what about the other kinds of initializer types? how will you know how to combine them? will each produce a different kind of output, or will some produce overlapping types with the current result types? having a generic result class that has empty values for each of the possible kinds of things would be better than each initializer type having its own result class

def learn_one(self, components: ComponentManager, X: xr.DataArray) -> T:
"""Learn step extracts needed data from manager and passes to transformer."""
match initializer_type:
case InitializerType.SPATIAL:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah immediately see the problems from above^. if each results class is unique, you need wrappers like this. presumably different kinds of transformers take different types of data, so having one results class and then always passing that one class to the transformers seems less fragile than this

Copy link
Member Author

@raymondwjang raymondwjang Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you might be suggesting might have been what i had originally. at first there was no decorator and i was just adding/subtracting components directly inside the transformers, but i didn't like how tranformers were not only doing the raw transformations, but also taking on interfacing with the data structure.

"""

def decorator(transformer_class: Type[T]) -> Type[T]:
class ManagerWrappedTransformer(cast(type, transformer_class)):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that without generic __getattr__s that forward attribute access on to the wrapped class, you're going to get into a pretty tricky spot pretty quick

spatial_axes: tuple = ("height", "width")
"""Spatial axes for pixel statistics"""

def validate(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do this as a __post_init__ or you'll inevitably forget to call it (and needing to call it is a pain in the ass)

Signed-off-by: Raymond W. Chang <[email protected]>
Signed-off-by: Raymond W. Chang <[email protected]>
Signed-off-by: Raymond W. Chang <[email protected]>
Signed-off-by: Raymond W. Chang <[email protected]>
Signed-off-by: Raymond W. Chang <[email protected]>
Signed-off-by: Raymond W. Chang <[email protected]>
Signed-off-by: Raymond W. Chang <[email protected]>
Signed-off-by: Raymond W. Chang <[email protected]>
@raymondwjang raymondwjang marked this pull request as ready for review March 7, 2025 01:25
@raymondwjang raymondwjang merged commit 8695b60 into main Mar 7, 2025
0 of 2 checks passed
@raymondwjang raymondwjang linked an issue Mar 7, 2025 that may be closed by this pull request
@raymondwjang raymondwjang deleted the raymondwchang/streaming-init-cnmfe-attrs branch March 21, 2025 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Interface Between Data Structure and Transformers
2 participants