
Factor stream resource producing code into a separate class #290

Open · wants to merge 16 commits into base: main
Conversation

@skarakuzu (Contributor) commented May 8, 2024:

Closes #173

from ._hdfdataset import _HDFDataset

@dataclass
class _HDFDataset:
Comment (Member):

Per @coretl:

This should be slimmed to only what's needed to describe a file, not what's needed to write it.

The pattern detector is different from the others because it is actually writing the HDF5 file, not describing a file written by an external program (IOC). For the pattern detector, a separate dataclass should extend this one to add maxshape and fillvalue.

With that separation, we can make all of these fields required.


Some of this information is used to construct the EventDescriptor (shape, dtype, name?, multiplier). Other information is used to construct the StreamResource (path, meaning the "path" of the dataset inside the layout of the HDF5 file, distinct from the file itself; and optionally swmr).

Add dtype_str, the numpy-style dtype like "<i4" or "<f8".

This should also include the uri. (For old-style StreamResource we can convert this into root and resource_path by ignoring the hostname, hard-coding root to "/" as discussed, and using the rest of the path as resource_path.)
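A hedged sketch of the old-style conversion described above: ignore the hostname, hard-code root to "/", and use the rest of the path as resource_path. The helper name is an assumption, not an existing function in the codebase.

```python
from urllib.parse import urlparse


def uri_to_root_and_resource_path(uri: str) -> tuple:
    """Convert a file URI to old-style (root, resource_path).

    Ignores the hostname, hard-codes root to "/" as discussed,
    and uses the rest of the path as resource_path.
    """
    parsed = urlparse(uri)
    root = "/"
    resource_path = parsed.path.lstrip("/")
    return root, resource_path
```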

https://github.com/bluesky/bluesky/blob/bc8222be2e099a00baafcede1d761d61b213bf19/src/bluesky/callbacks/tiled_writer.py#L167-L178

@danielballan danielballan added this to the 0.4 milestone Jun 4, 2024
@skarakuzu skarakuzu force-pushed the Factor-StreamResource-producing-code-into-a-separate-class branch from 86d7f97 to 2ac7873 Compare June 4, 2024 20:05
@skarakuzu skarakuzu marked this pull request as ready for review June 5, 2024 20:51
bundler_composer(
mimetype="application/x-hdf5",
uri=uri,
data_key=ds.name.replace("/", "_"),
Comment (Collaborator):

Which would make this:

Suggested change:
-    data_key=ds.name.replace("/", "_"),
+    data_key=ds.data_key,

I would make data_keys with / in them an error, not a silent replacement
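A sketch of the validation this suggests: reject a "/" in a data_key instead of silently replacing it. The function name is an assumption for illustration.

```python
def validate_data_key(data_key: str) -> str:
    """Raise on data_keys containing '/', rather than silently replacing it."""
    if "/" in data_key:
        raise ValueError(
            f"data_key {data_key!r} contains '/'; data_keys must not contain "
            "'/' (the HDF5 path belongs in the 'dataset' field instead)"
        )
    return data_key
```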

@skarakuzu (Contributor, author) commented Jun 12, 2024:

I have a question regarding the removal of replace("/", "_"). If I remove it, I have to replace DATA_PATH = /entry/data/data with DATA_PATH = _entry_data_data in pattern_generator.py, and so on; otherwise I get a regex-not-matching error from the bluesky RunEngine. I tentatively replaced the paths this way in the code, but would like to ask for suggestions.

Comment (Collaborator):

They are different things: data_key is the name within the descriptor, and dataset is the path within the HDF file. So for the pattern generator I think we want something like:

            _HDFDataset(
                data_key=f"{detector_name}-data",
                dataset=DATA_PATH,
            )

I'm not sure quite how you pass the detector name through and into the pattern generator though...
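One hedged way the detector name could reach the pattern generator is as a constructor argument. The class shape and argument here are assumptions for illustration, not the actual ophyd-async API.

```python
from dataclasses import dataclass
from typing import List

# Dataset path inside the HDF5 file, kept separate from the data_key
DATA_PATH = "/entry/data/data"


@dataclass
class _HDFDataset:
    data_key: str  # name in the EventDescriptor
    dataset: str   # path inside the HDF5 file


class PatternGenerator:
    """Sketch: the detector name is passed in and used to build data_keys."""

    def __init__(self, detector_name: str) -> None:
        self.detector_name = detector_name

    def _get_datasets(self) -> List[_HDFDataset]:
        return [
            _HDFDataset(
                data_key=f"{self.detector_name}-data",
                dataset=DATA_PATH,
            )
        ]
```

With this shape, the device that owns the generator would supply its own name at construction time.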

@skarakuzu (Contributor, author) commented Jun 17, 2024:

In pattern_generator.py, the PATH parameters were assigned to name (now data_key). What I meant is that deleting the replace("/", "_") resulted in errors, so I had to modify the PATH parameters. Also, I am not sure I understand passing the detector name into _HDFDataset, since I am not experienced with the usage of this class. Is a device going to assign the parameters of this class? I would be happy to have more context.

uri=uri,
data_key=ds.name.replace("/", "_"),
parameters={
"path": ds.path,
Comment (Collaborator):

And this would be:

Suggested change:
-    "path": ds.path,
+    "dataset": ds.dataset,

@skarakuzu skarakuzu force-pushed the Factor-StreamResource-producing-code-into-a-separate-class branch from 9f50cb7 to 2aaab09 Compare June 11, 2024 14:13
@abbiemery abbiemery removed this from the 0.4 milestone Jun 17, 2024
@subinsaji subinsaji changed the base branch from main to dev June 18, 2024 09:42
@subinsaji subinsaji changed the base branch from dev to main June 18, 2024 09:51
Comment on lines +19 to +31
#: Name of the data_key within the Descriptor document
data_key: str
dtype_numpy: Optional[str] = None
swmr: bool = False
shape: Optional[List[int]] = None
multiplier: Optional[int] = 1
#: Name of the dataset within the HDF file
dataset: Optional[str] = None
device_name: Optional[str] = None
block: Optional[str] = None
maxshape: tuple[Any, ...] = (None,)
dtype: Optional[Any] = None
fillvalue: Optional[int] = None
Comment (Contributor):

Suggested change:
-    #: Name of the data_key within the Descriptor document
-    data_key: str
-    dtype_numpy: Optional[str] = None
-    swmr: bool = False
-    shape: Optional[List[int]] = None
-    multiplier: Optional[int] = 1
-    #: Name of the dataset within the HDF file
-    dataset: Optional[str] = None
-    device_name: Optional[str] = None
-    block: Optional[str] = None
-    maxshape: tuple[Any, ...] = (None,)
-    dtype: Optional[Any] = None
-    fillvalue: Optional[int] = None
+    #: Name of the data_key within the Descriptor document
+    data_key: str
+    #: Numpy dtype representation of the type
+    dtype_numpy: str
+    #: Is the h5 file written in SingleWriter MultipleReader (SWMR) mode
+    swmr: bool = False
+    #: Shape of a frame of the h5 file
+    shape: List[int] = field(default_factory=list)
+    #: How many frames per StreamDatum index
+    multiplier: int = 1
+    #: Name of the dataset within the HDF file
+    dataset: Optional[str] = None

(Note: shape uses field(default_factory=list), since a mutable [] default is invalid in a dataclass, and dataset keeps Optional[str] since its default is None.)

Comment (Contributor):

I think?

Reply (Collaborator):

that looks right to me

Reply (Contributor):

thanks for explaining!


# pixel sum path
SUM_PATH = "/entry/sum"
SUM_PATH = "_entry_sum"

MAX_UINT8_VALUE = np.iinfo(np.uint8).max

SLICE_NAME = "AD_HDF5_SWMR_SLICE"

Comment (Contributor):

Suggested change (adding a writer-specific subclass):

    @dataclass
    class PatternDataset(_HDFDataset):
        maxshape: tuple[Any, ...] = (None,)
        fillvalue: Optional[int] = None
        dtype: Optional[type] = None  # or whatever this was on the previous class
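The writer-only fields on this subclass map naturally onto the arguments an HDF5 dataset is created with (e.g. h5py's create_dataset). A hedged sketch of that mapping; the helper name and the trimmed dataclass definitions are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class _HDFDataset:
    data_key: str
    dataset: str
    shape: List[int] = field(default_factory=list)


@dataclass
class PatternDataset(_HDFDataset):
    # Writer-only fields, per the suggestion above
    maxshape: tuple = (None,)
    fillvalue: Optional[int] = None
    dtype: Optional[type] = None


def create_dataset_kwargs(ds: PatternDataset) -> dict:
    """Build the keyword arguments a File.create_dataset call would take."""
    return {
        "name": ds.dataset,
        "shape": tuple(ds.shape),
        "maxshape": ds.maxshape,
        "dtype": ds.dtype,
        "fillvalue": ds.fillvalue,
    }
```

The description-only base class stays free of these fields, so detectors whose files are written by an IOC never see them.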

def get_full_file_description(
datasets: List[DatasetConfig], outer_shape: tuple[int, ...]
datasets: List[_HDFDataset], outer_shape: tuple[int, ...]
Comment (Contributor):

Suggested change:
-    datasets: List[_HDFDataset], outer_shape: tuple[int, ...]
+    datasets: List[PatternDataset], outer_shape: tuple[int, ...]

@@ -255,24 +173,24 @@ def _get_new_path(self, directory: DirectoryProvider) -> Path:
new_path: Path = info.root / info.resource_dir / filename
return new_path

def _get_datasets(self) -> List[DatasetConfig]:
raw_dataset = DatasetConfig(
def _get_datasets(self) -> List[_HDFDataset]:
Comment (Contributor):

Suggested change:
-    def _get_datasets(self) -> List[_HDFDataset]:
+    def _get_datasets(self) -> List[PatternDataset]:

def _get_datasets(self) -> List[DatasetConfig]:
raw_dataset = DatasetConfig(
def _get_datasets(self) -> List[_HDFDataset]:
raw_dataset = _HDFDataset(
Comment (Contributor):

Suggested change:
-    raw_dataset = _HDFDataset(
+    raw_dataset = PatternDataset(
