Description
A device can have multiple streams, each of which should be uniquely identified by its `reader.pattern`
attribute. Our current general load strategy relies heavily on this assumption,
which is reflected in the way chunks are probed from data directories:
Lines 89 to 93 in 30f21ed
    fileset = {
        chunk_key(fname): fname
        for path in root
        for fname in Path(path).glob(f"{epoch_pattern}/**/{reader.pattern}.{reader.extension}")
    }
Specifically, note that `fileset` is a dictionary indexed by `chunk_key`, which means that duplicate files with the same chunk time will override previously found entries.
This behavior is important to the way data overrides currently work. For example, the processed
root folder can override an existing chunk file using essentially this mechanism, as any overlapping files in later roots will supersede files found in preceding roots.
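The override mechanism can be illustrated with a minimal sketch. Here `toy_chunk_key`, the file names, and the `raw` / `processed` root names are hypothetical stand-ins and not the actual implementation; the only point is that a dict comprehension keeps the last value seen for each key:

```python
from pathlib import PurePosixPath


def toy_chunk_key(fname):
    """Hypothetical stand-in for chunk_key: the trailing timestamp of the file stem."""
    return PurePosixPath(fname).stem.rsplit("_", 1)[-1]


# Files in the order they would be discovered: earlier roots first, later roots last.
found = [
    "raw/epoch/Patch1/Patch1_90_2022-06-13T12-00-00.bin",
    "raw/epoch/Patch1/Patch1_90_2022-06-13T13-00-00.bin",
    "processed/epoch/Patch1/Patch1_90_2022-06-13T12-00-00.bin",
]

# Same construction as the snippet above: later entries replace earlier ones per key.
fileset = {toy_chunk_key(fname): fname for fname in found}

for key, fname in sorted(fileset.items()):
    print(key, "->", fname)
```

In this sketch the 12:00 chunk resolves to the file under `processed`, while the 13:00 chunk, which exists only under `raw`, is left untouched.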
However, this causes some confusion with pose reader files. In this case `reader.pattern` uses a wildcard that is more general than any individual file naming pattern, because it has to match the dynamic model config path included in the filename. Several distinct files can therefore match the same pattern for the same chunk, and only one of them survives in `fileset`.
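A rough sketch of the collision, using invented names: the pattern `CameraTop_202_*`, the model labels `model-a` / `model-b`, and `toy_chunk_key` are all assumptions made for illustration, and `fnmatch` stands in for the `glob` call so the example runs without touching the filesystem:

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def toy_chunk_key(fname):
    """Hypothetical stand-in for chunk_key: the trailing timestamp of the file stem."""
    return PurePosixPath(fname).stem.rsplit("_", 1)[-1]


pattern = "CameraTop_202_*"  # hypothetical wildcard reader pattern
extension = "bin"

candidates = [
    "epoch/CameraTop/CameraTop_202_model-a_2024-01-01T10-00-00.bin",
    "epoch/CameraTop/CameraTop_202_model-b_2024-01-01T10-00-00.bin",
    "epoch/CameraTop/CameraTop_200_2024-01-01T10-00-00.bin",
]

# Both model files match the wildcard; the third file belongs to another stream.
matched = [
    fname
    for fname in candidates
    if fnmatch(PurePosixPath(fname).name, f"{pattern}.{extension}")
]

# Keyed by chunk only, the two model files collide and the later one wins.
fileset = {toy_chunk_key(fname): fname for fname in matched}

print(len(matched), "files matched,", len(fileset), "kept")
```

Under these assumptions two files match but only the `model-b` file remains, even though neither file was meant to override the other.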
We should discuss more generally what the desired behavior should be in this case. I feel there are two general possible strategies:
1. Rewrite the `load` function to include the `name` of the file as an additional key. This should still allow overriding files that match the exact same name, while allowing files with different names that match the same chunk to load in a single call.
2. Rewrite the pose stream / schema definitions such that the reader will load only files with specific names.
Option 1 seems more appropriate, but we need to investigate where exactly to place the key construction and evaluate any possible backwards-incompatible behavior. While option 2 is a valid workaround today, it would dramatically complicate the schemas.
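As a starting point for that discussion, the first strategy might look roughly like the sketch below. It is not a proposed patch: `toy_chunk_key`, the file names, and the root names are hypothetical, and where the composite key should actually be built inside `load` is still the open question.

```python
from pathlib import PurePosixPath


def toy_chunk_key(fname):
    """Hypothetical stand-in for chunk_key: the trailing timestamp of the file stem."""
    return PurePosixPath(fname).stem.rsplit("_", 1)[-1]


# Discovery order: files from the first root, then files from a later root.
found = [
    "raw/epoch/CameraTop/CameraTop_202_model-a_2024-01-01T10-00-00.bin",
    "raw/epoch/CameraTop/CameraTop_202_model-b_2024-01-01T10-00-00.bin",
    "processed/epoch/CameraTop/CameraTop_202_model-a_2024-01-01T10-00-00.bin",
]

# Composite key: (chunk, file name). Identical names in later roots still override,
# but differently named files for the same chunk no longer collide.
fileset = {
    (toy_chunk_key(fname), PurePosixPath(fname).name): fname
    for fname in found
}

# Sorting by the composite key orders files by chunk first, then by name.
files = [fname for _, fname in sorted(fileset.items())]
print(files)
```

Here the `model-a` file under `processed` supersedes the one under `raw`, while the `model-b` file is kept alongside it. One thing to check before adopting this is whether any caller relies on exactly one file per chunk.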