[ENH] Design Questions for ptf-v2 #1831

Comments
What to do with splitting?
Because of this, I wonder whether the default in v2 should not be 70/15/15, but instead 100/100/100, i.e., all the data is passed to all the splits. This will cause leakage if used naively, but it means the default setting encourages users to do their own, manual splits - and it would behave like current v1 (please correct me if I am wrong). To handle splitting more automatically, I would later add a "splitter" field to the decoder/encoder loader, which can receive an sklearn- or sktime-like splitter. However, since this is not a v1 feature, and we should be aiming for feature parity first, I would postpone this until we can 1:1 replace v1 functionality with v2 functionality.

Preprocessing features

I would leave these as placeholders and go for a working end-to-end prototype first. The next priority would be feature parity, and that includes scalers etc.
Yes, so we should make the splitting optional rather than the default.
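A minimal sketch of how both ideas could fit together - an optional split plus a later splitter hook. Every parameter here is hypothetical, not the actual ptf-v2 API, and the Lightning import path may differ by version:

from typing import Optional, Tuple

from lightning.pytorch import LightningDataModule


class EncoderDecoderTimeSeriesDataModule(LightningDataModule):
    """Hypothetical signature: no automatic split unless explicitly requested."""

    def __init__(self, data, split: Optional[Tuple[float, float, float]] = None, splitter=None):
        # split: e.g. (0.7, 0.15, 0.15) for a group-based split; None = no split
        # splitter: a future sklearn/sktime-like splitter object (post feature parity)
        super().__init__()
        self.data = data
        self.split = split
        self.splitter = splitter

    def setup(self, stage=None):
        if self.splitter is not None:
            ...  # delegate to the user-supplied splitter
        elif self.split is None:
            # default: all data reaches every stage; the user splits manually as in v1
            self.train_data = self.val_data = self.test_data = self.data
        else:
            ...  # group-based split using the given ratios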
Since this issue on design questions for ptf-v2 is still open: while working on the D2 layer for tslib, my suggestion is to try out two things, separately. The first is a common abstract base class, sketched below:
# imports assumed for this sketch; the exact Lightning import path may differ across versions
from abc import abstractmethod
from typing import Any, Dict, List, Tuple

import torch
from lightning.pytorch import LightningDataModule


class BaseTimeSeriesDataModule(LightningDataModule):
    """Abstract base class for time series data modules."""

    def _create_windows(self, indices: torch.Tensor) -> List[Tuple[int, int, int, int]]:
        """Base implementation of the common windowing logic; alternatively, the implementation can be enforced in child classes."""
        pass  # this implementation can be agreed upon once the d2 design for tslib is done.

    def _normalize_features(self, features: torch.Tensor):
        """Normalize cont. features using configured scalers. Can be used inside child classes under _process_data."""
        pass

    ...

    def _process_data(self, idx: torch.Tensor) -> List[Dict[str, Any]]:
        """Process the time series at idx before feeding into the dataset."""
        pass

    # abstract methods that child classes must implement.
    @abstractmethod
    def get_series(self, idx):
        """Get the time series at a particular step."""
        pass

    ...
I am inclined towards the 1st approach, and suggestions are welcome. FYI @fkiraly @phoeenniixx @agobbifbk.
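To make approach 1 concrete, a hypothetical child class could look like this (names and attributes are illustrative, not an existing ptf-v2 class):

class TslibDataModule(BaseTimeSeriesDataModule):
    """Hypothetical concrete module: inherits the shared plumbing, overrides only what differs."""
    # assumes self.data, self.context_length and self.prediction_length are set in __init__

    def _create_windows(self, indices):
        # tslib-style fixed-size sliding windows; the tuple layout is illustrative
        return [(int(i), 0, self.context_length, self.prediction_length) for i in indices]

    def get_series(self, idx):
        return self.data[idx]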
Great Idea!
Mmh, not sure it is actually needed. The encoder-decoder DataModule should be sufficient to cover a lot of situations, even those related to foundation models. But if you feel that another layer of abstraction is needed, it can be a good solution (as you remember, we left this point open because we were not able to decide it at the design level). Maybe I'm missing your point; we shall discuss it in the following days!
Yes, this also works.
I meant a base class from which @PranavBhatP's and my implementations can inherit - or is there a way that only the encoder-decoder data module suffices? Just one question: in a broader view, can there be a case where the current datamodule might not be enough - some models that cannot be broken down into an encoder-decoder type architecture? In that case we might need a new datamodule, and then this design might be useful.
It can be the case, but the 'windowing' procedure can be different for different data modules. So we can have, in a folder called for example ..., the different data module implementations.
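To illustrate the point about windowing differing per module, a toy sketch (function and parameter names are assumptions, not any actual API):

def encoder_decoder_windows(series_len: int, enc_len: int, dec_len: int):
    """Each window has an encoder part followed by a decoder part to predict."""
    return [
        (start, start + enc_len, start + enc_len + dec_len)
        for start in range(series_len - enc_len - dec_len + 1)
    ]

def encoder_only_windows(series_len: int, context_len: int):
    """Each window is past context only, e.g. for encoder-only models."""
    return [(start, start + context_len) for start in range(series_len - context_len + 1)]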
My 2 cents on this:

From a software engineering standpoint, I consider it important to get quickly to a point where we have at least two separate D2 classes to validate this assumption. From there, we can check how similar or how different the concrete software implementations are. If it is very difficult to unify them completely, that validates our initial thoughts about the need for multiple D2 classes. If they are very easy to unify, we can just pull everything together into a single class.

As next steps, concretely, I would suggest not trying this unification first, as it will likely lead to a frustrated/stuck PR ("doing too many things at once"). Instead, I would suggest: ...

This is derived purely from experience on the software side - trying to partition work into "manageable difficulty" chunks.
@phoeenniixx, yes, I believe we have seen that previously. Since this question keeps reappearing, I think we need to validate it - or falsify it - in the usual way for "do we need an abstraction" questions: concretely list three important and substantially different examples that cannot be covered by less abstraction. The suggested work items above should get us there - and we should also be able to name at least three estimators in advance, btw.
Agree, at some point some architectures will require a dedicated d2 layer (LagLlama is a good example). Still, I'm wondering whether a dedicated d2 layer for encoder-only models is really needed, or whether the encoder-decoder one is sufficient to cover the encoder models as well (like in ...).

I will try to use an example to explain one detail that can affect the aforementioned decision. Let's suppose we have created a d2 layer that is encoder-only (e.g. it looks only at past data) and that can be used, for example, by DLINEAR. The architecture can be easily modified to also use the ...

Consideration of option 1: ...
Consideration of option 2: ...

Probably the topic ... Given that, as a matter of exercise and for checking the generality of the d2 layers, we can still think of using a pure encoder d2 layer for serving the DLINEAR model. This can help us understand the commonalities among those d2 layers. I hope I'm not causing any confusion. If I am, feel free to ignore this comment 🙂
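To make the example concrete, a sketch of what the two d2 layers might emit per sample - keys and shapes here are made up, not the real ptf-v2 schema:

import torch

past_values = torch.randn(96, 7)              # (context_length, n_features)
future_values = torch.randn(24)               # (prediction_length,)
known_future_covariates = torch.randn(24, 2)  # (prediction_length, n_known)

# encoder-only layer: past data only
encoder_only_sample = {"encoder_cont": past_values, "target": future_values}

# encoder-decoder layer: additionally carries known future information
encoder_decoder_sample = {
    "encoder_cont": past_values,
    "decoder_cont": known_future_covariates,
    "target": future_values,
}
# a pure-past model like DLINEAR can simply ignore "decoder_cont", so serving it
# from the encoder-decoder layer is a cheap test of that layer's generality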
We should address some design questions before merging and releasing the experimental ptf-v2 implementation.

What to do with splitting?

The v1 implementation asks the user to do the splitting manually, but in our implementation a split is made based on group: e.g., if there are 8 groups, 70% go to train, 15% to val (and 15% to test). But there is no temporal split.

So what should we do here? Should we make the split based on group optional? It would happen only if the user wants it; otherwise the whole data can be train, val or test, and the user can do the temporal split manually, as in v1.

To be clear: this splitting feature can be added to v2 later, with all possible variants like a temporal split and a split based on group, but this question is for this experimental release only, to maintain homogeneity (as much as we can) between the two versions.
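For reference, a minimal sketch of the manual temporal split that v1 expects, following the pattern from the pytorch-forecasting tutorials (the toy data frame and lengths are made up):

import numpy as np
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet

# toy long-format frame: 2 groups x 100 time steps
data = pd.DataFrame({
    "group": np.repeat(["a", "b"], 100),
    "time_idx": np.tile(np.arange(100), 2),
    "target": np.random.randn(200).astype("float32"),
})

max_prediction_length = 6
training_cutoff = data["time_idx"].max() - max_prediction_length

# train only on data up to the cutoff ...
training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="target",
    group_ids=["group"],
    max_encoder_length=24,
    max_prediction_length=max_prediction_length,
)
# ... and validate on windows that predict past it
validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)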
Preprocessing features

Right now the params for preprocessing in the EncoderDecoderTimeSeriesDataModule, like target_normalizer, scalers, categorical_encoders etc., are just placeholders and are not actually used. As these are just features, should we add them right now, or can we wait until we have a concrete base implementation?
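For when these stop being placeholders, a minimal sketch of the intended behaviour, assuming sklearn-style scalers fit on training data only (how ptf-v2 will actually wire this is still undecided):

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train_features = rng.normal(size=(100, 3))
val_features = rng.normal(size=(20, 3))

scaler = StandardScaler()
scaler.fit(train_features)                   # fit on train only to avoid leakage
train_scaled = scaler.transform(train_features)
val_scaled = scaler.transform(val_features)  # reuse the train statistics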