
[ENH] Design Questions for ptf-v2 #1831


Open
phoeenniixx opened this issue May 12, 2025 · 11 comments
Labels: API design (API design & software architecture)

Comments

@phoeenniixx
Contributor

We should address some design questions before merging and releasing the experimental ptf-v2 implementation.

  • What to do with splitting?

The v1 implementation asks the user to do the splitting manually, but in our implementation there is a split based on group, e.g., if there are 8 groups, 70% of the groups go to train, 15% to val, and the remaining 15% to test. But there is no temporal split.
So what should we do here?

  • Make the group-based split optional? It would happen only if the user asks for it; otherwise the whole dataset can be train, val, or test, and the user can do the temporal split manually, as in v1.
  • Remove the splitting completely?

To be clear: this splitting feature can be added to v2 later, with all the possible modes (temporal split, group-based split, etc.), but this question is for this experimental release only, to maintain as much homogeneity between the two versions as we can. (A sketch of the current group-based split appears at the end of this comment.)

  • Preprocessing features

Right now the preprocessing params in the EncoderDecoderTimeSeriesDataModule, like target_normalizer, scalers, categorical_encoders, etc., are just placeholders and are not actually used.
As these are add-on features, should we implement them right now, or can we wait until we have a concrete base implementation?
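
For illustration, here is a minimal sketch of the group-based split described above; the helper name split_by_group is hypothetical, not the actual ptf-v2 code:

import numpy as np

def split_by_group(group_ids, train_frac=0.7, val_frac=0.15, seed=42):
    """Partition whole groups into train/val/test, e.g. 70/15/15 over 8 groups."""
    rng = np.random.default_rng(seed)
    groups = rng.permutation(np.unique(group_ids))
    n_train = int(len(groups) * train_frac)
    n_val = int(len(groups) * val_frac)
    return (
        groups[:n_train],                 # train groups
        groups[n_train:n_train + n_val],  # val groups
        groups[n_train + n_val:],         # test groups
    )

Note that every series lands wholly in one split, so no temporal split takes place.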

@fkiraly
Collaborator

fkiraly commented May 13, 2025

What to do with splitting?

The v1 implementation asks the user to do the splitting manually

Because of this, I wonder whether the default in v2 should not be 70/15/15, but instead 100/100/100, i.e., all the data is passed to all the splits.

This will cause leakage if used naively, but it means the default setting encourages users to do their own, manual splits - and it would behave like current v1 (please correct me if I am wrong).

To handle splitting more automatically, I would later add a "splitter" field to the decoder/encoder loader, which can receive an sklearn or sktime-like splitter.

However, since this is not a v1 feature, and we should be aiming for feature parity first, I would postpone this until we can 1:1 replace v1 functionality with v2 functionality.
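
To make the splitter suggestion concrete, a sketch of what such a field could look like; the keyword on the data module is hypothetical, while TimeSeriesSplit is a real sklearn splitter:

from sklearn.model_selection import TimeSeriesSplit

splitter = TimeSeriesSplit(n_splits=5)  # real sklearn splitter; folds respect temporal order

# Hypothetical future API, not current ptf-v2:
# dm = EncoderDecoderTimeSeriesDataModule(..., splitter=splitter)
# Internally, the data module would consume the generated index sets:
# for train_idx, test_idx in splitter.split(time_index):  # time_index: array of time steps
#     ...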

Preprocessing features

I would leave these as placeholders and go for a working end-to-end prototype first.

The next priority would be feature parity, and that includes scalers etc.

@fkiraly added the API design (API design & software architecture) label on May 13, 2025
@phoeenniixx
Contributor Author

This will cause leakage if used naively, but it means the default setting encourages users to do their own, manual splits - and it would behave like current v1 (please correct me if I am wrong).

Yes, so we should make the splitting optional rather than the default.

@PranavBhatP
Contributor

Since this issue on design questions for ptf-v2 already exists, I will add my opinions on some changes for the D2 layer here (as mentioned in the standup dated 19-05), after reviewing @phoeenniixx's work. This is more of an addition than an update.

While working on the D2 for tslib, I noticed that for things like the _create_windows function there seems to be duplication of code, since the logic is almost the same (this was also the case in v1, where a common windowing logic was used across the entire library). The same applies to things like scaling and normalization whenever they are implemented in the future.

My suggestion is to try out two things, separately:

  1. BaseTimeSeriesDataModule

from abc import abstractmethod
from typing import Any, Dict, List, Tuple

import torch
from lightning.pytorch import LightningDataModule


class BaseTimeSeriesDataModule(LightningDataModule):
    """Abstract base class for time series data modules."""

    def _create_windows(self, indices: torch.Tensor) -> List[Tuple[int, int, int, int]]:
        """Common windowing logic; alternatively, the implementation can be enforced in child classes."""
        pass  # this implementation can be agreed upon once the d2 design for tslib is done

    def _normalize_features(self, features: torch.Tensor):
        """Normalize continuous features using configured scalers; can be used inside child classes under _process_data."""
        pass

    def _process_data(self, idx: torch.Tensor) -> List[Dict[str, Any]]:
        """Process the time series at idx before feeding it into the dataset."""
        pass

    ...

    # abstract methods, to be implemented by child classes
    @abstractmethod
    def get_series(self, idx):
        """Get the time series at a particular index."""
        pass

    ...
  2. Use the functions as utilities
    These functions (i.e., the preprocessing) can be kept as utility functions in a common module. This makes it extensible (see the sketch below).
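
A sketch of option 2, assuming a hypothetical shared module (say, a commons/_utils.py under the data package) that holds the windowing logic once:

from typing import List, Tuple

def create_windows(
    series_length: int, context_length: int, prediction_length: int
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs for sliding windows over one series."""
    window = context_length + prediction_length
    return [(start, start + window) for start in range(series_length - window + 1)]

Each data module would then import and call create_windows instead of carrying its own copy.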

I am inclined towards the 1st approach, but suggestions are welcome. FYI @fkiraly @phoeenniixx @agobbifbk.

@phoeenniixx
Contributor Author

Great idea!
But what if we combine both approaches? That would make it modular and also provide a base on which different data_modules can stand.
What I mean is: move the functions to utilities, and then reference them in the base class (see the sketch below).
Can this work, @PranavBhatP?
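
A sketch of this combination, assuming the hypothetical create_windows utility from the previous comment, and that child classes set context_length and prediction_length:

from lightning.pytorch import LightningDataModule

class BaseTimeSeriesDataModule(LightningDataModule):
    def _create_windows(self, series_length: int):
        # delegate to the shared utility instead of reimplementing the logic
        return create_windows(
            series_length, self.context_length, self.prediction_length
        )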

@agobbifbk

Hmm, not sure if it is actually needed. The encoder-decoder DataModule should be sufficient to cover a lot of situations, even those related to foundation models. But if you feel that another layer of abstraction is needed, it can be a good solution (as you remember, we left this point open because we were not able to decide it at the design level). Maybe I'm missing your point; we shall discuss it in the coming days!

@PranavBhatP
Contributor

But what if we combine both approaches?

Yes, this also works.

@phoeenniixx
Contributor Author

phoeenniixx commented May 19, 2025

Hmm, not sure if it is actually needed. The encoder-decoder DataModule should be sufficient to cover a lot of situations, even those related to foundation models. But if you feel that another layer of abstraction is needed, it can be a good solution (as you remember, we left this point open because we were not able to decide it at the design level). Maybe I'm missing your point; we shall discuss it in the coming days!

I meant a base class from which @PranavBhatP's and my implementations can inherit. Or is there any way that the EncoderDecoderDataModule alone can work (like you were saying, some tslib models can also be seen as encoder-decoder type)?
In that case we would just need to move the functions out into a separate module to keep the code clean and modular (see #1829); other than that, a single data_module would be able to handle all the models.

Just one question: taking a broader view, can there be a case where the current datamodule is not enough, i.e., models that cannot be broken down into an encoder-decoder type architecture? In that case we might need a new datamodule, and then this design might be useful.

@agobbifbk

It can be the case, but the 'windowing' procedure can differ between data modules. So we could have multiple windowing functions in a folder called, for example, utils or commons. My worry is about having disconnected code between commons and the datamodules... But this is a software engineering topic, so I don't have a strong opinion on it; maybe @fkiraly can give one. What I do think is that the encoder-decoder datamodule can also serve models without the encoder, and models that are purely autoregressive...

@fkiraly
Collaborator

fkiraly commented May 20, 2025

My 2 cents on this is:

  • we did an early review of the multiple packages and concluded that there will be variation in the D2 layers. Not necessarily because of the methodology, but because of how the algorithms are implemented, how they are parametrized, etc. Plus, there are foundation models without an encoder/decoder structure.
  • in particular, if everything could have been dealt with by a single D2 object, we would not have needed the D2 layer at all.

From a software engineering standpoint, I consider it important to get quickly to a point where we have at least two separate D2 classes to validate this assumption - I think tslib is a good start (@PranavBhatP), and dsipts would also be good as a third example.

From there, we can then check how similar or different the concrete software implementations are. If it is very difficult to unify them completely, it validates our initial thoughts about the need for multiple D2 classes. If they are very easy to unify, we can just pull everything together in a single class.

Concretely, as next steps, I would suggest not attempting this unification first, as it will likely lead to a frustrated/stuck PR ("doing too many things at once"). Instead, I would suggest:

  • @PranavBhatP builds D2 for tslib next
  • Sandeep gets dsipts into a feature-minimal v2 state
  • we ensure joint tests work at least for tslib and pytorch-forecasting native models, in pytorch-forecasting
  • based on that, we can start refactoring and simplifying

This is derived purely from experience on the software side - trying to partition work into "manageable difficulty" chunks.

@fkiraly
Collaborator

fkiraly commented May 20, 2025

Just one question: taking a broader view, can there be a case where the current datamodule is not enough, i.e., models that cannot be broken down into an encoder-decoder type architecture? In that case we might need a new datamodule, and then this design might be useful.

@phoeenniixx, yes, I believe we have seen that previously.

Since this question seems to reappear repeatedly, I think we need to validate - or falsify - it in the usual way for a "do we need an abstraction" question: concretely list three important and substantially different examples that cannot be covered by less abstraction.

The suggested work items above should get us there - and we should also be able to name at least three estimators in advance, btw.
(I suppose, LSTM, NBEATS, and LagLlama?)

@agobbifbk

Agreed: at some point some architectures will require a dedicated d2 layer (LagLlama is a good example). Still, I'm wondering whether a dedicated d2 layer for encoder-only models is really needed, or whether the encoder-decoder one is sufficient to also cover the encoder models (like in tslib).

I will try to use an example to explain one detail that can affect the aforementioned decision.

Let's suppose we have created a d2 layer that is encoder-only (i.e., it looks only at past data) and that can be used, for example, by DLINEAR.

The architecture can easily be modified to also use the decoder part of the time series (future known covariates).
We can do this in two ways. The first is to copy-paste the architecture, insert the decoder part, and attach the d2 encoder-decoder layer to this new version (called DLINEAR_decoder?). The second option is to directly modify the DLINEAR code, inserting the decoder part and adding a use_original flag for when I really want to use the encoder part only (because I want to show that the same architecture with the decoder performs better than the original one).

Considerations on option 1
PRO: clear code
CONS: a lot of duplication, harder to maintain; what should the final user do?

Considerations on option 2
PRO: we serve two versions of DLINEAR with just one model and one d2 layer
CONS: some if/else statements inside the architecture definition and forward loop
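
A minimal sketch of option 2, with a hypothetical DLinear-style module; the use_original flag switches the decoder branch off, so the original encoder-only behaviour is preserved:

import torch
from torch import nn

class DLinearVariant(nn.Module):
    def __init__(self, context_length, prediction_length,
                 n_future_covariates=0, use_original=True):
        super().__init__()
        self.use_original = use_original
        self.encoder = nn.Linear(context_length, prediction_length)
        if not use_original:
            # decoder branch consuming future known covariates
            self.decoder = nn.Linear(n_future_covariates * prediction_length, prediction_length)

    def forward(self, past, future_covariates=None):
        out = self.encoder(past)  # pure encoder path, as in the original model
        if not self.use_original and future_covariates is not None:
            # future_covariates: (batch, prediction_length, n_future_covariates)
            out = out + self.decoder(future_covariates.flatten(start_dim=-2))
        return out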

Modifying existing architectures is probably not a feature we need to develop now, but I'm putting another element on the table that can be useful.

Given that, as an exercise and to check the generality of the d2 layers, we can still consider using a pure encoder d2 layer to serve the DLINEAR model. This can help us understand the commonalities among those d2 layers.
The encoder-decoder vs. encoder example is probably not a good one, because one is a subset of the other; this is why I have some concerns.

I hope I'm not causing any confusion. If I am, feel free to ignore this comment 🙂
