[API] M layer design #1870


Open
fkiraly opened this issue Jun 2, 2025 · 4 comments
Labels: API design, enhancement

Comments


fkiraly commented Jun 2, 2025

Issue to discuss the M layer design. For context, see design document here: sktime/enhancement-proposals#39

My current proposed design is based on the v1 and v2 metadata layer designs. Long-term state:

  • starts from the current "metadata" class, but is named like the model, e.g., TFT. The current lightning network is renamed TFT_NN
  • has tags, get_test_params, etc., similar to the current "metadata" class
  • __init__ has all args of the two objects, the loader (D2, e.g., DecoderEncoderModule) and the network (e.g., TFT_NN), minus data
  • the method get_loader_class returns the loader class (e.g., the class DecoderEncoderModule); get_loader(data: TimeSeries) produces a loader object, an instance of the class returned by get_loader_class
  • the method get_nn_class returns the nn class (e.g., TFT_NN); get_nn(loader) produces an instance of the nn class
  • finally, there is a method init(data), which calls the above in sequence and produces a pair of loader and nn, as if the two get methods were called one after the other
  • __call__ dispatches to init (see the sketch below)
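
To make the shape of this concrete, here is a minimal structural sketch, using hypothetical stand-in classes for the D2 loader (DecoderEncoderModule) and the lightning network (TFT_NN); the argument routing is illustrative only, not a finalized API:

class _StubLoader:
    """Stand-in for the D2 loader class, e.g., DecoderEncoderModule."""

    def __init__(self, data, **loader_params):
        self.data = data
        self.loader_params = loader_params


class _StubNN:
    """Stand-in for the lightning network class, e.g., TFT_NN."""

    def __init__(self, **nn_params):
        self.nn_params = nn_params


class TFT:
    """Config class: carries all args of loader and network, minus the data."""

    def __init__(self, loader_params=None, nn_params=None):
        # in the full design, these would be the flattened __init__ args
        # of both the loader and the network
        self.loader_params = loader_params or {}
        self.nn_params = nn_params or {}

    def get_loader_class(self):
        return _StubLoader  # the loader class (D2)

    def get_loader(self, data):
        # loader object: instance of the loader class, from data plus stored args
        return self.get_loader_class()(data, **self.loader_params)

    def get_nn_class(self):
        return _StubNN  # the nn class

    def get_nn(self, loader):
        # nn object; data-derived metadata would be read off the loader here
        return self.get_nn_class()(**self.nn_params)

    def init(self, data):
        # data -> loader -> nn, returned as (net, loader)
        loader = self.get_loader(data)
        return self.get_nn(loader), loader

    __call__ = init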

So, a usage vignette could look like:

from lightning.pytorch import Trainer
from torch import nn

from pytorch_forecasting import TimeSeries
from pytorch_forecasting.metrics import MAE, SMAPE
from pytorch_forecasting.models import TFT

dataset = TimeSeries(...)
model_cfg = TFT(
    max_encoder_length=30,
    max_prediction_length=1,
    batch_size=32,
    loss=nn.MSELoss(),
    logging_metrics=[MAE(), SMAPE()],
    optimizer="adam",
    hidden_size=64,
    num_layers=2,
    attention_head_size=4,
)

net, loader = model_cfg(dataset)

trainer = Trainer(
    max_epochs=5,
    accelerator="auto",
    devices=1,
    enable_progress_bar=True,
    log_every_n_steps=10,
)

trainer.fit(net, loader)

etc

The only things that change for other models are the model class and its args/values for model_cfg.

In sktime, we would add the trainer as an arg to __init__, and sktime's fit(data) would do self.trainer.fit(*self.model_cfg(data)) (with some potential conversion for data - or we could allow TimeSeries as an mtype).
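
A hypothetical sketch of such an sktime wrapper (the class name, its signature, and the data handling are placeholders, not the actual sktime API):

from lightning.pytorch import Trainer


class PytorchForecastingAdapter:
    """Placeholder for an sktime-style forecaster wrapping a model config."""

    def __init__(self, model_cfg, trainer=None):
        self.model_cfg = model_cfg
        self.trainer = trainer if trainer is not None else Trainer(max_epochs=5)

    def fit(self, data):
        # conversion of data to TimeSeries would happen here,
        # unless TimeSeries is allowed as an mtype
        net, loader = self.model_cfg(data)
        self.trainer.fit(net, loader)
        return self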

fkiraly added the enhancement and API design labels on Jun 2, 2025
fkiraly commented Jun 2, 2025

@agobbifbk commented

Nice!!!
Some comments:

  • logging_metrics=[MAE(), SMAPE()] should, in my opinion, be managed by the Trainer via the callbacks=... argument
  • with this design, are we losing the flow 'D1 -> D2 -> M'? It seems that 'M' and 'D2' are created in the same function; could this cause problems for the user? We are mixing model parameters, loss, and optimizer in the same place
  • Maybe we can think of an intermediate solution where, instead of having one metadata class per model, we have a class for a family of models. For example, encoder-decoder models all need some information: past and future steps, number of channels, embedding dimension (categorical variables), output size, and so on, which is collected from the metadata of the D2 layer (a sketch follows at the end of this comment)
  • Your approach can be adapted, but what I'm saying is that TFT can be something more general and reused by different models, where we clearly see which are the model parameters and which are the parameters derived from the D2 layer

Maybe I'm not being clear; we can discuss it further in a dedicated meeting. My feeling is that we can optimize the structure even more while keeping the flow D1 -> D2 -> M clear.
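
For concreteness, a hypothetical sketch of the split between D2-derived metadata and model-specific hyperparameters for the encoder-decoder family (all names are placeholders):

from dataclasses import dataclass, field


@dataclass
class EncoderDecoderMetadata:
    """Information every encoder-decoder model needs, read off the D2 layer."""

    past_steps: int
    future_steps: int
    n_channels: int
    embedding_sizes: dict = field(default_factory=dict)  # categorical variable -> embedding dim
    output_size: int = 1


@dataclass
class TFTParams:
    """Model-specific hyperparameters, set by the user, not derived from D2."""

    hidden_size: int = 64
    num_layers: int = 2
    attention_head_size: int = 4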

fkiraly commented Jun 3, 2025

> with this design, are we losing the flow 'D1 -> D2 -> M'?

Well, my idea has always been that the user-facing flow should optimally just be "data -> model", with that interface unified, the same as in sklearn. That is, the flow is D1 -> D2 -> NN, and M = D2 + NN.

The challenge was that we had to cut the data layer in half to achieve that unification.

fkiraly commented Jun 3, 2025

> Maybe we can think of an intermediate solution where, instead of having one metadata class per model, we have a class for a family of models.

Agreed, though I would make this an additional intermediate base class. Users typically want the "model" (including loader) as a single object.
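
For example, a hypothetical hierarchy (all names are placeholders, not a decided API) could look like:

class BaseModelConfig:
    """Shared config behaviour: get_loader, get_nn, init, __call__."""


class EncoderDecoderConfig(BaseModelConfig):
    """Intermediate base class for the encoder-decoder family, collecting
    the D2-derived metadata those models share."""


class TFT(EncoderDecoderConfig):
    """Concrete config the user instantiates; still yields (net, loader) via __call__."""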

> Your approach can be adapted, but what I'm saying is that TFT can be something more general and reused by different models, where we clearly see which are the model parameters and which are the parameters derived from the D2 layer

Can you give examples of "reuse"?
