WenjieDu
diff --git a/.github/dependabot.yml b/.github/dependabot.yml
Lines changed: 15 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 35 additions & 31 deletions
diff --git a/README_zh.md b/README_zh.md
Lines changed: 29 additions & 26 deletions
@@ -0,0 +1,15 @@
+version: 2
+updates:
+  # Python Dependencies
+  - package-ecosystem: "pip"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+    open-pull-requests-limit: 5
+
+  # GitHub Actions
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+    open-pull-requests-limit: 5
@@ -35,7 +35,7 @@
         <img alt="Code Climate maintainability" src="https://img.shields.io/codeclimate/maintainability-percentage/WenjieDu/PyPOTS?color=3C7699&label=Maintainability&logo=codeclimate">
     </a>
     <a href="https://coveralls.io/github/WenjieDu/PyPOTS">
-        <img alt="Coveralls coverage" src="https://img.shields.io/coverallsCoverage/github/WenjieDu/PyPOTS?branch=main&logo=coveralls&color=75C1C4&label=Coverage">
+        <img alt="Coveralls coverage" src="https://img.shields.io/coverallsCoverage/github/WenjieDu/PyPOTS?branch=full_test&logo=coveralls&color=75C1C4&label=Coverage">
     </a>
     <a href="https://github.com/WenjieDu/PyPOTS/actions/workflows/testing_ci.yml">
         <img alt="GitHub Testing" src="https://img.shields.io/github/actions/workflow/status/wenjiedu/pypots/testing_ci.yml?logo=circleci&color=C8D8E1&label=CI">
@@ -100,7 +100,8 @@ currently supported. Stay tuned❗️).
 
 🌟 Since **v0.2**, all neural-network models in PyPOTS has got hyperparameter-optimization support.
 This functionality is implemented with the [Microsoft NNI](https://github.com/microsoft/nni) framework. You may want to
-refer to our time-series imputation survey and benchmark repo [Awesome_Imputation](https://github.com/WenjieDu/Awesome_Imputation)
+refer to our time-series imputation survey and benchmark
+repo [Awesome_Imputation](https://github.com/WenjieDu/Awesome_Imputation)
 to see how to config and tune the hyperparameters.
 
 🔥 Note that all models whose name with `🧑‍🔧` in the table (e.g. Transformer, iTransformer, Informer etc.) are not
@@ -121,6 +122,7 @@ The paper references and links are all listed at the bottom of this file.
 |:--------------|:---------------------------------------------------------------------------------------------------------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:---------------------------------------------------|
 | LLM&TSFM      | <a href="https://time-series.ai"><img src="https://time-series.ai/static/figs/robot.svg" width="26px"> Time-Series.AI</a>  [^36] |    ✅     |    ✅     |    ✅     |    ✅     |    ✅     | <a href="https://time-series.ai">Join waitlist</a> |
 | LLM           | Time-LLM🧑‍🔧[^45]                                                                                                               |    ✅     |    ✅     |          |          |          | `2024 - ICLR`                                      |
+| TSFM          | MOMENT[^47]                                                                                                                      |    ✅     |    ✅     |          |          |          | `2024 - ICML`                                      |
 | Neural Net    | TEFN🧑‍🔧[^39]                                                                                                                   |    ✅     |    ✅     |          |          |          | `2024 - arXiv`                                     |
 | Neural Net    | FITS🧑‍🔧[^41]                                                                                                                   |    ✅     |    ✅     |          |          |          | `2024 - ICLR`                                      |
 | Neural Net    | TimeMixer[^37]                                                                                                                   |    ✅     |    ✅     |          |          |          | `2024 - ICLR`                                      |
@@ -170,8 +172,8 @@ The paper references and links are all listed at the bottom of this file.
 
 🙋 Differences between `LLM (Large Language Model)` and `TSFM (Time-Series Foundation Model)` in the above table:
 `LLM` refers to the models that are pre-trained on large-scale text data and can be fine-tuned for specific tasks.
-`TSFM` refers to the models that are pre-trained on large-scale time series data, inspired by recent achievements 
-of foundation models in CV and NLP. 
+`TSFM` refers to the models that are pre-trained on large-scale time series data, inspired by recent achievements
+of foundation models in CV and NLP.
 
 💯 Contribute your model right now to increase your research impact! PyPOTS downloads are increasing rapidly
 (**[600K+ in total and 1K+ daily on PyPI so far](https://www.pepy.tech/projects/pypots)**),
@@ -268,31 +270,30 @@ We present you a usage example of imputing missing values in time series with Py
 <summary><b>Click here to see an example applying SAITS on PhysioNet2012 for imputation:</b></summary>
 
 ``` python
-# Data preprocessing. Tedious, but PyPOTS can help.
 import numpy as np
 from sklearn.preprocessing import StandardScaler
-from pygrinder import mcar
-from pypots.data import load_specific_dataset
-data = load_specific_dataset('physionet_2012')  # PyPOTS will automatically download and extract it.
-X = data['X']
-num_samples = len(X['RecordID'].unique())
-X = X.drop(['RecordID', 'Time'], axis = 1)
-X = StandardScaler().fit_transform(X.to_numpy())
-X = X.reshape(num_samples, 48, -1)
-X_ori = X  # keep X_ori for validation
-X = mcar(X, 0.1)  # randomly hold out 10% observed values as ground truth
-dataset = {"X": X}  # X for model input
-print(X.shape)  # (11988, 48, 37), 11988 samples and each sample has 48 time steps, 37 features
-
-# Model training. This is PyPOTS showtime.
-from pypots.imputation import SAITS
+from pygrinder import mcar, calc_missing_rate
+from benchpots.datasets import preprocess_physionet2012
+data = preprocess_physionet2012(subset='set-a', rate=0.1)  # our ecosystem libs will automatically download and extract the dataset
+train_X, val_X, test_X = data["train_X"], data["val_X"], data["test_X"]
+print(train_X.shape)  # (n_samples, n_steps, n_features)
+print(val_X.shape)  # samples (n_samples) in train set and val set are different, but they have the same sequence len (n_steps) and feature dim (n_features)
+print(f"We have {calc_missing_rate(train_X):.1%} values missing in train_X")  
+train_set = {"X": train_X}  # in training set, simply put the incomplete time series into it
+val_set = {
+    "X": val_X,
+    "X_ori": data["val_X_ori"],  # in validation set, we need ground truth for evaluation and picking the best model checkpoint
+}
+test_set = {"X": test_X}  # in test set, only give the testing incomplete time series for model to impute
+test_X_ori = data["test_X_ori"]  # test_X_ori bears ground truth for evaluation
+indicating_mask = np.isnan(test_X) ^ np.isnan(test_X_ori)  # mask marks the values that are missing in X but not in X_ori, i.e. the held-out positions with known ground truth
+
+from pypots.imputation import SAITS  # import the model you want to use
 from pypots.nn.functional import calc_mae
-saits = SAITS(n_steps=48, n_features=37, n_layers=2, d_model=256, n_heads=4, d_k=64, d_v=64, d_ffn=128, dropout=0.1, epochs=10)
-# Here I use the whole dataset as the training set because ground truth is not visible to the model, you can also split it into train/val/test sets
-saits.fit(dataset)  # train the model on the dataset
-imputation = saits.impute(dataset)  # impute the originally-missing values and artificially-missing values
-indicating_mask = np.isnan(X) ^ np.isnan(X_ori)  # indicating mask for imputation error calculation
-mae = calc_mae(imputation, np.nan_to_num(X_ori), indicating_mask)  # calculate mean absolute error on the ground truth (artificially-missing values)
+saits = SAITS(n_steps=train_X.shape[1], n_features=train_X.shape[2], n_layers=2, d_model=256, n_heads=4, d_k=64, d_v=64, d_ffn=128, dropout=0.1, epochs=5)
+saits.fit(train_set, val_set)  # train the model on the dataset
+imputation = saits.impute(test_set)  # impute the originally-missing values and artificially-missing values
+mae = calc_mae(imputation, np.nan_to_num(test_X_ori), indicating_mask)  # calculate mean absolute error on the ground truth (artificially-missing values)
 saits.save("save_it_here/saits_physionet2012.pypots")  # save the model for future use
 saits.load("save_it_here/saits_physionet2012.pypots")  # reload the serialized model file for following imputation or training
 ```
@@ -519,18 +520,21 @@ Time-Series.AI</a>
 [^41]: Xu, Z., Zeng, A., & Xu, Q. (2024).
 [FITS: Modeling Time Series with 10k parameters](https://openreview.net/forum?id=bWcnvZ3qMb).
 *ICLR 2024*.
-[^42]: Qian, L., Ibrahim, Z., Ellis, H. L., Zhang, A., Zhang, Y., Wang, T., & Dobson, R. (2023). 
+[^42]: Qian, L., Ibrahim, Z., Ellis, H. L., Zhang, A., Zhang, Y., Wang, T., & Dobson, R. (2023).
 [Knowledge Enhanced Conditional Imputation for Healthcare Time-series](https://arxiv.org/abs/2312.16713).
 *arXiv 2023*.
-[^43]: Lin, S., Lin, W., Wu, W., Zhao, F., Mo, R., & Zhang, H. (2023). 
+[^43]: Lin, S., Lin, W., Wu, W., Zhao, F., Mo, R., & Zhang, H. (2023).
 [SegRNN: Segment Recurrent Neural Network for Long-Term Time Series Forecasting](https://arxiv.org/abs/2308.11200).
 *arXiv 2023*.
-[^44]: Yu, H. F., Rao, N., & Dhillon, I. S. (2016). 
+[^44]: Yu, H. F., Rao, N., & Dhillon, I. S. (2016).
 [Temporal regularized matrix factorization for high-dimensional time series prediction](https://papers.nips.cc/paper_files/paper/2016/hash/85422afb467e9456013a2a51d4dff702-Abstract.html).
 *NeurIPS 2016*.
-[^45]: Jin, M., Wang, S., Ma, L., Chu, Z., Zhang, J. Y., Shi, X., ... & Wen, Q. (2024). 
+[^45]: Jin, M., Wang, S., Ma, L., Chu, Z., Zhang, J. Y., Shi, X., ... & Wen, Q. (2024).
 [Time-LLM: Time Series Forecasting by Reprogramming Large Language Models](https://openreview.net/forum?id=Unb5CVPtae).
 *ICLR 2024*.
-[^46]: Zhou, T., Niu, P., Sun, L., & Jin, R. (2023). 
+[^46]: Zhou, T., Niu, P., Sun, L., & Jin, R. (2023).
 [One Fits All: Power General Time Series Analysis by Pretrained LM](https://openreview.net/forum?id=gMS6FVZvmF).
 *NeurIPS 2023*.
+[^47]: Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., & Dubrawski, A. (2024).
+[MOMENT: A Family of Open Time-series Foundation Models](https://proceedings.mlr.press/v235/goswami24a.html).
+*ICML 2024*.
@@ -35,7 +35,7 @@
         <img alt="Code Climate maintainability" src="https://img.shields.io/codeclimate/maintainability-percentage/WenjieDu/PyPOTS?color=3C7699&label=Maintainability&logo=codeclimate">
     </a>
     <a href="https://coveralls.io/github/WenjieDu/PyPOTS">
-        <img alt="Coveralls coverage" src="https://img.shields.io/coverallsCoverage/github/WenjieDu/PyPOTS?branch=main&logo=coveralls&color=75C1C4&label=Coverage">
+        <img alt="Coveralls coverage" src="https://img.shields.io/coverallsCoverage/github/WenjieDu/PyPOTS?branch=full_test&logo=coveralls&color=75C1C4&label=Coverage">
     </a>
     <a href="https://github.com/WenjieDu/PyPOTS/actions/workflows/testing_ci.yml">
         <img alt="GitHub Testing" src="https://img.shields.io/github/actions/workflow/status/wenjiedu/pypots/testing_ci.yml?logo=circleci&color=C8D8E1&label=CI">
@@ -106,6 +106,7 @@ PyPOTS currently supports imputation, forecasting, classification, clustering, and…
 |:--------------|:---------------------------------------------------------------------------------------------------------------------------------|:------:|:------:|:------:|:------:|:--------:|:---------------------------------------------------|
 | LLM&TSFM      | <a href="https://time-series.ai"><img src="https://time-series.ai/static/figs/robot.svg" width="26px"> Time-Series.AI</a>  [^36] |    ✅     |    ✅     |    ✅     |    ✅     |    ✅     | <a href="https://time-series.ai">Join waitlist</a> |
 | LLM           | Time-LLM🧑‍🔧[^45]                                                                                                               |    ✅     |    ✅     |          |          |          | `2024 - ICLR`                                      |
+| TSFM          | MOMENT[^47]                                                                                                                      |    ✅     |    ✅     |          |          |          | `2024 - ICML`                                      |
 | Neural Net    | TEFN🧑‍🔧[^39]                                                                                                                   |    ✅     |    ✅     |          |          |          | `2024 - arXiv`                                     |
 | Neural Net    | FITS🧑‍🔧[^41]                                                                                                                   |    ✅     |    ✅     |          |          |          | `2024 - ICLR`                                      |
 | Neural Net    | TimeMixer[^37]                                                                                                                   |    ✅     |    ✅     |          |          |          | `2024 - ICLR`                                      |
@@ -249,33 +250,32 @@ conda update  conda-forge::pypots  # update to the latest version
 <summary><b>Click here to see a simple example of applying the SAITS model to the imputation task on the PhysioNet2012 dataset:</b></summary>
 
 ``` python
-# Data preprocessing: use the PyPOTS ecosystem to handle the tedious preprocessing work
 import numpy as np
 from sklearn.preprocessing import StandardScaler
-from pygrinder import mcar
-from pypots.data import load_specific_dataset
-data = load_specific_dataset('physionet_2012')  # PyPOTS will automatically download, load, and process the data
-X = data['X']
-num_samples = len(X['RecordID'].unique())
-X = X.drop(['RecordID', 'Time'], axis = 1)
-X = StandardScaler().fit_transform(X.to_numpy())
-X = X.reshape(num_samples, 48, -1)
-X_ori = X  # keep X_ori for validation
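-X = mcar(X, 0.1)  # randomly mask 10% of the observed values to serve as ground truth
-dataset = {"X": X}  # X is used as the model input
-print(X.shape)  # X has shape (11988, 48, 37), i.e. 11988 samples, each with 48 time steps and 37 features
-
-# Model training. This is PyPOTS showtime!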
-from pypots.imputation import SAITS
+from pygrinder import mcar, calc_missing_rate
+from benchpots.datasets import preprocess_physionet2012
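+data = preprocess_physionet2012(subset='set-a', rate=0.1)  # our ecosystem libraries will automatically download and extract the dataset
+train_X, val_X, test_X = data["train_X"], data["val_X"], data["test_X"]
+print(train_X.shape)  # (n_samples, n_steps, n_features)
+print(val_X.shape)  # the validation set has a different number of samples (n_samples) than the training set, but the same sequence length (n_steps) and feature dimension (n_features)
+print(f"The ratio of missing values in the training set train_X is {calc_missing_rate(train_X):.1%}")
+train_set = {"X": train_X}  # the training set only needs to contain the incomplete time series
+val_set = {
+    "X": val_X,
+    "X_ori": data["val_X_ori"],  # in the validation set, we need ground truth for evaluation and model selection
+}
+test_set = {"X": test_X}  # the test set only provides the incomplete time series to be imputed
+test_X_ori = data["test_X_ori"]  # test_X_ori holds the ground truth used for the final evaluation
+indicating_mask = np.isnan(test_X) ^ np.isnan(test_X_ori)  # build the indicating mask: it marks the artificially-missing positions in the test set (missing in X but not in X_ori)
+
+from pypots.imputation import SAITS  # import the model you want to use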
 from pypots.nn.functional import calc_mae
-saits = SAITS(n_steps=48, n_features=37, n_layers=2, d_model=256, n_heads=4, d_k=64, d_v=64, d_ffn=128, dropout=0.1, epochs=10)
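-# Because the ground truth is not visible to the model, the whole dataset is used as the training set; you can also split it into train/val/test sets
-saits.fit(dataset)  # train the model on the dataset
-imputation = saits.impute(dataset)  # impute the originally-missing values and the ground-truth values we artificially masked above
-indicating_mask = np.isnan(X) ^ np.isnan(X_ori)  # mask matrix used to calculate the imputation error
-mae = calc_mae(imputation, np.nan_to_num(X_ori), indicating_mask)  # calculate the mean absolute error (MAE) on the artificially-masked values
-saits.save("save_it_here/saits_physionet2012.pypots")  # save the model
-saits.load("save_it_here/saits_physionet2012.pypots")  # you can reload the saved model file at any time for further imputation or training
+saits = SAITS(n_steps=train_X.shape[1], n_features=train_X.shape[2], n_layers=2, d_model=256, n_heads=4, d_k=64, d_v=64, d_ffn=128, dropout=0.1, epochs=5)
+saits.fit(train_set, val_set)  # train the model on the dataset
+imputation = saits.impute(test_set)  # impute the originally-missing and artificially-missing values in the test set
+mae = calc_mae(imputation, np.nan_to_num(test_X_ori), indicating_mask)  # calculate MAE on the artificially-missing positions (comparing the imputations with the ground truth)
+saits.save("save_it_here/saits_physionet2012.pypots")  # save the model for future use
+saits.load("save_it_here/saits_physionet2012.pypots")  # reload the model for further imputation or continued training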
 ```
 
 </details>
@@ -505,4 +505,7 @@ Time-Series.AI</a>
 *ICLR 2024*.
 [^46]: Zhou, T., Niu, P., Sun, L., & Jin, R. (2023). 
 [One Fits All: Power General Time Series Analysis by Pretrained LM](https://openreview.net/forum?id=gMS6FVZvmF).
-*NeurIPS 2023*.
+*NeurIPS 2023*.
+[^47]: Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., & Dubrawski, A. (2024).
+[MOMENT: A Family of Open Time-series Foundation Models](https://proceedings.mlr.press/v235/goswami24a.html).
+*ICML 2024*.
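
The updated README example evaluates imputation only at positions that are missing in the model input but observed in the ground truth, via `np.isnan(test_X) ^ np.isnan(test_X_ori)`. The following is a minimal, dependency-free sketch of that logic on a flattened toy series. The helper names `indicating_mask` and `masked_mae` and all values are hypothetical, chosen for illustration; this is not PyPOTS's `calc_mae` implementation, only the idea it is assumed to follow (average absolute error over the masked positions).

```python
import math


def indicating_mask(x, x_ori):
    """True where a value is missing in x but observed in x_ori (XOR of the two NaN maps)."""
    return [math.isnan(a) ^ math.isnan(b) for a, b in zip(x, x_ori)]


def masked_mae(prediction, target, mask):
    """Mean absolute error restricted to the positions where mask is True."""
    errors = [abs(p - t) for p, t, m in zip(prediction, target, mask) if m]
    if not errors:
        raise ValueError("mask selects no positions")
    return sum(errors) / len(errors)


nan = float("nan")
x_ori = [1.0, 2.0, nan, 4.0, 5.0]    # ground truth; index 2 is originally missing
x = [1.0, nan, nan, 4.0, nan]        # model input; indices 1 and 4 were held out
imputed = [1.0, 2.5, 3.0, 4.0, 4.0]  # hypothetical model output

mask = indicating_mask(x, x_ori)
# Replace NaN with 0.0 in the target, the same role np.nan_to_num plays in the README example.
target = [0.0 if math.isnan(v) else v for v in x_ori]
mae = masked_mae(imputed, target, mask)

print(mask)  # [False, True, False, False, True]
print(mae)   # 0.75
```

Index 2 is excluded because it is missing in both series, so no ground truth exists there; only the two held-out positions contribute to the error.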