Commit 07a35fc

Merge pull request #634 from WenjieDu/dev
Release v0.12
2 parents a4d5e0f + ce9cef6 commit 07a35fc

188 files changed: +4248 -3202 lines changed


.github/dependabot.yml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+version: 2
+updates:
+  # Python Dependencies
+  - package-ecosystem: "pip"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+    open-pull-requests-limit: 5
+
+  # GitHub Actions
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+    open-pull-requests-limit: 5

README.md

Lines changed: 35 additions & 31 deletions
@@ -35,7 +35,7 @@
 <img alt="Code Climate maintainability" src="https://img.shields.io/codeclimate/maintainability-percentage/WenjieDu/PyPOTS?color=3C7699&label=Maintainability&logo=codeclimate">
 </a>
 <a href="https://coveralls.io/github/WenjieDu/PyPOTS">
-<img alt="Coveralls coverage" src="https://img.shields.io/coverallsCoverage/github/WenjieDu/PyPOTS?branch=main&logo=coveralls&color=75C1C4&label=Coverage">
+<img alt="Coveralls coverage" src="https://img.shields.io/coverallsCoverage/github/WenjieDu/PyPOTS?branch=full_test&logo=coveralls&color=75C1C4&label=Coverage">
 </a>
 <a href="https://github.com/WenjieDu/PyPOTS/actions/workflows/testing_ci.yml">
 <img alt="GitHub Testing" src="https://img.shields.io/github/actions/workflow/status/wenjiedu/pypots/testing_ci.yml?logo=circleci&color=C8D8E1&label=CI">
@@ -100,7 +100,8 @@ currently supported. Stay tuned❗️).
 
 🌟 Since **v0.2**, all neural-network models in PyPOTS has got hyperparameter-optimization support.
 This functionality is implemented with the [Microsoft NNI](https://github.com/microsoft/nni) framework. You may want to
-refer to our time-series imputation survey and benchmark repo [Awesome_Imputation](https://github.com/WenjieDu/Awesome_Imputation)
+refer to our time-series imputation survey and benchmark
+repo [Awesome_Imputation](https://github.com/WenjieDu/Awesome_Imputation)
 to see how to config and tune the hyperparameters.
 
 🔥 Note that all models whose name with `🧑‍🔧` in the table (e.g. Transformer, iTransformer, Informer etc.) are not
@@ -121,6 +122,7 @@ The paper references and links are all listed at the bottom of this file.
 |:--------------|:---------------------------------------------------------------------------------------------------------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:---------------------------------------------------|
 | LLM&TSFM | <a href="https://time-series.ai"><img src="https://time-series.ai/static/figs/robot.svg" width="26px"> Time-Series.AI</a> [^36] |||||| <a href="https://time-series.ai">Join waitlist</a> |
 | LLM | Time-LLM🧑‍🔧[^45] ||| | | | `2024 - ICLR` |
+| TSFM | MOMENT[^47] ||| | | | `2024 - ICML` |
 | Neural Net | TEFN🧑‍🔧[^39] ||| | | | `2024 - arXiv` |
 | Neural Net | FITS🧑‍🔧[^41] ||| | | | `2024 - ICLR` |
 | Neural Net | TimeMixer[^37] ||| | | | `2024 - ICLR` |
@@ -170,8 +172,8 @@ The paper references and links are all listed at the bottom of this file.
 
 🙋 Differences between `LLM (Large Language Model)` and `TSFM (Time-Series Foundation Model)` in the above table:
 `LLM` refers to the models that are pre-trained on large-scale text data and can be fine-tuned for specific tasks.
-`TSFM` refers to the models that are pre-trained on large-scale time series data, inspired by recent achievements
-of foundation models in CV and NLP.
+`TSFM` refers to the models that are pre-trained on large-scale time series data, inspired by recent achievements
+of foundation models in CV and NLP.
 
 💯 Contribute your model right now to increase your research impact! PyPOTS downloads are increasing rapidly
 (**[600K+ in total and 1K+ daily on PyPI so far](https://www.pepy.tech/projects/pypots)**),
@@ -268,31 +270,30 @@ We present you a usage example of imputing missing values in time series with Py
 <summary><b>Click here to see an example applying SAITS on PhysioNet2012 for imputation:</b></summary>
 
 ``` python
-# Data preprocessing. Tedious, but PyPOTS can help.
 import numpy as np
 from sklearn.preprocessing import StandardScaler
-from pygrinder import mcar
-from pypots.data import load_specific_dataset
-data = load_specific_dataset('physionet_2012') # PyPOTS will automatically download and extract it.
-X = data['X']
-num_samples = len(X['RecordID'].unique())
-X = X.drop(['RecordID', 'Time'], axis = 1)
-X = StandardScaler().fit_transform(X.to_numpy())
-X = X.reshape(num_samples, 48, -1)
-X_ori = X # keep X_ori for validation
-X = mcar(X, 0.1) # randomly hold out 10% observed values as ground truth
-dataset = {"X": X} # X for model input
-print(X.shape) # (11988, 48, 37), 11988 samples and each sample has 48 time steps, 37 features
-
-# Model training. This is PyPOTS showtime.
-from pypots.imputation import SAITS
+from pygrinder import mcar, calc_missing_rate
+from benchpots.datasets import preprocess_physionet2012
+data = preprocess_physionet2012(subset='set-a',rate=0.1) # Our ecosystem libs will automatically download and extract it
+train_X, val_X, test_X = data["train_X"], data["val_X"], data["test_X"]
+print(train_X.shape) # (n_samples, n_steps, n_features)
+print(val_X.shape) # samples (n_samples) in train set and val set are different, but they have the same sequence len (n_steps) and feature dim (n_features)
+print(f"We have {calc_missing_rate(train_X):.1%} values missing in train_X")
+train_set = {"X": train_X} # in training set, simply put the incomplete time series into it
+val_set = {
+    "X": val_X,
+    "X_ori": data["val_X_ori"], # in validation set, we need ground truth for evaluation and picking the best model checkpoint
+}
+test_set = {"X": test_X} # in test set, only give the testing incomplete time series for model to impute
+test_X_ori = data["test_X_ori"] # test_X_ori bears ground truth for evaluation
+indicating_mask = np.isnan(test_X) ^ np.isnan(test_X_ori) # mask indicates the values that are missing in X but not in X_ori, i.e. where the gt values are
+
+from pypots.imputation import SAITS # import the model you want to use
 from pypots.nn.functional import calc_mae
-saits = SAITS(n_steps=48, n_features=37, n_layers=2, d_model=256, n_heads=4, d_k=64, d_v=64, d_ffn=128, dropout=0.1, epochs=10)
-# Here I use the whole dataset as the training set because ground truth is not visible to the model, you can also split it into train/val/test sets
-saits.fit(dataset) # train the model on the dataset
-imputation = saits.impute(dataset) # impute the originally-missing values and artificially-missing values
-indicating_mask = np.isnan(X) ^ np.isnan(X_ori) # indicating mask for imputation error calculation
-mae = calc_mae(imputation, np.nan_to_num(X_ori), indicating_mask) # calculate mean absolute error on the ground truth (artificially-missing values)
+saits = SAITS(n_steps=train_X.shape[1], n_features=train_X.shape[2], n_layers=2, d_model=256, n_heads=4, d_k=64, d_v=64, d_ffn=128, dropout=0.1, epochs=5)
+saits.fit(train_set, val_set) # train the model on the dataset
+imputation = saits.impute(test_set) # impute the originally-missing values and artificially-missing values
+mae = calc_mae(imputation, np.nan_to_num(test_X_ori), indicating_mask) # calculate mean absolute error on the ground truth (artificially-missing values)
 saits.save("save_it_here/saits_physionet2012.pypots") # save the model for future use
 saits.load("save_it_here/saits_physionet2012.pypots") # reload the serialized model file for following imputation or training
 ```
@@ -519,18 +520,21 @@ Time-Series.AI</a>
 [^41]: Xu, Z., Zeng, A., & Xu, Q. (2024).
 [FITS: Modeling Time Series with 10k parameters](https://openreview.net/forum?id=bWcnvZ3qMb).
 *ICLR 2024*.
-[^42]: Qian, L., Ibrahim, Z., Ellis, H. L., Zhang, A., Zhang, Y., Wang, T., & Dobson, R. (2023).
+[^42]: Qian, L., Ibrahim, Z., Ellis, H. L., Zhang, A., Zhang, Y., Wang, T., & Dobson, R. (2023).
 [Knowledge Enhanced Conditional Imputation for Healthcare Time-series](https://arxiv.org/abs/2312.16713).
 *arXiv 2023*.
-[^43]: Lin, S., Lin, W., Wu, W., Zhao, F., Mo, R., & Zhang, H. (2023).
+[^43]: Lin, S., Lin, W., Wu, W., Zhao, F., Mo, R., & Zhang, H. (2023).
 [SegRNN: Segment Recurrent Neural Network for Long-Term Time Series Forecasting](https://arxiv.org/abs/2308.11200).
 *arXiv 2023*.
-[^44]: Yu, H. F., Rao, N., & Dhillon, I. S. (2016).
+[^44]: Yu, H. F., Rao, N., & Dhillon, I. S. (2016).
 [Temporal regularized matrix factorization for high-dimensional time series prediction](https://papers.nips.cc/paper_files/paper/2016/hash/85422afb467e9456013a2a51d4dff702-Abstract.html).
 *NeurIPS 2016*.
-[^45]: Jin, M., Wang, S., Ma, L., Chu, Z., Zhang, J. Y., Shi, X., ... & Wen, Q. (2024).
+[^45]: Jin, M., Wang, S., Ma, L., Chu, Z., Zhang, J. Y., Shi, X., ... & Wen, Q. (2024).
 [Time-LLM: Time Series Forecasting by Reprogramming Large Language Models](https://openreview.net/forum?id=Unb5CVPtae).
 *ICLR 2024*.
-[^46]: Zhou, T., Niu, P., Sun, L., & Jin, R. (2023).
+[^46]: Zhou, T., Niu, P., Sun, L., & Jin, R. (2023).
 [One Fits All: Power General Time Series Analysis by Pretrained LM](https://openreview.net/forum?id=gMS6FVZvmF).
 *NeurIPS 2023*.
+[^47]: Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., & Dubrawski, A. (2024).
+[MOMENT: A Family of Open Time-series Foundation Models](https://proceedings.mlr.press/v235/goswami24a.html).
+*ICML 2024*.
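
The evaluation step in the updated README example (`np.isnan(test_X) ^ np.isnan(test_X_ori)` followed by `calc_mae`) can be illustrated on a toy array. This is a minimal sketch using only NumPy: `masked_mae` is a hypothetical stand-in written for this illustration and is not claimed to match the implementation of `pypots.nn.functional.calc_mae`, and the arrays are made-up data.

```python
import numpy as np


def masked_mae(prediction, target, mask):
    """Mean absolute error over positions where mask is True (hypothetical helper)."""
    weights = mask.astype(float)
    return float(np.sum(np.abs(prediction - target) * weights) / np.sum(weights))


# Ground truth with one originally-missing value (NaN at [0, 2]).
X_ori = np.array([[1.0, 2.0, np.nan],
                  [4.0, 5.0, 6.0]])
# Model input: two more values ([0, 1] and [1, 0]) were held out artificially.
X = np.array([[1.0, np.nan, np.nan],
              [np.nan, 5.0, 6.0]])

# XOR is True only where exactly one of the two arrays is NaN,
# i.e. at the artificially held-out positions that still have ground truth.
indicating_mask = np.isnan(X) ^ np.isnan(X_ori)

# Made-up imputation output: a complete array without NaN.
imputation = np.array([[1.0, 2.5, 0.0],
                       [3.0, 5.0, 6.0]])

# nan_to_num replaces the originally-missing NaN with 0 so the arithmetic stays finite;
# that position is excluded by the mask anyway.
mae = masked_mae(imputation, np.nan_to_num(X_ori), indicating_mask)
print(mae)  # 0.75
```

The originally-missing entry at `[0, 2]` is NaN in both arrays, so the XOR leaves it out and the error is averaged over the two held-out positions only.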

README_zh.md

Lines changed: 29 additions & 26 deletions
@@ -35,7 +35,7 @@
 <img alt="Code Climate maintainability" src="https://img.shields.io/codeclimate/maintainability-percentage/WenjieDu/PyPOTS?color=3C7699&label=Maintainability&logo=codeclimate">
 </a>
 <a href="https://coveralls.io/github/WenjieDu/PyPOTS">
-<img alt="Coveralls coverage" src="https://img.shields.io/coverallsCoverage/github/WenjieDu/PyPOTS?branch=main&logo=coveralls&color=75C1C4&label=Coverage">
+<img alt="Coveralls coverage" src="https://img.shields.io/coverallsCoverage/github/WenjieDu/PyPOTS?branch=full_test&logo=coveralls&color=75C1C4&label=Coverage">
 </a>
 <a href="https://github.com/WenjieDu/PyPOTS/actions/workflows/testing_ci.yml">
 <img alt="GitHub Testing" src="https://img.shields.io/github/actions/workflow/status/wenjiedu/pypots/testing_ci.yml?logo=circleci&color=C8D8E1&label=CI">
@@ -106,6 +106,7 @@ PyPOTS当前支持多变量POTS数据的插补, 预测, 分类, 聚类以及异
 |:--------------|:---------------------------------------------------------------------------------------------------------------------------------|:------:|:------:|:------:|:------:|:--------:|:---------------------------------------------------|
 | LLM&TSFM | <a href="https://time-series.ai"><img src="https://time-series.ai/static/figs/robot.svg" width="26px"> Time-Series.AI</a> [^36] |||||| <a href="https://time-series.ai">Join waitlist</a> |
 | LLM | Time-LLM🧑‍🔧[^45] ||| | | | `2024 - ICLR` |
+| TSFM | MOMENT[^47] ||| | | | `2024 - ICML` |
 | Neural Net | TEFN🧑‍🔧[^39] ||| | | | `2024 - arXiv` |
 | Neural Net | FITS🧑‍🔧[^41] ||| | | | `2024 - ICLR` |
 | Neural Net | TimeMixer[^37] ||| | | | `2024 - ICLR` |
@@ -249,33 +250,32 @@ conda update conda-forge::pypots # 更新为最新版本
 <summary><b>Click here to see a simple example applying the SAITS model to the imputation task on the PhysioNet2012 dataset:</b></summary>
 
 ``` python
-# Data preprocessing; the PyPOTS ecosystem helps with the tedious data preprocessing
 import numpy as np
 from sklearn.preprocessing import StandardScaler
-from pygrinder import mcar
-from pypots.data import load_specific_dataset
-data = load_specific_dataset('physionet_2012') # PyPOTS will automatically download, load, and process the data
-X = data['X']
-num_samples = len(X['RecordID'].unique())
-X = X.drop(['RecordID', 'Time'], axis = 1)
-X = StandardScaler().fit_transform(X.to_numpy())
-X = X.reshape(num_samples, 48, -1)
-X_ori = X # keep X_ori for validation
-X = mcar(X, 0.1) # randomly mask 10% of the observed values as ground truth
-dataset = {"X": X} # X is used as model input
-print(X.shape) # X has shape (11988, 48, 37), i.e. 11988 samples, each with 48 time steps and 37 features
-
-# Model training. This is PyPOTS showtime!
-from pypots.imputation import SAITS
+from pygrinder import mcar, calc_missing_rate
+from benchpots.datasets import preprocess_physionet2012
+data = preprocess_physionet2012(subset='set-a', rate=0.1) # our ecosystem libraries will automatically download and extract the dataset
+train_X, val_X, test_X = data["train_X"], data["val_X"], data["test_X"]
+print(train_X.shape) # (n_samples, n_steps, n_features)
+print(val_X.shape) # the validation set has a different number of samples (n_samples) from the training set, but the same sequence length (n_steps) and feature dimension (n_features)
+print(f"The ratio of missing values in the training set train_X is {calc_missing_rate(train_X):.1%}")
+train_set = {"X": train_X} # the training set only needs to contain the incomplete time series
+val_set = {
+    "X": val_X,
+    "X_ori": data["val_X_ori"], # in the validation set, we need ground truth for evaluation and model selection
+}
+test_set = {"X": test_X} # the test set only provides the incomplete time series to be imputed
+test_X_ori = data["test_X_ori"] # test_X_ori contains the ground truth for final evaluation
+indicating_mask = np.isnan(test_X) ^ np.isnan(test_X_ori) # build the indicating mask: marks the artificially added missing positions in the test set (missing in X but not in X_ori)
+
+from pypots.imputation import SAITS # import the model you want to use
 from pypots.nn.functional import calc_mae
-saits = SAITS(n_steps=48, n_features=37, n_layers=2, d_model=256, n_heads=4, d_k=64, d_v=64, d_ffn=128, dropout=0.1, epochs=10)
-# because the ground truth is not visible to the model, the whole dataset is used as the training set; you can also split it into train/val/test sets
-saits.fit(dataset) # train the model on the dataset
-imputation = saits.impute(dataset) # impute both the originally missing values and the ground-truth values artificially masked above
-indicating_mask = np.isnan(X) ^ np.isnan(X_ori) # mask matrix for computing the imputation error
-mae = calc_mae(imputation, np.nan_to_num(X_ori), indicating_mask) # compute the mean absolute error (MAE) on the artificially masked values
-saits.save("save_it_here/saits_physionet2012.pypots") # save the model
-saits.load("save_it_here/saits_physionet2012.pypots") # you can reload the saved model file at any time for further imputation or training
+saits = SAITS(n_steps=train_X.shape[1], n_features=train_X.shape[2], n_layers=2, d_model=256, n_heads=4, d_k=64, d_v=64, d_ffn=128, dropout=0.1, epochs=5)
+saits.fit(train_set, val_set) # train the model on the dataset
+imputation = saits.impute(test_set) # impute the originally missing and artificially missing values in the test set
+mae = calc_mae(imputation, np.nan_to_num(test_X_ori), indicating_mask) # compute MAE at the artificially added missing positions (imputation vs. ground truth)
+saits.save("save_it_here/saits_physionet2012.pypots") # save the model for future use
+saits.load("save_it_here/saits_physionet2012.pypots") # reload the model for further imputation or continued training
 ```
 
 </details>
@@ -505,4 +505,7 @@ Time-Series.AI</a>
 *ICLR 2024*.
 [^46]: Zhou, T., Niu, P., Sun, L., & Jin, R. (2023).
 [One Fits All: Power General Time Series Analysis by Pretrained LM](https://openreview.net/forum?id=gMS6FVZvmF).
-*NeurIPS 2023*.
+*NeurIPS 2023*.
+[^47]: Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., & Dubrawski, A. (2024).
+[MOMENT: A Family of Open Time-series Foundation Models](https://proceedings.mlr.press/v235/goswami24a.html).
+*ICML 2024*.
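
The data layout that the new example relies on (arrays shaped `(n_samples, n_steps, n_features)`, a training dict with only `"X"`, and a validation dict with both `"X"` and `"X_ori"`) can be sketched with synthetic arrays. The following assumes nothing about `benchpots` or `pygrinder` internals: `missing_rate` is a hypothetical helper that simply counts NaN entries, and the toy shapes are invented for illustration.

```python
import numpy as np


def missing_rate(X):
    """Fraction of NaN entries in X (hypothetical stand-in for a missing-rate helper)."""
    return float(np.isnan(X).mean())


# Toy training array: 4 samples, 6 time steps, 3 features.
train_X = np.zeros((4, 6, 3))
train_X[:, 0, 0] = np.nan  # 4 of the 72 entries are missing

# Toy validation array: fewer samples, same n_steps and n_features.
val_X_ori = np.ones((2, 6, 3))
val_X = val_X_ori.copy()
val_X[0, 1, 2] = np.nan  # one value held out; its ground truth stays in val_X_ori

# The training set carries only the incomplete series;
# the validation set also carries the ground truth used to score imputations.
train_set = {"X": train_X}
val_set = {"X": val_X, "X_ori": val_X_ori}

# Model dimensions can be read off the array instead of being hard-coded.
n_steps, n_features = train_X.shape[1], train_X.shape[2]

print(f"{missing_rate(train_X):.1%} of train_X is missing")  # 5.6% of train_X is missing
print(n_steps, n_features)  # 6 3
```

Reading `n_steps` and `n_features` from `train_X.shape` mirrors the change in the diff, where the hard-coded `n_steps=48, n_features=37` were replaced by shape lookups.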
