Evaluating Time Series Models for Urban Wastewater Management: Predictive Performance, Model Complexity and Resilience
Presented at the 10th International Conference on Smart and Sustainable Technologies (SpliTech 2025)
Authors: Vipin Singh, Tianheng Ling, Teodor Chiaburu, Felix Biessmann
Climate change increases the frequency of extreme rainfall, placing a significant strain on urban infrastructures, especially Combined Sewer Systems (CSS). Overflows from overburdened CSS release untreated wastewater into surface waters, posing environmental and public health risks. Although traditional physics-based models are effective, they are costly to maintain and difficult to adapt to evolving system dynamics. Machine Learning (ML) approaches offer cost-efficient alternatives with greater adaptability. To systematically assess the potential of ML for modeling urban infrastructure systems, we propose a protocol for evaluating Neural Network architectures for CSS time series forecasting with respect to predictive performance, model complexity, and robustness to perturbations. In addition, we assess model performance on peak events and critical fluctuations, as these are the key regimes for urban wastewater management. To investigate the feasibility of lightweight models suitable for IoT deployment, we compare global models, which have access to all information, with local models, which rely solely on nearby sensor readings. Additionally, to explore the security risks posed by network outages or adversarial attacks on urban infrastructure, we introduce error models that assess the resilience of models.
Our results demonstrate that while global models achieve higher predictive performance, local models provide sufficient resilience in decentralized scenarios, ensuring robust modeling of urban infrastructure. Furthermore, models with longer native forecast horizons exhibit greater robustness to data perturbations. These findings contribute to the development of interpretable and reliable ML solutions for sustainable urban wastewater management.
- Comparison of 6 Neural Network architectures for time series forecasting
- Establishing global and local models for resilience against network outages:
  - Global models: access to all sensor data
  - Local models: only use nearby sensor readings
- Robustness analysis of models against realistic errors (a minimal sketch of these perturbations follows this list):
  - Outliers: e.g., sensor miscalibration
  - Missing values: e.g., maintenance or network outages
  - Clipping: e.g., physical limitations of sensors
- Evaluation of model performance on peak events and critical fluctuations
- Holistic evaluation of model performance, complexity, and resilience
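To make the three error models concrete, here is a minimal, hypothetical sketch of how such perturbations can be injected into a sensor series. The actual implementation lives in `utils/ErrorGeneration.py`; the function names, parameters, and rates below are illustrative assumptions, not the repository's API:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_outliers(series: np.ndarray, rate: float = 0.01, scale: float = 5.0) -> np.ndarray:
    """Illustrative: replace a random fraction of points with miscalibration-style spikes."""
    corrupted = series.astype(float).copy()
    idx = rng.random(len(series)) < rate
    corrupted[idx] += scale * series.std() * rng.choice([-1.0, 1.0], size=int(idx.sum()))
    return corrupted

def add_missing_values(series: np.ndarray, rate: float = 0.05) -> np.ndarray:
    """Illustrative: drop a random fraction of points, e.g. maintenance or network outages."""
    corrupted = series.astype(float).copy()
    corrupted[rng.random(len(series)) < rate] = np.nan
    return corrupted

def clip_values(series: np.ndarray, lower: float, upper: float) -> np.ndarray:
    """Illustrative: saturate readings at the sensor's physical measurement range."""
    return np.clip(series, lower, upper)

# Example: perturb a synthetic water-level series
clean = np.sin(np.linspace(0, 20, 500)) + 1.0
perturbed = clip_values(add_outliers(clean), lower=0.0, upper=1.8)
```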
- Clone the repository:

  ```bash
  git clone ...
  ```

- Create a virtual environment (recommended):

  ```bash
  # Using venv
  python -m venv env
  source env/bin/activate  # On Windows use `env\Scripts\activate`
  ```

- Install dependencies:

  ```bash
  # Using pip
  pip install -r requirements.txt
  ```
Note: the key dependency for running the models is PyTorch version 2.2.2.
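If your environment resolves a different version, the pinned release can be installed explicitly (assuming a build of this version is available for your platform):

```bash
pip install torch==2.2.2
```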
For the experiments and model training, we used a real-world dataset from a Combined Sewer System in the city of Duisburg, Germany, provided by the Wirtschaftsbetriebe Duisburg (WBD).
The full dataset cannot be made publicly available because it contains information on critical infrastructure.
For further details on the dataset, please refer to the paper.
The code is used by running the `main.py` script, which supports both training and inference of time series models.
To see all available command-line arguments, run:

```bash
python main.py --help
```
To train a model, use the following command:

```bash
python main.py --data_filename=<path_to_data_csv> --target=<target_column> --future_compatible_covariate=<list_of_future_compatible_covariates> --model_type=<model_type>
```

Where:

- `<path_to_data_csv>`: Path to the CSV file containing the time series data.
- `<target_column>`: The column in the CSV file that contains the target variable to be predicted.
- `<list_of_future_compatible_covariates>`: A comma-separated list of covariates that are known in the future.
- `<model_type>`: The type of model to be trained (e.g., `tft`, `transformer`, `lstm`, `nhits`, `tcn`, `deepar`).
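For illustration, a training run might look like the following; the file name and column names are placeholders, not files shipped with this repository:

```bash
python main.py --data_filename=data/duisburg_css.csv --target=water_level --future_compatible_covariate=rainfall_forecast,hour_of_day --model_type=tft
```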
To evaluate a trained model, use the following command:

```bash
python main.py --data_filename=<path_to_data_csv> --target=<target_column> --future_compatible_covariate=<list_of_future_compatible_covariates> --model_type=<model_type> --inference_model_path=<path_to_model>
```

Where:

- `<path_to_data_csv>`: Path to the CSV file containing the time series data.
- `<target_column>`: The column in the CSV file that contains the target variable to be predicted.
- `<list_of_future_compatible_covariates>`: A comma-separated list of covariates that are known in the future.
- `<model_type>`: The type of model to be evaluated (e.g., `tft`, `transformer`, `lstm`, `nhits`, `tcn`, `deepar`).
- `<path_to_model>`: Path to the trained model file that you want to evaluate.
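Analogously, an evaluation run could look like this (again with placeholder paths):

```bash
python main.py --data_filename=data/duisburg_css.csv --target=water_level --future_compatible_covariate=rainfall_forecast,hour_of_day --model_type=tft --inference_model_path=saved_models/tft_global.pt
```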
Below is a summary of the results obtained with the different time series models on the dataset from the CSS in Duisburg, Germany. For visualizations and a discussion of the results, please refer to the paper.
Model type | MSE q=0.25 (Global) | MSE q=0.25 (Local) | MSE median (Global) | MSE median (Local) | MSE q=0.75 (Global) | MSE q=0.75 (Local) | Median MSE at peak events (Global) | Median MSE at peak events (Local) | Inference time [ms] (Global) | Inference time [ms] (Local) |
---|---|---|---|---|---|---|---|---|---|---|
TFT | 0.28 | 0.48 | 0.30 | 0.50 | 0.34 | 0.53 | 0.67 | 1.23 | 2.36 | 0.94 |
Transformer | 0.60 | 0.62 | 0.61 | 0.63 | 0.61 | 0.64 | 1.38 | 1.41 | 0.87 | 0.88 |
LSTM | 0.51 | 0.63 | 0.64 | 0.79 | 0.83 | 0.99 | 1.15 | 1.42 | 0.81 | 0.81 |
N-HiTS | 0.67 | 0.48 | 0.68 | 0.48 | 0.69 | 0.49 | 1.43 | 1.23 | 0.85 | 0.83 |
TCN | 0.97 | 1.00 | 0.98 | 1.01 | 0.99 | 1.03 | 1.90 | 1.97 | 0.84 | 0.82 |
DeepAR | 1.14 | 1.28 | 1.31 | 1.45 | 1.56 | 1.64 | 2.07 | 2.13 | 0.88 | 0.88 |
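For context on how summary statistics of this kind can be derived, the following is a minimal sketch. It assumes per-window MSEs are aggregated into quantiles and that peak events are windows whose observed maximum exceeds a threshold; the threshold choice and window handling are illustrative assumptions, not necessarily the paper's exact procedure:

```python
import numpy as np

def summarize_mse(y_true: np.ndarray, y_pred: np.ndarray, peak_threshold: float):
    """y_true, y_pred: arrays of shape (n_windows, horizon)."""
    # Per-window MSE over the forecast horizon
    mse = ((y_true - y_pred) ** 2).mean(axis=1)
    # Spread of errors across windows (q=0.25, median, q=0.75)
    q25, median, q75 = np.quantile(mse, [0.25, 0.5, 0.75])
    # Peak events: windows whose observed values exceed the threshold
    peaks = y_true.max(axis=1) > peak_threshold
    peak_median = np.median(mse[peaks]) if peaks.any() else np.nan
    return q25, median, q75, peak_median

# Example with random data (illustrative only)
rng = np.random.default_rng(0)
y_true = rng.normal(size=(200, 24))
y_pred = y_true + rng.normal(scale=0.5, size=(200, 24))
print(summarize_mse(y_true, y_pred, peak_threshold=2.0))
```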
The repository is structured as follows (only relevant files displayed):
- `main.py`: The main script and entry point for running experiments.
- `requirements.txt`: Lists the Python packages required to run the code.
- `README.md`: Provides an overview of the project, setup instructions, and how to run the experiments.
- `data/`: Contains scripts for data loading, processing, and exploratory data analysis (EDA).
  - `TimeSeriesDatasetCreator.py`: Creates the time series dataset for the experiments.
  - `VierlindenDataProcessor.py`: Processes the specific "Vierlinden" dataset.
  - `eda/`: Holds notebooks and reports from the exploratory data analysis phase.
- `models/`: Includes modules for building and loading the forecasting models.
  - `build_model.py`: Constructs the different time series models (e.g., DeepAR, LSTM, TCN).
  - `load_model.py`: Loads pre-trained models for evaluation or inference.
- `utils/`: A collection of helper scripts for the core logic of the experiments.
  - `ErrorGeneration.py`: Generates different types of errors (e.g., outliers, missing values) to test model resilience.
  - `ExperimentRunner.py`: Manages the execution of the entire experimental workflow.
  - `HyperparameterOptimizer.py`: Handles the hyperparameter optimization (HPO) process.
  - `ModelTrainer.py`: Contains the logic for training the models.
  - `ModelEvaluator.py`: Evaluates model performance using various metrics.
- `args_files/`: Stores configuration and argument files for different experimental setups.
  - `best_hp/`: Contains the best hyperparameter configurations found for each model, separated by `global` and `local` scenarios.
  - `hpo_args_files/`: Arguments for running hyperparameter optimization sweeps.
  - `errorgen_exp/`: Scripts and arguments for running the error generation experiments.
- `hpo_configs/`: YAML configuration files for the hyperparameter optimization sweeps for each model.
- `archives/`: Contains notebooks and detailed analyses from various experimental stages.
  - `errorgen_analysis/`: In-depth analysis of the error generation experiments, including plots and explanations of different error types.
  - `wandb_visualizations/`: Notebooks and results related to visualizing experiment data.