
Regularized Neural Ensemblers


Introduction

Ensemble methods can significantly enhance the accuracy and robustness of machine learning models by combining multiple base learners. However, standard approaches like greedy or random ensembling often assume a constant weighting for each base model, which can limit expressiveness.
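
For example, a constant-weight ensemble applies the same convex combination of base-model outputs to every input, no matter how the models' reliability varies across the data. A minimal illustration (the weights w here are purely hypothetical):

import torch

# Hypothetical example: predictions from M=3 base models for N=8 samples, C=10 classes.
base_preds = torch.rand(3, 8, 10)      # (M, N, C)
w = torch.tensor([0.5, 0.3, 0.2])      # one fixed weight per base model

# The same weights are applied to every sample, regardless of the input.
ensemble_pred = (w[:, None, None] * base_preds).sum(dim=0)   # (N, C)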

This repository explores dynamic neural ensemblers, where a neural network adaptively aggregates predictions from multiple candidate models. To address overfitting and low-diversity ensembles, we propose a simple but effective regularization strategy by randomly dropping base model predictions during training, ensuring a lower bound on ensemble diversity.
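
The sketch below illustrates this idea in spirit only; it is not the repository's NeuralEnsembler, and all names and shapes are assumptions. A small network maps the stacked base-model predictions to per-sample weights, and during training a random subset of base models is masked out so that no single model can dominate:

import torch
import torch.nn as nn

class DynamicEnsemblerSketch(nn.Module):
    """Illustrative sketch only: per-sample weights with base-model dropout."""

    def __init__(self, num_models: int, num_classes: int, drop_rate: float = 0.5):
        super().__init__()
        # Maps the concatenated base predictions of a sample to one logit per model.
        self.weight_net = nn.Linear(num_models * num_classes, num_models)
        self.drop_rate = drop_rate

    def forward(self, base_preds: torch.Tensor) -> torch.Tensor:
        # base_preds: (N, M, C) -- N samples, M base models, C classes
        n, m, c = base_preds.shape
        logits = self.weight_net(base_preds.reshape(n, m * c))   # (N, M)
        if self.training:
            # Randomly mask base models; masked entries get -inf before the
            # softmax, forcing the remaining models to carry the prediction.
            mask = torch.rand(n, m, device=base_preds.device) < self.drop_rate
            # Guarantee at least one surviving model per sample.
            rows = torch.arange(n, device=base_preds.device)
            cols = torch.randint(m, (n,), device=base_preds.device)
            mask[rows, cols] = False
            logits = logits.masked_fill(mask, float("-inf"))
        weights = torch.softmax(logits, dim=-1)                  # (N, M)
        return (weights.unsqueeze(-1) * base_preds).sum(dim=1)  # (N, C)

Dropping models during training serves the same purpose as the regularization described above: it keeps the ensembler from collapsing onto a single base model and preserves ensemble diversity.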

This repo contains the detailed experiments from the original paper. For a simple, easy-to-run version, please refer to the companion repository.

Installation

We use conda for environment management and Poetry for dependency installation.

1. Optional: Install Miniconda and create an environment

# Download & install Miniconda (adjust for your operating system)
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O install_miniconda.sh
bash install_miniconda.sh -b -p $HOME/.conda
rm install_miniconda.sh

# Initialize conda for your shell (e.g., bash or zsh)
~/.conda/bin/conda init
# Then restart your shell or source your rc file

# Create and activate the environment
conda create -n searching_optimal_ensembles python=3.10
conda activate searching_optimal_ensembles

2. Install Poetry

curl -sSL https://install.python-poetry.org | python3 -
# Add Poetry to your PATH in ~/.bashrc or ~/.zshrc
export PATH="$HOME/.local/bin:$PATH"

To make this change persistent, append the export PATH line above to your ~/.bashrc or ~/.zshrc.

3. Install dependencies

bash setup.sh

This will install all dependencies into the Poetry environment.

Note: due to a scikit-learn version mismatch, the default installation supports only the tabrepo and phem libraries. To run experiments on the Pipeline-Bench metadataset with the TPOT search space, install the dependencies with bash setup.sh install-pipeline_bench instead.

Usage Example

Below are minimal examples for random, greedy, and neural ensemble methods. Each script showcases how to load the metadataset, sample (or build) ensembles, and evaluate performance. Adjust paths (e.g., DATA_DIR) as needed.

Random Ensembling

# SearchingOptimalEnsembles_experiments/random_ensemble_example.py
# Demonstrates how to build a random ensemble on a metadataset.

import torch

from SearchingOptimalEnsembles.posthoc.random_ensembler import RandomEnsembler
import SearchingOptimalEnsembles.metadatasets.quicktune.metadataset as qmd

if __name__ == "__main__":
    data_version = "micro"
    metric_name = "nll"
    task_id = 0  # or any valid index
    DATA_DIR = "path/to/quicktune/predictions"

    metadataset = qmd.QuicktuneMetaDataset(
        data_dir=DATA_DIR, metric_name=metric_name, data_version=data_version
    )
    dataset_names = metadataset.get_dataset_names()
    metadataset.set_state(dataset_names[task_id])

    # Initialize random sampler
    ensembler = RandomEnsembler(
        metadataset=metadataset,
        device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    )

    # Candidate pipelines (in practice, you'd sample or load these)
    X_obs = [[1], [2], [3], [4], [5], [6], [7], [8]]

    best_ensemble, best_metric = ensembler.sample(X_obs)
    print("Best random ensemble found:", best_ensemble)
    print("Random ensembler metric:", best_metric)

Run the script with python SearchingOptimalEnsembles_experiments/random_ensemble_example.py.
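
Conceptually, random ensembling draws random subsets of the candidate pool and keeps the best-scoring one. A hypothetical sketch of that loop (evaluate stands in for the metadataset's metric computation and is not part of the library):

import random

def random_ensemble_search(candidates, evaluate, num_trials=100, max_size=5):
    """Sample random subsets and keep the one with the best (lowest) metric."""
    best_ensemble, best_metric = None, float("inf")
    for _ in range(num_trials):
        size = random.randint(1, min(max_size, len(candidates)))
        ensemble = random.sample(candidates, size)
        metric = evaluate(ensemble)  # e.g., validation NLL of the averaged predictions
        if metric < best_metric:
            best_ensemble, best_metric = ensemble, metric
    return best_ensemble, best_metric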

Greedy Ensembling

# SearchingOptimalEnsembles_experiments/greedy_ensemble_example.py
# Demonstrates how to build a greedy ensemble on a metadataset.

import torch

from SearchingOptimalEnsembles.posthoc.greedy_ensembler import GreedyEnsembler
import SearchingOptimalEnsembles.metadatasets.quicktune.metadataset as qmd

if __name__ == "__main__":
    data_version = "micro"
    metric_name = "nll"
    task_id = 0  # or any valid index
    DATA_DIR = "path/to/quicktune/predictions"

    metadataset = qmd.QuicktuneMetaDataset(
        data_dir=DATA_DIR, metric_name=metric_name, data_version=data_version
    )

    dataset_names = metadataset.get_dataset_names()
    metadataset.set_state(dataset_names[task_id])

    # Initialize greedy ensembler
    ensembler = GreedyEnsembler(
        metadataset=metadataset,
        device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    )

    # Candidate pipelines (in practice, you'd sample or load these)
    X_obs = [[1], [2], [3], [4], [5], [6], [7], [8]]

    best_ensemble, best_metric = ensembler.sample(X_obs)
    print("Greedy ensemble found:", best_ensemble)
    print("Greedy ensemble metric:", best_metric)

Run the script with python SearchingOptimalEnsembles_experiments/greedy_ensemble_example.py.
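
Greedy ensembling typically follows Caruana-style forward selection: start from an empty ensemble and repeatedly add (with replacement) the candidate that most improves the validation metric, stopping when no candidate helps. A minimal sketch under the same assumptions as above (evaluate is a hypothetical stand-in):

def greedy_ensemble_search(candidates, evaluate, max_size=10):
    """Forward selection: greedily grow the ensemble while the metric improves."""
    ensemble, best_metric = [], float("inf")
    for _ in range(max_size):
        # Try extending the current ensemble by each candidate (with replacement).
        scored = [(evaluate(ensemble + [c]), c) for c in candidates]
        metric, choice = min(scored, key=lambda pair: pair[0])
        if metric >= best_metric:
            break  # no candidate improves the ensemble further
        ensemble.append(choice)
        best_metric = metric
    return ensemble, best_metric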

Neural Ensembling

# SearchingOptimalEnsembles_experiments/neural_ensemble_example.py
# Demonstrates how to train a neural ensemble on a metadataset.

import torch

from SearchingOptimalEnsembles.posthoc.neural_ensembler import NeuralEnsembler
import SearchingOptimalEnsembles.metadatasets.quicktune.metadataset as qmd

if __name__ == "__main__":
    data_version = "micro"
    metric_name = "nll"
    task_id = 0  # or any valid index
    DATA_DIR = "path/to/quicktune/predictions"

    metadataset = qmd.QuicktuneMetaDataset(
        data_dir=DATA_DIR, metric_name=metric_name, data_version=data_version
    )

    dataset_names = metadataset.meta_splits["meta-test"]
    metadataset.set_state(dataset_names[task_id])

    # Initialize the neural ensembler
    neural_ensembler = NeuralEnsembler(
        metadataset=metadataset,
        device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    )

    # Candidate pipelines (in practice, you'd sample or load these)
    X_obs = [[1], [2], [3], [4], [5], [6], [7], [8]]

    # Now sample an ensemble with learned dynamic weights
    best_ensemble, best_metric = neural_ensembler.sample(X_obs)
    weights = neural_ensembler.get_weights(X_obs)

    _, metric_val, _, _ = metadataset.evaluate_ensembles_with_weights(
        ensembles=[best_ensemble], weights=weights
    )
    print("Best ensemble found by Neural Ensembler:", best_ensemble)
    print("Neural ensemble metric:", metric_val.item())

Run the script with python SearchingOptimalEnsembles_experiments/neural_ensemble_example.py.
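
A quick way to confirm that the ensembler is dynamic rather than static is to inspect how much the learned weights vary across samples. The sketch below assumes get_weights returns a per-sample weight matrix of shape (N, M); that shape is an assumption made for illustration:

import torch

# Stand-in for neural_ensembler.get_weights(X_obs): N=8 samples, M=4 base models.
weights = torch.softmax(torch.randn(8, 4), dim=-1)

# A static ensembler would show (near-)zero variation across samples;
# a dynamic one adapts its weighting per input.
print("Per-model weight std across samples:", weights.std(dim=0))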
