This repository provides a modular template for building recommender systems in Python using implicit feedback data. It is designed to streamline experimentation with recommendation models on top of a modern ML stack. Two neural models are implemented, Matrix Factorization and an MLP, each built on one of two user representations: a learned user embedding, or the user's history of clicked items (aggregated item embeddings).
- PyTorch Lightning – for scalable and structured model training
- Hydra – for flexible configuration management
- ClearML – for experiment tracking and ML workflow orchestration
- (Optional) AWS S3 – for storing datasets and models remotely
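To make the two modeling ideas concrete, here is a minimal PyTorch sketch of a Matrix Factorization scorer and a history-based MLP scorer. Class names, dimensions, and the mean-pooling of history embeddings are illustrative assumptions, not the repository's actual code:

```python
import torch
from torch import nn


class MatrixFactorization(nn.Module):
    """Hypothetical sketch: score = dot product of user and item embeddings."""

    def __init__(self, n_users: int, n_items: int, dim: int = 64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)

    def forward(self, user_ids: torch.Tensor, item_ids: torch.Tensor) -> torch.Tensor:
        # Implicit-feedback score (logit) for each (user, item) pair in the batch.
        return (self.user_emb(user_ids) * self.item_emb(item_ids)).sum(dim=-1)


class HistoryMLP(nn.Module):
    """Hypothetical sketch: the user is represented by the embeddings of clicked items."""

    def __init__(self, n_items: int, dim: int = 64):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, history_ids: torch.Tensor, item_ids: torch.Tensor) -> torch.Tensor:
        # Mean-pool the history item embeddings into a single user vector (assumed aggregation).
        user_vec = self.item_emb(history_ids).mean(dim=1)
        x = torch.cat([user_vec, self.item_emb(item_ids)], dim=-1)
        return self.mlp(x).squeeze(-1)
```

In this template, training of models like these is orchestrated with PyTorch Lightning, configured via Hydra, and tracked in ClearML, as listed above.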
As an example, this template uses the ContentWise Impressions dataset - a collection of implicit interactions and impressions of movies and TV series from an Over-The-Top media service, which delivers its content over the Internet. In the preprocessing phase the dataset is limited to movie content only.
Exploratory data analysis can be found in `contentwise_eda.ipynb`.
- Rapid prototyping of recommender systems
- Benchmarking implicit models
- Educational purposes (learning modern ML tools in practice)
More details about setup, usage, and customization can be found in the sections below.
To make use of this repository, follow these steps:
- Download the dataset
  - Download the ContentWise Impressions dataset, specifically the `CW10M` directory.
  - Place it in the following path: `cache/data-cw10m/`
- Set up external services
  - Configure your connection to a ClearML server for experiment tracking.
  - (Optional) Set up access to AWS S3 if you want to use remote storage for data and/or models.
Prepare environment variables related to ClearML and AWS in `.env` (see `.env.example`):

```
CLEARML_CONFIG_FILE=clearml.conf
CLEARML_WEB_HOST=<your-clearml-web-host>
...
```
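One way to make these variables available to the scripts is to load them at startup, for example with python-dotenv. This is only a sketch of the idea; the repository may load its environment differently:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Read key=value pairs from .env into the process environment (existing values are kept).
load_dotenv()

required = ("CLEARML_CONFIG_FILE", "CLEARML_WEB_HOST")
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing ClearML environment variables: {missing}")
```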
Create and activate a virtual environment with conda:

```bash
conda create --name <env_name> python=3.13.2
conda activate <env_name>
```
Install with pip:
```bash
pip install .  # Add flag -e to install in editable mode
```

(Optional) Using docker compose:

```bash
docker compose up -d  # Run container based on docker-compose.yml
```

(Optional) Using plain docker:

```bash
docker build -t ds-image .                                # Build image defined in Dockerfile
docker run -dit --gpus all --name ds-container ds-image   # Run container based on that image
```

Process the data:

```bash
python steps/process_data.py
```

After running this script the following datasets are generated:
- `train.parquet` - behavioral data about movie consumption for training (implicit feedback)
- `validation.parquet` - behavioral data for validation
- `user_mapper.parquet` - user name to user index mapper
- `item_mapper.parquet` - item name to item index mapper
- `last_user_histories.parquet` - histories of the last n consumed items per user, computed on the train data
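As a quick sanity check, the generated files can be inspected with pandas. The output directory and column layout below are assumptions; adjust them to wherever `steps/process_data.py` actually writes its results:

```python
from pathlib import Path

import pandas as pd

# Assumed output location; change to the directory used by steps/process_data.py.
data_dir = Path("cache")

train = pd.read_parquet(data_dir / "train.parquet")
user_mapper = pd.read_parquet(data_dir / "user_mapper.parquet")
item_mapper = pd.read_parquet(data_dir / "item_mapper.parquet")
histories = pd.read_parquet(data_dir / "last_user_histories.parquet")

print(train.shape, list(train.columns))
print(f"{len(user_mapper)} users, {len(item_mapper)} items")
print(histories.head())
```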
Evaluate the baselines:

```bash
python steps/evaluate_baselines.py
```

The script reports offline metrics (AUROC & NDCG) for the baseline solutions.
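For reference, the two metrics can be computed per user roughly as follows. This is a sketch using scikit-learn for AUROC and a hand-rolled binary-relevance NDCG@k, not the repository's evaluation code:

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def ndcg_at_k(relevance: np.ndarray, scores: np.ndarray, k: int = 10) -> float:
    """Binary-relevance NDCG@k for a single user."""
    order = np.argsort(-scores)[:k]
    gains = relevance[order]
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))
    dcg = float((gains * discounts).sum())
    ideal = np.sort(relevance)[::-1][:k]
    idcg = float((ideal * discounts[: len(ideal)]).sum())
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: 1 = item consumed in validation, 0 = not consumed.
relevance = np.array([1, 0, 0, 1, 0])
scores = np.array([0.9, 0.4, 0.3, 0.8, 0.1])  # e.g. scores from a popularity baseline
print(roc_auc_score(relevance, scores), ndcg_at_k(relevance, scores, k=5))
```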
Training the MLP based on user histories for 20 epochs:
```bash
python steps/train.py experiment=mlp_with_history trainer.max_epochs=20
```

Hyperparameter optimization:

```bash
python steps/optimize_hparams.py
```
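The key=value arguments passed to `train.py` above are standard Hydra overrides: each pair replaces a field of the composed configuration. A minimal, hypothetical entry point illustrating the mechanism (the actual scripts and config layout may differ):

```python
import hydra
from omegaconf import DictConfig, OmegaConf


# config_path and config_name are placeholders; point them at the repository's Hydra configs.
@hydra.main(config_path="configs", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # `experiment=mlp_with_history` selects a config group option,
    # while `trainer.max_epochs=20` overrides a single field of the composed config.
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    main()
```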
Inference:

```bash
python steps/infer.py
```
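Conceptually, inference for an implicit-feedback recommender means scoring candidate items for a user and keeping the top-k. The toy illustration below reuses the hypothetical `HistoryMLP` sketch from earlier in this README and is not what `steps/infer.py` actually does:

```python
import torch

# Assumes the HistoryMLP class from the earlier sketch is in scope; weights here are random.
model = HistoryMLP(n_items=1000)
model.eval()

history = torch.tensor([[3, 17, 42, 56]])   # one user's last clicked items
candidates = torch.arange(1000)             # score every item in the catalogue
with torch.no_grad():
    scores = model(history.expand(len(candidates), -1), candidates)

top_k = torch.topk(scores, k=10).indices    # indices of the 10 highest-scoring items
print(top_k.tolist())
```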
Serving:

```bash
python steps/serve.py
```
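A serving layer for such a model usually exposes an endpoint that takes a user identifier and returns the top-k recommendations. The FastAPI sketch below is hypothetical and only illustrates the shape of such an API; `steps/serve.py` may use a different framework or interface:

```python
from fastapi import FastAPI

app = FastAPI()

# Placeholder scores; in practice these would come from the trained model or a precomputed index.
FAKE_RECOMMENDATIONS = {"user_1": [("item_42", 0.93), ("item_7", 0.88)]}


@app.get("/recommend/{user_id}")
def recommend(user_id: str, k: int = 10):
    # Return the top-k (item, score) pairs for the requested user.
    return {"user_id": user_id, "items": FAKE_RECOMMENDATIONS.get(user_id, [])[:k]}
```

Such an app can be served locally with uvicorn while prototyping.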
Running the full pipeline:

```bash
python steps/run_pipeline.py
```

Useful docker commands:

```bash
docker exec -it ds-container bash  # Execute bash in a running container
docker compose start/stop/down     # Start, stop, or tear down the compose services
docker builder prune               # Remove build cache
```

