README.md (28 additions, 7 deletions)
@@ -21,7 +21,9 @@ This repository provides a modular template for building recommender systems in
### 📦 Dataset
-As an example, this template uses the [ContentWise Impressions](https://github.com/ContentWise/contentwise-impressions) dataset, which contains real-world implicit feedback data.
+As an example, this template uses the [ContentWise Impressions](https://github.com/ContentWise/contentwise-impressions) dataset - a collection of implicit interactions and impressions of movies and TV series from an Over-The-Top media service that delivers its content over the Internet. ***In the preprocessing phase the dataset is limited to movie content only.***
+
+Exploratory data analysis can be found in [contentwise_eda.ipynb](notebooks/contentwise_eda.ipynb).
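For illustration only, a minimal sketch of what the movies-only restriction could look like; the `item_type` column, the type code, and the file paths are assumptions, not values taken from this repository or from `steps/process_data.py`:

```python
import pandas as pd

# Hypothetical schema: assume the raw interactions table carries an
# `item_type` column and that a single code identifies movies.
MOVIE_ITEM_TYPE = 0  # assumed value

interactions = pd.read_parquet("data/raw/interactions.parquet")  # assumed path
movies_only = interactions[interactions["item_type"] == MOVIE_ITEM_TYPE]
movies_only.to_parquet("data/processed/interactions_movies.parquet")
```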
### 🚀 Use Cases
@@ -44,12 +46,12 @@ To make use of this repository, follow these steps:
2. **Set up external services**
  - Configure your connection to a ClearML server for experiment tracking (see the sketch after this list).
-  - (Optional) Set up access to AWS S3 if you want to use remote storage for data or models.
+  - (Optional) Set up access to AWS S3 if you want to use remote storage for data and/or models.
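Once the ClearML connection is configured (and, optionally, S3 credentials are in place), experiment tracking can be initialized at the start of a pipeline step. A minimal sketch using the public ClearML API; the project and task names are placeholders, not values taken from this repository:

```python
from clearml import Task

# Task.init picks up connection settings from clearml.conf / CLEARML_* variables.
task = Task.init(
    project_name="recsys-template",  # placeholder project name
    task_name="process_data",        # placeholder task name
)
task.connect({"min_interactions": 5})  # example: log a parameter dict to the task
```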
## Configuration and installation
-Prepare environment variables in .env (see .env.example):
+Prepare environment variables related to ClearML and AWS in .env (see .env.example):
```
CLEARML_CONFIG_FILE=clearml.conf
CLEARML_WEB_HOST=<your-clearml-web-host>
@@ -67,22 +69,41 @@ Install with pip:
pip install .  # Add flag -e to install in editable mode
```
-Using docker compose:
+(Optional) Using docker compose:
```bash
docker compose up -d # Run container based on docker-compose.yml
```
-Using plain docker:
+(Optional) Using plain docker:
```bash
docker build -t ds-image .  # Build image defined in Dockerfile
docker run -dit --gpus all --name ds-container ds-image # Run container based on that image
```
-## Quick start
+## Run pipeline steps
+
+### 1. Data preparation
```bash
python steps/process_data.py
-python steps/compute_baseline.py
+```
+
+After running this script, the following datasets are generated (see the sketch after the list below):
+- `train.parquet` - behavioral data about movie consumption for training (implicit feedback)
+- `validation.parquet` - behavioral data for validation
+- `user_mapper.parquet` - user name to user index mapper
+- `item_mapper.parquet` - item name to item index mapper
+- `last_user_histories.parquet` - histories of the last *n* consumed items per user, computed on the train data
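As an illustration of how these artifacts fit together, a hedged sketch of loading them and building name-to-index lookups; the output directory and the mapper column names (`user`, `user_idx`, `item`, `item_idx`) are assumptions that may differ from what `steps/process_data.py` actually writes:

```python
import pandas as pd

DATA_DIR = "data/processed"  # assumed output location

train = pd.read_parquet(f"{DATA_DIR}/train.parquet")
validation = pd.read_parquet(f"{DATA_DIR}/validation.parquet")
user_mapper = pd.read_parquet(f"{DATA_DIR}/user_mapper.parquet")
item_mapper = pd.read_parquet(f"{DATA_DIR}/item_mapper.parquet")
histories = pd.read_parquet(f"{DATA_DIR}/last_user_histories.parquet")

# Name -> index lookups from the mapper tables (column names are assumed).
user_to_idx = dict(zip(user_mapper["user"], user_mapper["user_idx"]))
item_to_idx = dict(zip(item_mapper["item"], item_mapper["item_idx"]))

print(train.shape, validation.shape, histories.shape)
```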