174 changes: 0 additions & 174 deletions examples/07_tutorials/ai-reco-of-recos/.gitignore

This file was deleted.

21 changes: 0 additions & 21 deletions examples/07_tutorials/ai-reco-of-recos/LICENSE

This file was deleted.

2 changes: 1 addition & 1 deletion examples/07_tutorials/ai-reco-of-recos/README.md
@@ -187,7 +187,7 @@ The system consists of several components:

## Credits

This project uses the [Recommenders](https://github.com/microsoft/recommenders) library as a reference for recommendation algorithms.
This project uses the [Recommenders](https://github.com/recommenders-team/recommenders) library as a reference for recommendation algorithms.

## License

63 changes: 30 additions & 33 deletions examples/07_tutorials/ai-reco-of-recos/prompts/action_plan.txt
@@ -18,36 +18,39 @@ This action plan will be recommended to run on Azure, please propose an Azure stack

**Sections to Produce (exact headers)**
Based on the suggested candidate recommender algorithms above, please generate the following:
1. **Candidate Generators** – list 1-3 recall models; include data needed & batch cadence.
2. **Re-Ranker / Final Model** – architecture, loss, label definition, and latency target.
3. **Training Stack** – ETL tech, framework (PyTorch Lightning / Spark / LightGBM), HPO strategy, expected hardware.
1. **Architecture** – select the architecture (batch, real-time, hybrid) and justify it against data volume and latency requirements
2. **Model selection** – select a model from the Recommenders library; include the features needed, loss, label definition, objective, and latency target
3. **Training Stack** – ETL tech, expected hardware (CPU, GPU, or Spark), HPO strategy, training time.
4. **Serving Path** – storage (Feature Store / Redis / Cosmos DB), ANN / cache layer, model runtime (ONNX-RT, Triton, etc.), P99 latency.
5. **Metrics & Roll-out** – offline metrics, guard-rail KPIs, A/B or bandit schedule, traffic % ramp.
5. **Metrics & Roll-out** – offline metrics, online metrics, guard-rail KPIs, A/B or bandit schedule, traffic % ramp.
6. **Docs & IaC** – artifacts to generate (README, dashboards), IaC spec (Bicep / Terraform), monitoring hooks.

---

**Output Example - the following is an example ONLY**

```
# 1 · Candidate Generators
· ALS-Spark on last-90-day interactions (nightly)
· TF-IDF on product titles & descriptions for cold SKUs (hourly)

# 2 · Re-Ranker / Final Model
· xDeepFM combining ID embeddings + price + brand + dense CTR features
# 1 · Architecture
· Hybrid architecture: high-recall candidate generation plus high-precision real-time reranking
· Suited to a large dataset of 20M interactions per day and a latency requirement of <100 ms
· Retraining cadence (e.g., hourly, daily)

# 2 · Model selection
· SASRec on user-item interactions, together with LightGBM combining text embeddings + price + brand + dense CTR features
· VW with logistic regression for real-time reranking
· Objective: binary add-to-cart, AUCPR optimised, 50 ms budget

# 3 · Training Stack
· Azure Databricks → Delta Lake → PyTorch Lightning on 4×A100
· Azure Databricks → Delta Lake → AzureML on a multi-GPU server with 4× NVIDIA A100
· Hyperdrive Bayesian sweep (LR, layer dims)

# 4 · Serving Path
· Feature Store (Redis) → Faiss HNSW recall (≤20 ms)
· ONNX-Runtime on AKS for xDeepFM (≤30 ms)
· ONNX-Runtime on AKS for real-time scoring (≤30 ms)

# 5 · Metrics & Roll-out
· Offline: nDCG@20, MAP@20; Online: ATC rate, GMV lift
· Offline: nDCG@20, MAP@20
· Online: add-to-cart rate, revenue lift
· Progressive rollout 1% ➔ 5% ➔ 25% over 2 weeks with sequential testing

# 6 · Docs & IaC
@@ -59,29 +62,27 @@ Fill each section thoroughly; bullet form is fine.
**Expected JSON Response**

{{
"candidate_generators": [
"architecture": [
{{
/* Name and flavour of the recall model that surfaces a
*large* pool of items (e.g., "Spark-ALS (implicit, 90-day)",
"Popularity-biased TF-IDF", "LightGCN Graph Recall"). */
"model": "Recommendation model name/type used for generating initial candidates",
/* Name and flavour of the architecture (e.g., "Batch architecture",
"Real-time architecture", "Hybrid architecture (recall-rerank)"). */
"type": "Recommendation architecture used",

/* All raw sources this generator needs: IDs, interaction logs,
/* All raw sources this architecture needs: IDs, interaction logs,
catalogue metadata, embeddings store paths, etc.
Mention format + retention window if relevant. */
"data_needed": "Data sources and types required for this generator",
"data_needed": "Data sources and types required for this architecture",

/* How often the recall index or scores are refreshed.
/* How often the models are retrained.
Typical values: 'hourly', 'nightly', 'near-real-time (Kafka stream)'. */
"batch_cadence": "How frequently this generator runs (e.g., hourly, daily)"
"retraining_cadence": "How frequently this generator runs (e.g., hourly, daily)"
}}
],

"reranker": {{
/* Architecture that scores the relatively small candidate set
returned by the generator(s)—e.g., 'xDeepFM', 'Transformer
cross-encoder', 'Wide & Deep + price/brand features'. */
"architecture": "Model architecture used for reranking candidates",
"models": {{
/* Algorithm selection from the Recommenders library—e.g., 'xDeepFM', 'Transformer
cross-encoder', 'Wide & Deep + price/brand features', 'SASRec', 'BPR'. */
"algorithms": "Algorithm or combination of algorithms",

/* Training objective—point-wise BCE, pair-wise hinge, listwise
LambdaRank, AUCPR, etc. */
@@ -101,16 +102,12 @@ Fill each section thoroughly; bullet form is fine.
'PySpark → Delta Lake', 'Azure Data Factory', 'dbt + Snowpark', etc. */
"etl": "Data extraction, transformation, loading technology",

/* Core model-training framework(s):
'PyTorch Lightning', 'LightGBM-CLI', 'TensorFlow-Keras', etc. */
"framework": "ML framework used for model development",

/* Hyper-parameter search strategy:
'AzureML HyperDrive (Bayesian)', 'Optuna + ASHA', 'manual grid'. */
"hpo_strategy": "Hyperparameter optimization approach",

/* Compute requested for a *single* full training run—
node type / count / GPUs, or 'CPU-only', etc. */
node type / count / GPUs, Spark, or 'CPU-only', etc. */
"hardware": "Compute resources required for training"
}},

@@ -138,7 +135,7 @@ Fill each section thoroughly; bullet form is fine.
"offline_metrics": ["Metrics used to evaluate model quality offline"],

/* Business KPIs tracked in production:
['CTR','GMV','Watch Time','Add-to-Cart Rate'], etc. */
['CTR','GMV','Watch Time','Add-to-Cart Rate', 'ARPU'], etc. */
"online_kpis": ["Business KPIs monitored during deployment"],

/* Roll-out pattern:
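
The remainder of the JSON schema is collapsed in this diff. As a minimal sketch of how a caller might validate an LLM response against the fields visible above (the function and constant names are illustrative assumptions, not part of this repo):

```python
import json

# Validation sketch for the "Expected JSON Response" contract above. It only
# checks the fields visible in this hunk ("architecture", "models",
# "training_stack"); the collapsed tail of the schema is not covered.
REQUIRED_ARCH_KEYS = {"type", "data_needed", "retraining_cadence"}

def validate_action_plan(raw):
    """Return a list of human-readable schema violations (empty means OK)."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ["response is not valid JSON: %s" % exc]

    errors = []
    archs = plan.get("architecture")
    if not isinstance(archs, list) or not archs:
        errors.append("'architecture' must be a non-empty list")
    else:
        for i, arch in enumerate(archs):
            if not isinstance(arch, dict):
                errors.append("architecture[%d] must be an object" % i)
                continue
            missing = REQUIRED_ARCH_KEYS - set(arch)
            if missing:
                errors.append("architecture[%d] is missing keys: %s"
                              % (i, sorted(missing)))

    models = plan.get("models")
    if not isinstance(models, dict) or "algorithms" not in models:
        errors.append("'models' must be an object with an 'algorithms' field")

    if "training_stack" not in plan:
        errors.append("missing 'training_stack' object")
    return errors
```
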
32 changes: 18 additions & 14 deletions examples/07_tutorials/ai-reco-of-recos/prompts/reco_selection.txt
@@ -12,12 +12,11 @@ Output a short, ordered list (`ranked_algos`) plus a 2-sentence rationale for each.
**Table A – Canonical Algorithms**

```
• Explicit-rating CF …… ALS, Surprise-SVD, LightFM, Wide&Deep, xDeepFM
• Explicit-rating CF …… ALS, Surprise-SVD, LightFM, Wide&Deep, xDeepFM, SAR
• Implicit-feedback CF …… BPR, SAR, LightGCN, NCF, RBM, ALS-implicit, xDeepFM, LightFM
• Sequential / session …… SASRec, Caser, GRU, NextItNet, A2SVD, SLi-Rec, SUM, SSEPT
• Content / hybrid …… LightGBM-GBT, TF-IDF, DKN, NAML, NPA, NRMS, LSTUR, VW, Wide&Deep, xDeepFM, LightFM
• Generative / VAE …… BiVAE, Multinomial-VAE, Standard-VAE
• Geo / Riemannian …… GeoIMC, RLRMC
```

---
@@ -26,32 +25,37 @@

1. **Interaction_Type**

* `explicit_ratings`
* `explicit_ratings` (e.g., star ratings)
* `implicit_events` (clicks, views, purchases)
* `sequential_events` (time-ordered)
* `content_driven` (news, ads, cold items)
2. **Cold_Start_Severity** (`high` | `medium` | `low`)
3. **Scale_and_Latency**
3. **Architecture**

* `>1 B events` → prefer Spark/graph models (ALS-Spark, LightGCN)
* `online_<100 ms` → prefer lightweight or two-stage (TF-IDF, VW, SAR similarities)
4. **Business_Metric** (`CVR` | `WatchTime` | `ARPU` | `Engagement`)
5. **Regulatory_or_Explainability_Needed** (`yes` | `no`)
6. **Available_Features** (`ids_only` | `rich_context` | `text_KG` | `geo`)
7. **Compute_Budget** (`cpu_only` | `gpu_ok` | `distributed_spark`)
* `batch` → lowest latency (5 ms-20 ms), cheapest; models are trained once a day (so real-time interactions are not captured) and served by a look-up against a fast store (Redis, SQL Server); see the look-up sketch after this list
* `real_time` → high latency (100 ms-200 ms); the model is deployed and scores requests in real time
* `hybrid` → also called recall-rerank or two-step recommender; mid latency (20 ms-100 ms); batch candidate generation plus real-time reranking
4. **Scale_and_Latency**

* `>1 B events` → prefer Spark or lightweight models (ALS-Spark, SARplus, xLearn)
* `<20ms` → prefer batch with heavy models (SASRec, BiVAE, LightGCN)
* `online_<100 ms` → prefer a hybrid architecture: heavy, high-recall models for candidate generation (SASRec, BiVAE, LightGCN) plus lightweight models for real-time reranking (VW, LightGBM)
5. **Business_Metric** (`CVR` | `CTR` | `WatchTime` | `ARPU` | `Engagement`)
6. **Regulatory_or_Explainability_Needed** (`yes` | `no`)
7. **Available_Features** (`ids_only` | `rich_context` | `text_KG`)
8. **Compute_Budget** (`cpu_only` | `gpu_ok` | `distributed_spark`)
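
To make the `batch` option concrete, here is a minimal serving sketch (the client setup, key layout, and function names are assumptions for illustration, not part of this prompt): recommendations are precomputed by the training job and served with a single key look-up, which is what makes the 5 ms-20 ms band attainable.

```python
import json

import redis  # assumes the redis-py client is installed

# Batch serving sketch: recommendations are precomputed offline (e.g., by a
# nightly training job) and then served with a single O(1) key look-up.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def publish_batch_recos(user_to_items, ttl_seconds=86_400):
    """Write one key per user once the batch training job finishes."""
    with r.pipeline() as pipe:
        for user_id, items in user_to_items.items():
            pipe.set("recos:%s" % user_id, json.dumps(items), ex=ttl_seconds)
        pipe.execute()

def get_recos(user_id, fallback=()):
    """Serving path: one look-up; fall back to, e.g., popular items for cold users."""
    cached = r.get("recos:%s" % user_id)
    return json.loads(cached) if cached else list(fallback)

# Usage: publish_batch_recos({"u1": ["sku-42", "sku-7"]}); get_recos("u1")
```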

---

**Selection Rules (compressed)**

* *Explicit + ids_only* → ALS / SVD.
* *Explicit + ids_only* → ALS, SVD, SAR (similarity).
* *Implicit + ids_only* → BPR (pairwise), SAR (similarity), LightGCN (graph), NCF (deep).
* *Sequential* → Transformer (SASRec/SSEPT) if gpu_ok, else Caser/GRU.
* *Sequential* → Transformer (SASRec/SSEPT) or Caser/GRU if gpu_ok, else LightGBM.
* *Content_driven & high_cold_start* → LightGBM-GBT, TF-IDF, DKN/NRMS/LSTUR family.
* *Need_CVR / ROI* → point-wise GBT (LightGBM) or xDeepFM / Wide&Deep.
* *Need_Interpretability* → GBT, TF-IDF, attention-based (DKN, NRMS), VW with per-feature weights.
* *Geo features* → GeoIMC, RLRMC.
* *Massive_scale* → Spark-ALS, SAR-Plus, LightGCN with mini-batch sampling.
* *Massive_scale* → Spark-ALS, SAR-Plus, LightGBM-Spark, LightGCN with mini-batch sampling.
* *Real-time bandit / low_latency* → pre-compute candidates + VW or LightGBM re-rank.
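
As an illustrative sketch only (the function and its inputs are assumptions, not part of the prompt contract), the compressed rules above can be encoded as a small dispatch function that returns a ranked shortlist:

```python
# Sketch of the compressed selection rules above; inputs mirror the decision
# dimensions, and each branch returns a ranked shortlist, not a full policy.
def rank_algos(interaction_type, features="ids_only", gpu_ok=False):
    if interaction_type == "explicit_ratings" and features == "ids_only":
        return ["ALS", "SVD", "SAR"]
    if interaction_type == "implicit_events" and features == "ids_only":
        return ["BPR", "SAR", "LightGCN", "NCF"]
    if interaction_type == "sequential_events":
        return ["SASRec", "SSEPT", "Caser", "GRU"] if gpu_ok else ["LightGBM"]
    # content_driven (including high cold-start) falls through to content models
    return ["LightGBM-GBT", "TF-IDF", "DKN", "NRMS", "LSTUR"]

# Example: implicit click logs, IDs only, CPU-only budget
# rank_algos("implicit_events", "ids_only", gpu_ok=False)
# -> ["BPR", "SAR", "LightGCN", "NCF"]
```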

---