Feature/pytorch embedding ranker v4 #2228

demoncoder-crypto · 2025-05-02T15:15:19Z

Description

Related Issues

References

Checklist:

I have followed the contribution guidelines and code style for this project.
I have added tests covering my contributions.
I have updated the documentation accordingly.
I have signed the commits, e.g. git commit -s -m "your commit message".
This PR is being made to staging branch AND NOT TO main branch.

review-notebook-app · 2025-05-02T15:15:24Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

miguelgfierro · 2025-05-03T08:08:22Z

examples/00_quick_start/nn_embedding_ranker_movielens.ipynb

@@ -0,0 +1,10 @@
+{


Not sure if there was an error in the submission of the notebook, but I can't see it.

miguelgfierro · 2025-05-03T08:08:42Z

recommenders/evaluation/spark_evaluation.py

 # Licensed under the MIT License.

 import numpy as np
+import warnings # Added for R-Precision warning


This file is still from the other PR; it needs to be removed.

miguelgfierro · 2025-05-03T08:16:34Z

recommenders/models/embedding_ranker/embedding_ranker_utils.py

+# Copyright (c) Recommenders contributors.
+# Licensed under the MIT License.
+
+import os
+import numpy as np
+import pandas as pd
+import torch
+
+from recommenders.utils.constants import (
+    DEFAULT_USER_COL,
+    DEFAULT_ITEM_COL,
+    DEFAULT_RATING_COL,
+    DEFAULT_PREDICTION_COL,
+    DEFAULT_K,
+)
+
+def predict_rating(
+    model,
+    test_df,
+    col_user=DEFAULT_USER_COL,
+    col_item=DEFAULT_ITEM_COL,
+    col_rating=DEFAULT_RATING_COL,
+    col_prediction=DEFAULT_PREDICTION_COL,
+    batch_size=1024,
+):
+    """Predict ratings for user-item pairs in test data.
+
+    Args:
+        model (NNEmbeddingRanker): Trained embedding ranker model.
+        test_df (pandas.DataFrame): Test dataframe containing user-item pairs.
+        col_user (str): User column name.
+        col_item (str): Item column name.
+        col_rating (str): Rating column name.


I think all this can become the notebook.

Some thoughts for the notebook:

Here is a good example of a useful notebook: https://github.com/recommenders-team/recommenders/blob/main/examples/02_model_collaborative_filtering/lightgcn_deep_dive.ipynb. It explains both the math behind the model and an implementation.

Think of the notebook as a way to showcase how to use the ranker and what the ranker is about.

The objective of the notebook is that it needs to be useful. That's the most important metric.

Ideally, a person could go to this notebook, add their data, run it, and understand how to use it.

For the notebook, you can just showcase the content of these functions directly inside the notebook.

Something very important is that we follow the principle that explicit is better than implicit. For example, we don't want something like for metric_name, metric_func in metrics.items(), because it adds a layer of complexity. Instead, we show the metrics explicitly: rmse(true, pred), precision_at_k(true, pred, params), etc. Anyone can then see at a glance what is going on.
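As a rough illustration of the explicit style described above, here is a minimal sketch. The two metric functions are simplified, hypothetical stand-ins written for this example; they are not the signatures of the metrics in recommenders.evaluation, which operate on dataframes.

```python
import math


def rmse(true, pred):
    """Root mean squared error over two aligned lists of ratings (hypothetical helper)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def precision_at_k(relevant, ranked, k):
    """Fraction of the top-k ranked items that are relevant (hypothetical helper)."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


# Toy data, made up for the example.
true_ratings = [4.0, 3.0, 5.0]
pred_ratings = [3.5, 3.0, 4.0]
relevant_items = {"a", "c"}
ranked_items = ["a", "b", "c", "d"]

# Each metric is called by name, so a reader sees exactly what is computed
# and with which arguments, without tracing a dict of functions.
eval_rmse = rmse(true_ratings, pred_ratings)
eval_precision = precision_at_k(relevant_items, ranked_items, k=2)

print(f"RMSE: {eval_rmse:.4f}")
print(f"Precision@2: {eval_precision:.4f}")
```

The trade-off is a few more lines per metric in exchange for a notebook cell that can be read top to bottom.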

Feel free to come to our Monday meeting if you want to understand better how we do the notebooks @demoncoder-crypto

loomlike · 2025-05-13T03:14:54Z

recommenders/evaluation/spark_evaluation.py

        self.col_rating = col_rating
        self.col_prediction = col_prediction
        self.threshold = threshold
+        self.rating_pred_raw = rating_pred # Store raw predictions before processing


rating_pred is already stored in self.rating_pred, and self.rating_pred_raw is not used in this class. Any reason you want to store it?

loomlike · 2025-05-13T03:16:21Z

recommenders/evaluation/spark_evaluation.py

            introducing serendipity into music recommendation, WSDM 2012

-            Eugene Yan, Serendipity: Accuracy’s unpopular best friend in Recommender Systems,
+            Eugene Yan, Serendipity's unpopular best friend in Recommender Systems,


The original reference sentence is correct. https://eugeneyan.com/writing/serendipity-and-accuracy-in-recommender-systems/

loomlike · 2025-05-13T03:22:17Z

recommenders/models/embedding_ranker/embedding_ranker_utils.py

+    all_pairs = []
+    for user in valid_users:
+        for item in all_items:
+            all_pairs.append((user, item))


all_pairs = [(u, i) for u in valid_users for i in all_items]

or

from itertools import product
all_pairs = list(product(valid_users, all_items))
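A quick check that the three forms build the same list, using small made-up inputs (valid_users and all_items here are toy values, not the PR's data):

```python
from itertools import product

valid_users = [1, 2]
all_items = ["a", "b", "c"]

# Original nested-loop form from the diff.
loop_pairs = []
for user in valid_users:
    for item in all_items:
        loop_pairs.append((user, item))

# The two suggested replacements.
comprehension_pairs = [(u, i) for u in valid_users for i in all_items]
product_pairs = list(product(valid_users, all_items))

# All three yield the same pairs in the same order.
assert loop_pairs == comprehension_pairs == product_pairs
```

In every form the list holds len(valid_users) * len(all_items) tuples, so memory grows with the full user-item cross product.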

loomlike · 2025-05-13T03:24:27Z

recommenders/models/embedding_ranker/embedding_ranker_utils.py

+        # Filter out seen pairs
+        result_df = result_df[~result_df.apply(lambda row: (row[col_user], row[col_item]) in seen_pairs, axis=1)]
+
+    # Get top-k recommendations for each user


Can you reuse predict_rating here, if generating_recommendation uses the same logic under the hood, and then sort and cut to top_k at the end?
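A minimal sketch of that shape, under stated assumptions: predict_scores is a hypothetical stand-in for predict_rating that returns plain tuples with a deterministic toy score, and recommend_top_k is an illustrative name, not a function from the PR. Seen pairs are dropped with a set lookup before scoring rather than with a row-wise apply afterwards.

```python
from collections import defaultdict
from heapq import nlargest


def predict_scores(pairs):
    """Hypothetical stand-in for predict_rating: returns (user, item, score) rows.

    The score is a deterministic toy formula, not a model output.
    """
    return [(u, i, float((u * 7 + i * 3) % 5)) for u, i in pairs]


def recommend_top_k(users, items, seen_pairs, k):
    """Score unseen user-item pairs once, then keep the k best items per user."""
    # Skip pairs the user has already interacted with (O(1) set membership).
    candidates = [(u, i) for u in users for i in items if (u, i) not in seen_pairs]

    # Reuse the single scoring routine instead of duplicating its logic.
    by_user = defaultdict(list)
    for user, item, score in predict_scores(candidates):
        by_user[user].append((item, score))

    # Sort and cut: nlargest returns the k highest-scoring (item, score) pairs.
    return {
        user: nlargest(k, scored, key=lambda pair: pair[1])
        for user, scored in by_user.items()
    }


recs = recommend_top_k(
    users=[1, 2], items=[10, 11, 12, 13], seen_pairs={(1, 10)}, k=2
)
```

With this layout the recommendation function only adds candidate generation and the top-k cut on top of the shared prediction step.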

loomlike · 2025-05-13T03:26:07Z

recommenders/models/embedding_ranker/embedding_ranker_utils.py

+    # Calculate metrics
+    results = {}
+    for metric_name, metric_func in metrics.items():
+        # Different metrics may have different required parameters


if the only difference is k, you may:

results[metric_name] = metric_func(
    test_df,
    predictions_df,
    col_user=col_user,
    col_item=col_item,
    col_rating=col_rating,
    col_prediction=col_prediction,
    k=k if 'k' in metric_func.__code__.co_varnames else None,
)
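One caveat with the snippet above: k is still passed as a keyword (with value None) to every metric, so a metric that declares no k parameter and no **kwargs would raise a TypeError. Also, __code__.co_varnames lists local variables as well as parameters. A possible alternative, sketched here with hypothetical toy metrics and a hypothetical call_metric helper, is to inspect the signature and forward k only when it is declared:

```python
import inspect


def call_metric(metric_func, *args, k=None, **kwargs):
    """Call metric_func, forwarding k only if its signature declares a k parameter."""
    if "k" in inspect.signature(metric_func).parameters:
        kwargs["k"] = k
    return metric_func(*args, **kwargs)


def mean_abs_error(true, pred):
    """Toy rating metric that takes no k (hypothetical)."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


def hits_at_k(relevant, ranked, k):
    """Toy ranking metric: number of relevant items in the top k (hypothetical)."""
    return sum(1 for item in ranked[:k] if item in relevant)


results = {}
results["mae"] = call_metric(mean_abs_error, [1, 2], [1, 4], k=10)
results["hits"] = call_metric(hits_at_k, ["a", "b"], ["b", "c", "a"], k=2)
```

Here mean_abs_error is called without k, while hits_at_k receives k=2.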

miguelgfierro · 2025-05-15T10:45:50Z

@setuc FYI

miguelgfierro · 2025-05-20T08:03:29Z

@demoncoder-crypto how is this work going?
@setuc can support

miguelgfierro · 2025-07-21T06:27:44Z

@jmarrietar do you think you would be able to take over this work? It is very similar to embdotbias

jmarrietar · 2025-07-28T12:13:22Z

Hi, @miguelgfierro. I'll be on a tight schedule for the following months. But I can take a look when I free up a little bit 😄.

demoncoder-crypto added 3 commits April 6, 2025 22:48

feat: Implement R-Precision metric for Spark

a4a00fa

feat: Add embedding ranker model using PyTorch

f0a959a

Apply collaborator feedback and remove r_precision code

9969d75

demoncoder-crypto requested review from SimonYansenZhao, anargyri, gramhagen, loomlike, miguelgfierro and wav8k as code owners May 2, 2025 15:15

miguelgfierro reviewed May 3, 2025

loomlike reviewed May 13, 2025

miguelgfierro closed this Jul 19, 2025

miguelgfierro reopened this Jul 21, 2025

4 participants