Feature/pytorch embedding ranker #2220
Conversation
This is super good, congrats on the great work @demoncoder-crypto.
I have some comments.
logger.info(f"Number of unique items: {self.n_items}")

# Create mapped dataframes
train_mapped_df = train_df.copy()
do we really need this? if the data is large, this could create a large overhead?
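One way to cut that overhead could be to map the raw IDs into new index columns with `Series.map` instead of duplicating the whole dataframe first. A minimal sketch, assuming hypothetical `userID` / `itemID` column names and ID-to-index dictionaries (not the PR's actual code):

```python
import pandas as pd

# Toy training data standing in for train_df (column names are assumptions)
train_df = pd.DataFrame(
    {"userID": [10, 10, 42], "itemID": [100, 200, 100], "rating": [4.0, 3.0, 5.0]}
)

# ID-to-index dictionaries, typically built once when fitting the model
user2idx = {u: i for i, u in enumerate(train_df["userID"].unique())}
item2idx = {m: i for i, m in enumerate(train_df["itemID"].unique())}

# Map IDs into new index columns instead of copying the whole dataframe first
train_df["user_idx"] = train_df["userID"].map(user2idx)
train_df["item_idx"] = train_df["itemID"].map(item2idx)
```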
        pandas.DataFrame: Dataframe with user, item, prediction columns.
    """
    # Create a copy of the test data with only the needed columns
    test_copy = test_df[[col_user, col_item]].copy()
every copy adds a lot of overhead, let's try to avoid it
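One option, sketched below under the assumption that the model's scores end up in an array aligned with the rows of `test_df`, is to assemble the output dataframe directly from the needed columns rather than copying `test_df` and mutating the copy (illustrative column names and data):

```python
import numpy as np
import pandas as pd

# Toy stand-ins for test_df, the column names, and the model scores (all assumptions)
col_user, col_item, col_prediction = "userID", "itemID", "prediction"
test_df = pd.DataFrame({col_user: [1, 1, 2], col_item: [10, 20, 10], "rating": [4, 3, 5]})
scores = np.array([0.9, 0.1, 0.7])

# Build the result directly from the needed columns; no intermediate .copy() of test_df
result = pd.DataFrame(
    {
        col_user: test_df[col_user].to_numpy(),
        col_item: test_df[col_item].to_numpy(),
        col_prediction: scores,
    }
)
```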
# Generate all possible user-item pairs for prediction
user_item_pairs = []
users = []
items = []
for user in test_users:
    for item in all_items:
        users.append(user)
        items.append(item)
double loop can be made faster with itertools:
from itertools import product
users, items = zip(*product(test_users, all_items))
users = list(users)
items = list(items)
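If `test_users` and `all_items` are already array-like, a fully vectorized cross join with NumPy is another possibility; a small sketch with toy data, not tied to the PR's actual variables:

```python
import numpy as np

# Toy ID arrays standing in for the real test_users / all_items
test_users = np.array([1, 2, 3])
all_items = np.array([10, 20])

# Cartesian product without Python-level loops; same ordering as itertools.product
users = np.repeat(test_users, len(all_items))
items = np.tile(all_items, len(test_users))
```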
@pytest.mark.spark
def test_spark_r_precision(spark_data):
    df_true, df_pred = spark_data

    # Test perfect prediction (R-Precision should be 1.0)
    evaluator_perfect = SparkRankingEvaluation(df_true, df_true, col_prediction="rating")
    assert evaluator_perfect.r_precision() == pytest.approx(1.0, TOL)

    # Test with sample prediction data
    evaluator = SparkRankingEvaluation(df_true, df_pred)
    # Expected value calculation:
    # User 1: R=3 relevant items (1, 2, 3). Top 3 predictions: (1, 0.8), (5, 0.6), (2, 0.4). Relevant in top 3: (1, 2). R-Prec = 2/3
    # User 2: R=2 relevant items (1, 4). Top 2 predictions: (1, 0.9), (4, 0.7). Relevant in top 2: (1, 4). R-Prec = 2/2 = 1.0
    # User 3: R=1 relevant item (2). Top 1 prediction: (2, 0.7). Relevant in top 1: (2). R-Prec = 1/1 = 1.0
    # Mean R-Precision = (2/3 + 1.0 + 1.0) / 3 = (0.6666... + 1 + 1) / 3 = 2.6666... / 3 = 0.8888...
    expected_r_precision = (2/3 + 1.0 + 1.0) / 3
    assert evaluator.r_precision() == pytest.approx(expected_r_precision, TOL)
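For reference, the per-user values in the comments follow the usual R-Precision definition (relevant items found in the top-R predictions divided by R, where R is the user's number of relevant items). A tiny pure-Python check of the expected mean, independent of the Spark evaluator under test:

```python
# Relevant items and top-R predicted items per user, taken from the comments above
relevant = {1: {1, 2, 3}, 2: {1, 4}, 3: {2}}
top_r = {1: [1, 5, 2], 2: [1, 4], 3: [2]}

per_user = [len(relevant[u] & set(top_r[u])) / len(relevant[u]) for u in relevant]
print(sum(per_user) / len(per_user))  # 0.888...
```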
this is great, can you add it to the other PR: #2219
@@ -356,6 +358,72 @@ def map_at_k(self):
        """
        return self._metrics.meanAveragePrecisionAt(self.k)

    def r_precision(self):
could you please separate this into the other PR #2219
@@ -0,0 +1 @@
it would be great to have the notebook accompanying this code
@anargyri @SimonYansenZhao @loomlike can you please review?
Thanks so much @miguelgfierro. I have pushed the Jupyter notebook code, but somehow it remained in my local branch or stash. After all the reviews are complete I will correct everything and push all the code together.
Can you push again @demoncoder-crypto? I only see an empty notebook.
Thanks for the comments, I am working on the changes @miguelgfierro. I was extremely busy the past 2 weeks for personal reasons, so I am sorry for the late reply. I will start working on this and get it resolved in 2 days.
@miguelgfierro There are a lot of errors occurring here; some are linter errors and some I am not sure why they are occurring. I have now tried 3 times to push the Jupyter notebook with the code, but somehow only an empty notebook is being pushed. If possible, can I open a new pull request where I can make all of the changes and push a clean commit there?
Yes, feel free to create a new PR. Please remember to check out from staging and then open the PR against staging. Main is our production branch and only core developers can do PRs there; staging is for development.
Understood, I am really sorry for that. I will fix it; I am currently testing the Jupyter notebook and will make changes very soon on this one. For now I am working on the other issue, which I will update in 1-2 hours. Thanks for the support.
I have implemented the changes in a new branch, closing this one.
Fixes #2205: [FEATURE] Add embedding ranker in PyTorch
References
TensorFlow Recommenders basic ranking example: https://www.tensorflow.org/recommenders/examples/basic_ranking
PyTorch documentation: https://pytorch.org/docs/stable/index.html
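For readers of this thread, here is a minimal sketch of the kind of embedding ranker this PR adds, loosely following the TensorFlow Recommenders basic ranking example but written in PyTorch; the class name, layer sizes, and training snippet below are illustrative assumptions, not the PR's actual implementation:

```python
import torch
import torch.nn as nn


class EmbeddingRanker(nn.Module):
    """Toy user/item embedding model that predicts a rating from concatenated embeddings."""

    def __init__(self, n_users, n_items, embedding_dim=32):
        super().__init__()
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)
        # Small MLP on top of the concatenated embeddings, as in the TF basic ranking example
        self.mlp = nn.Sequential(
            nn.Linear(2 * embedding_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, user_idx, item_idx):
        x = torch.cat([self.user_embedding(user_idx), self.item_embedding(item_idx)], dim=-1)
        return self.mlp(x).squeeze(-1)


# Tiny usage example: one training step on random indices and ratings
model = EmbeddingRanker(n_users=100, n_items=50)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

users = torch.randint(0, 100, (16,))
items = torch.randint(0, 50, (16,))
ratings = torch.rand(16) * 5

optimizer.zero_grad()
loss = loss_fn(model(users, items), ratings)
loss.backward()
optimizer.step()
```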
Checklist:
[x] I have followed the contribution guidelines and code style for this project.
[x] I have added tests covering my contributions (tests included in model implementation).
[x] I have updated the documentation accordingly (added docstrings).
[ ] I have signed the commits, e.g. git commit -s -m "your commit message".
[x] This PR is being made to staging branch AND NOT TO main branch.