Feature/pytorch embedding ranker v4 #2228
base: staging
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
@@ -0,0 +1,10 @@
{
Not sure if there was an error in the submission of the notebook, but I can't see it.
# Licensed under the MIT License.

import numpy as np
import warnings  # Added for R-Precision warning
This file is still from the other PR; it needs to be removed.
# Copyright (c) Recommenders contributors.
# Licensed under the MIT License.

import os
import numpy as np
import pandas as pd
import torch

from recommenders.utils.constants import (
    DEFAULT_USER_COL,
    DEFAULT_ITEM_COL,
    DEFAULT_RATING_COL,
    DEFAULT_PREDICTION_COL,
    DEFAULT_K,
)

def predict_rating(
    model,
    test_df,
    col_user=DEFAULT_USER_COL,
    col_item=DEFAULT_ITEM_COL,
    col_rating=DEFAULT_RATING_COL,
    col_prediction=DEFAULT_PREDICTION_COL,
    batch_size=1024,
):
    """Predict ratings for user-item pairs in test data.
    Args:
        model (NNEmbeddingRanker): Trained embedding ranker model.
        test_df (pandas.DataFrame): Test dataframe containing user-item pairs.
        col_user (str): User column name.
        col_item (str): Item column name.
        col_rating (str): Rating column name.
I think all this can become the notebook.
Some thoughts for the notebook:
- Here is a good example of a useful notebook: https://github.com/recommenders-team/recommenders/blob/main/examples/02_model_collaborative_filtering/lightgcn_deep_dive.ipynb it explains both the math behind it and an implementation
- Think of the notebook as a way to showcase how to use the ranker and what is the ranker about.
- The objective of the notebook is that it needs to be useful. That's the most important metric.
- Ideally, a person could go to this notebook, add their data, run it, and understand how to use it.
- For the notebook, you can just showcase the content of these functions directly inside the notebook.
- Something very important is that we follow the principle of explicit is better than implicit. For example, we don't do something like `for metric_name, metric_func in metrics.items()`, because it adds a layer of complexity. Instead we show the metrics explicitly: `rmse(true, pred)`, `precision_at_k(true, pred, params)`, etc. Each person with a quick view can see what is going on.
- Feel free to come to our Monday meeting if you want to understand better how we do the notebooks @demoncoder-crypto
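A minimal sketch of the explicit style described in this comment, for illustration only. The `rmse` and `mae` functions and the toy `true`/`pred` values are hypothetical stand-ins defined here so the snippet runs on its own; a notebook would presumably call the library's own metric functions with dataframes instead.

```python
import numpy as np


# Hypothetical stand-in metrics so the sketch is self-contained.
def rmse(true, pred):
    true, pred = np.asarray(true, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((true - pred) ** 2)))


def mae(true, pred):
    true, pred = np.asarray(true, dtype=float), np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(true - pred)))


# Toy data, made up for the example.
true = [4.0, 3.0, 5.0]
pred = [3.5, 3.0, 4.0]

# Explicit: every metric is a visible call with visible arguments,
# rather than a loop over a dictionary of metric functions.
eval_rmse = rmse(true, pred)
eval_mae = mae(true, pred)

print("RMSE:", eval_rmse)
print("MAE:", eval_mae)
```

The trade-off is a few more lines per metric in exchange for a notebook that can be read top to bottom without tracing what a loop dispatches to.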
        self.col_rating = col_rating
        self.col_prediction = col_prediction
        self.threshold = threshold
        self.rating_pred_raw = rating_pred  # Store raw predictions before processing
`rating_pred` is already stored in `self.rating_pred`, and `self.rating_pred_raw` is not used in this class. Any reason you want to store it?
introducing serendipity into music recommendation, WSDM 2012
Eugene Yan, Serendipity: Accuracy’s unpopular best friend in Recommender Systems,
Eugene Yan, Serendipity's unpopular best friend in Recommender Systems,
The original reference sentence is correct. https://eugeneyan.com/writing/serendipity-and-accuracy-in-recommender-systems/
all_pairs = []
for user in valid_users:
    for item in all_items:
        all_pairs.append((user, item))
all_pairs = [(u, i) for u in valid_users for i in all_items]
or
from itertools import product
all_pairs = list(product(valid_users, all_items))
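As a quick check that the two suggestions above are drop-in replacements for the nested loop, here is a small self-contained sketch. The `valid_users` and `all_items` values are toy data made up for the example.

```python
from itertools import product

# Toy inputs, made up for the example.
valid_users = [1, 2]
all_items = ["a", "b", "c"]

# Original nested-loop version from the diff.
loop_pairs = []
for user in valid_users:
    for item in all_items:
        loop_pairs.append((user, item))

# The two suggested alternatives.
comprehension_pairs = [(u, i) for u in valid_users for i in all_items]
product_pairs = list(product(valid_users, all_items))

# All three produce the same pairs in the same order.
assert loop_pairs == comprehension_pairs == product_pairs
```

All three materialize `len(valid_users) * len(all_items)` tuples in memory, so the choice between them is about readability rather than memory use.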
# Filter out seen pairs
result_df = result_df[~result_df.apply(lambda row: (row[col_user], row[col_item]) in seen_pairs, axis=1)]

# Get top-k recommendations for each user
Can you reuse `predict_rating` if generating_recommendation uses the same logic under the hood, and sort and cut `top_k` at the end?
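One way the reuse suggested here could look, as a rough sketch rather than the PR's actual code. The `predict_rating` below is a simplified stand-in with a reduced signature, and `recommend_top_k`, `toy_model`, and the column names are hypothetical names introduced only for this example.

```python
import pandas as pd


def predict_rating(model, pairs_df, col_user="userID", col_item="itemID",
                   col_prediction="prediction"):
    """Simplified stand-in: score every user-item pair with a callable model."""
    scored = pairs_df.copy()
    scored[col_prediction] = [
        model(u, i) for u, i in zip(scored[col_user], scored[col_item])
    ]
    return scored


def recommend_top_k(model, pairs_df, top_k=2, col_user="userID",
                    col_prediction="prediction"):
    """Hypothetical helper: reuse predict_rating, then sort and cut per user."""
    scored = predict_rating(model, pairs_df, col_prediction=col_prediction)
    ranked = scored.sort_values(
        [col_user, col_prediction], ascending=[True, False]
    )
    # head(top_k) keeps the first top_k rows of each user in sorted order.
    return ranked.groupby(col_user).head(top_k).reset_index(drop=True)


# Toy model and candidate pairs, made up for the example.
def toy_model(user, item):
    return float(user * 10 + item)


pairs = pd.DataFrame({"userID": [1, 1, 1, 2, 2, 2],
                      "itemID": [1, 2, 3, 1, 2, 3]})
top = recommend_top_k(toy_model, pairs, top_k=2)
print(top)
```

With this shape the scoring logic lives in one place, and the recommendation function only adds ranking and truncation.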
# Calculate metrics
results = {}
for metric_name, metric_func in metrics.items():
    # Different metrics may have different required parameters
If the only difference is `k`, you may:
results[metric_name] = metric_func(
    test_df,
    predictions_df,
    col_user=col_user,
    col_item=col_item,
    col_rating=col_rating,
    col_prediction=col_prediction,
    k=k if 'k' in metric_func.__code__.co_varnames else None,
)
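Two caveats about the snippet above may be worth keeping in mind: `__code__.co_varnames` lists local variables as well as parameters, and passing `k=None` to a function that has no `k` parameter raises a `TypeError`. A possible variant, sketched under those assumptions, checks the signature with `inspect.signature` and forwards `k` only when it is accepted. The `call_metric` helper and the toy metrics are hypothetical names for this example, not part of the PR or the library.

```python
import inspect


def call_metric(metric_func, *args, k=None, **kwargs):
    """Call metric_func, forwarding k only if its signature declares a k parameter."""
    if "k" in inspect.signature(metric_func).parameters:
        kwargs["k"] = k
    return metric_func(*args, **kwargs)


# Hypothetical toy metrics: one rating-style (no k), one ranking-style (with k).
def toy_rating_metric(true, pred):
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


def toy_ranking_metric(true, pred, k=10):
    return len(set(true[:k]) & set(pred[:k])) / k


rating_result = call_metric(toy_rating_metric, [1, 2], [1, 4], k=5)
ranking_result = call_metric(toy_ranking_metric, [1, 2, 3], [2, 3, 4], k=2)
print(rating_result, ranking_result)
```

This only addresses the dispatch mechanics; for a notebook, the earlier comment in this thread about calling each metric explicitly would still apply.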
@setuc FYI
@demoncoder-crypto how is this work going?
@jmarrietar do you think you would be able to take over this work? It is very similar to embdotbias
Hi, @miguelgfierro. I'll be on a tight schedule for the following months. But I can take a look when I free up a little bit 😄.
Checklist:
- Commits are signed, e.g. `git commit -s -m "your commit message"`.
- This PR is being made to the `staging` branch AND NOT TO the `main` branch.