Feature/pytorch embedding ranker #2220
Conversation
This is super good, congrats on the great work @demoncoder-crypto.
I have some comments.
logger.info(f"Number of unique items: {self.n_items}")

# Create mapped dataframes
train_mapped_df = train_df.copy()
do we really need this? if the data is large, this could create a large overhead?
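One way to cut that overhead could be to map the raw IDs into new index columns with `Series.map` instead of duplicating the whole dataframe first. A minimal sketch, assuming hypothetical `userID` / `itemID` column names and ID-to-index dictionaries (not the PR's actual code):

```python
import pandas as pd

# Toy training data standing in for train_df (column names are assumptions)
train_df = pd.DataFrame(
    {"userID": [10, 10, 42], "itemID": [100, 200, 100], "rating": [4.0, 3.0, 5.0]}
)

# ID-to-index dictionaries, typically built once when fitting the model
user2idx = {u: i for i, u in enumerate(train_df["userID"].unique())}
item2idx = {m: i for i, m in enumerate(train_df["itemID"].unique())}

# Map IDs into new index columns instead of copying the whole dataframe first
train_df["user_idx"] = train_df["userID"].map(user2idx)
train_df["item_idx"] = train_df["itemID"].map(item2idx)
```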
        pandas.DataFrame: Dataframe with user, item, prediction columns.
    """
    # Create a copy of the test data with only the needed columns
    test_copy = test_df[[col_user, col_item]].copy()
every copy adds a lot of overhead, let's try to avoid it
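One option, sketched below under the assumption that the model's scores end up in an array aligned with the rows of `test_df`, is to assemble the output dataframe directly from the needed columns rather than copying `test_df` and mutating the copy (illustrative column names and data):

```python
import numpy as np
import pandas as pd

# Toy stand-ins for test_df, the column names, and the model scores (all assumptions)
col_user, col_item, col_prediction = "userID", "itemID", "prediction"
test_df = pd.DataFrame({col_user: [1, 1, 2], col_item: [10, 20, 10], "rating": [4, 3, 5]})
scores = np.array([0.9, 0.1, 0.7])

# Build the result directly from the needed columns; no intermediate .copy() of test_df
result = pd.DataFrame(
    {
        col_user: test_df[col_user].to_numpy(),
        col_item: test_df[col_item].to_numpy(),
        col_prediction: scores,
    }
)
```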
# Generate all possible user-item pairs for prediction
user_item_pairs = []
users = []
items = []
for user in test_users:
    for item in all_items:
        users.append(user)
        items.append(item)
double loop can be made faster with itertools:
from itertools import product
users, items = zip(*product(test_users, all_items))
users = list(users)
items = list(items)
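If `test_users` and `all_items` are already array-like, a fully vectorized cross join with NumPy is another possibility; a small sketch with toy data, not tied to the PR's actual variables:

```python
import numpy as np

# Toy ID arrays standing in for the real test_users / all_items
test_users = np.array([1, 2, 3])
all_items = np.array([10, 20])

# Cartesian product without Python-level loops; same ordering as itertools.product
users = np.repeat(test_users, len(all_items))
items = np.tile(all_items, len(test_users))
```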
@pytest.mark.spark
def test_spark_r_precision(spark_data):
    df_true, df_pred = spark_data

    # Test perfect prediction (R-Precision should be 1.0)
    evaluator_perfect = SparkRankingEvaluation(df_true, df_true, col_prediction="rating")
    assert evaluator_perfect.r_precision() == pytest.approx(1.0, TOL)

    # Test with sample prediction data
    evaluator = SparkRankingEvaluation(df_true, df_pred)
    # Expected value calculation:
    # User 1: R=3 relevant items (1, 2, 3). Top 3 predictions: (1, 0.8), (5, 0.6), (2, 0.4). Relevant in top 3: (1, 2). R-Prec = 2/3
    # User 2: R=2 relevant items (1, 4). Top 2 predictions: (1, 0.9), (4, 0.7). Relevant in top 2: (1, 4). R-Prec = 2/2 = 1.0
    # User 3: R=1 relevant item (2). Top 1 prediction: (2, 0.7). Relevant in top 1: (2). R-Prec = 1/1 = 1.0
    # Mean R-Precision = (2/3 + 1.0 + 1.0) / 3 = (0.6666... + 1 + 1) / 3 = 2.6666... / 3 = 0.8888...
    expected_r_precision = (2/3 + 1.0 + 1.0) / 3
    assert evaluator.r_precision() == pytest.approx(expected_r_precision, TOL)
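For reference, the per-user values in the comments follow the usual R-Precision definition (relevant items found in the top-R predictions divided by R, where R is the user's number of relevant items). A tiny pure-Python check of the expected mean, independent of the Spark evaluator under test:

```python
# Relevant items and top-R predicted items per user, taken from the comments above
relevant = {1: {1, 2, 3}, 2: {1, 4}, 3: {2}}
top_r = {1: [1, 5, 2], 2: [1, 4], 3: [2]}

per_user = [len(relevant[u] & set(top_r[u])) / len(relevant[u]) for u in relevant]
print(sum(per_user) / len(per_user))  # 0.888...
```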
this is great, can you add it to the other PR: #2219
@@ -356,6 +358,72 @@ def map_at_k(self):
        """
        return self._metrics.meanAveragePrecisionAt(self.k)

    def r_precision(self):
could you please separate this into the other PR #2219
@@ -0,0 +1 @@
it would be great to have the notebook accompanying this code
@anargyri @SimonYansenZhao @loomlike can you please review?
Thanks so much @miguelgfierro. I have pushed the Jupyter notebook code, but somehow it remained in my local branch or stash. After all the reviews are complete I will correct everything and push all the code together.
Can you push again @demoncoder-crypto? I only see an empty notebook.
Thanks for the comments, I am working on the changes @miguelgfierro. I was extremely busy the past 2 weeks for personal reasons, so I am sorry for the late reply. I will start working on this and get it resolved in 2 days.
@miguelgfierro There are a lot of errors occurring here; some are linter errors and some I am not sure why they are occurring. I have now tried 3 times to push the Jupyter notebook with the code, but somehow only an empty notebook is being pushed. If possible, can I open a new pull request where I can make all of the changes and push a clean commit there?
Yes, feel free to create a new PR. Please remember to check out from staging and then open the PR against staging. Main is our production branch and only core developers can do PRs there; staging is for development.
Understood, I am really sorry for that. I will fix it; I am currently testing the Jupyter notebook and will make changes very soon on this one. For now I am working on the other issue, which I will update in 1-2 hours. Thanks for the support.
I have implemented the changes in a new branch, closing this one.
Fixes #2205: [FEATURE] Add embedding ranker in PyTorch
References
TensorFlow Recommenders basic ranking example: https://www.tensorflow.org/recommenders/examples/basic_ranking
PyTorch documentation: https://pytorch.org/docs/stable/index.html
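For readers of this thread, here is a minimal sketch of the kind of embedding ranker this PR adds, loosely following the TensorFlow Recommenders basic ranking example but written in PyTorch; the class name, layer sizes, and training snippet below are illustrative assumptions, not the PR's actual implementation:

```python
import torch
import torch.nn as nn


class EmbeddingRanker(nn.Module):
    """Toy user/item embedding model that predicts a rating from concatenated embeddings."""

    def __init__(self, n_users, n_items, embedding_dim=32):
        super().__init__()
        self.user_embedding = nn.Embedding(n_users, embedding_dim)
        self.item_embedding = nn.Embedding(n_items, embedding_dim)
        # Small MLP on top of the concatenated embeddings, as in the TF basic ranking example
        self.mlp = nn.Sequential(
            nn.Linear(2 * embedding_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, user_idx, item_idx):
        x = torch.cat([self.user_embedding(user_idx), self.item_embedding(item_idx)], dim=-1)
        return self.mlp(x).squeeze(-1)


# Tiny usage example: one training step on random indices and ratings
model = EmbeddingRanker(n_users=100, n_items=50)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

users = torch.randint(0, 100, (16,))
items = torch.randint(0, 50, (16,))
ratings = torch.rand(16) * 5

optimizer.zero_grad()
loss = loss_fn(model(users, items), ratings)
loss.backward()
optimizer.step()
```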
Checklist:
[x] I have followed the contribution guidelines and code style for this project.
[x] I have added tests covering my contributions (tests included in model implementation).
[x] I have updated the documentation accordingly (added docstrings).
[ ] I have signed the commits, e.g. git commit -s -m "your commit message".
[x] This PR is being made to staging branch AND NOT TO main branch.