Feature/pytorch embedding ranker #2220

demoncoder-crypto

Fixes #2205: [FEATURE] Add embedding ranker in PyTorch
References
TensorFlow Recommenders basic ranking example: https://www.tensorflow.org/recommenders/examples/basic_ranking
PyTorch documentation: https://pytorch.org/docs/stable/index.html
Checklist:
[x] I have followed the contribution guidelines and code style for this project.
[x] I have added tests covering my contributions (tests included in model implementation).
[x] I have updated the documentation accordingly (added docstrings).
[ ] I have signed the commits, e.g. git commit -s -m "your commit message".
[x] This PR is being made to staging branch AND NOT TO main branch.

Collaborator

@miguelgfierro miguelgfierro left a comment

This is super good, congrats on the great work @demoncoder-crypto

I have some comments

logger.info(f"Number of unique items: {self.n_items}")

# Create mapped dataframes
train_mapped_df = train_df.copy()
Collaborator

Do we really need this copy? If the data is large, it could add significant overhead.
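
One way to avoid the full copy is to map only the ID columns and keep the results as arrays, leaving `train_df` untouched. This is a minimal sketch, assuming the mapping is a plain dict from raw ID to contiguous index. The helpers `build_index` and `encode_ids` and the column names are hypothetical, not taken from the PR:

```python
import numpy as np
import pandas as pd


def build_index(values: pd.Series) -> dict:
    """Map each unique raw ID to a contiguous integer index (hypothetical helper)."""
    return {raw_id: idx for idx, raw_id in enumerate(pd.unique(values))}


def encode_ids(df: pd.DataFrame, col: str, index: dict) -> np.ndarray:
    """Return one encoded column as an int64 array; the frame itself is not copied."""
    encoded = df[col].map(index)
    if encoded.isna().any():
        raise ValueError(f"Column {col!r} contains IDs missing from the index")
    return encoded.to_numpy(dtype=np.int64)


# Made-up data for illustration only.
train_df = pd.DataFrame(
    {"userID": [10, 20, 10, 30], "itemID": ["a", "b", "b", "c"], "rating": [5.0, 3.0, 4.0, 1.0]}
)
user_index = build_index(train_df["userID"])
item_index = build_index(train_df["itemID"])

user_ids = encode_ids(train_df, "userID", user_index)  # only this column is materialized
item_ids = encode_ids(train_df, "itemID", item_index)
```

The encoded arrays can be handed to the model's dataset directly, so the rating column and any other columns are never duplicated.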

pandas.DataFrame: Dataframe with user, item, prediction columns.
"""
# Create a copy of the test data with only the needed columns
test_copy = test_df[[col_user, col_item]].copy()
Collaborator

every copy adds a lot of overhead, let's try to avoid it
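
Selecting columns with `test_df[[col_user, col_item]]` typically produces a new DataFrame already, so the trailing `.copy()` adds a second allocation. A hedged sketch of assembling the output directly from column arrays instead follows. Here `make_predictions_frame` and the `scores` input are hypothetical stand-ins for whatever the model returns:

```python
import numpy as np
import pandas as pd


def make_predictions_frame(test_df, scores, col_user="userID", col_item="itemID",
                           col_prediction="prediction"):
    """Assemble the output frame from existing columns plus model scores."""
    if len(scores) != len(test_df):
        raise ValueError("scores must have one entry per row of test_df")
    return pd.DataFrame(
        {
            col_user: test_df[col_user].to_numpy(),
            col_item: test_df[col_item].to_numpy(),
            col_prediction: np.asarray(scores, dtype=np.float64),
        }
    )


# Made-up data for illustration only.
test_df = pd.DataFrame({"userID": [1, 1, 2], "itemID": [7, 8, 7], "rating": [4, 5, 3]})
pred_df = make_predictions_frame(test_df, scores=[0.9, 0.1, 0.5])
```

Note that the result gets a fresh default index, so callers relying on the original index of `test_df` would need to carry it over explicitly.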

Comment on lines 406 to 413
# Generate all possible user-item pairs for prediction
user_item_pairs = []
users = []
items = []
for user in test_users:
    for item in all_items:
        users.append(user)
        items.append(item)
Collaborator

double loop can be made faster with itertools:

from itertools import product
users, items = zip(*product(test_users, all_items))
users = list(users)
items = list(items)
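
If `test_users` and `all_items` are arrays, or can be converted to them, the same Cartesian product can also be built without a Python-level loop. This is a sketch using NumPy rather than the `itertools` version suggested above, with made-up IDs. It yields pairs in the same order as `product(test_users, all_items)`:

```python
from itertools import product

import numpy as np

# Made-up IDs for illustration only.
test_users = np.array([1, 2, 3])
all_items = np.array([10, 20])

# Each user is repeated once per item; the item list is tiled once per user.
users = np.repeat(test_users, len(all_items))
items = np.tile(all_items, len(test_users))

# Cross-check against the itertools ordering.
expected = list(product(test_users.tolist(), all_items.tolist()))
pairs = list(zip(users.tolist(), items.tolist()))
```

Either way the result has `len(test_users) * len(all_items)` entries, so for large catalogs it may be necessary to score users in batches rather than materialize every pair at once.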

Comment on lines 518 to 536


@pytest.mark.spark
def test_spark_r_precision(spark_data):
    df_true, df_pred = spark_data

    # Test perfect prediction (R-Precision should be 1.0)
    evaluator_perfect = SparkRankingEvaluation(df_true, df_true, col_prediction="rating")
    assert evaluator_perfect.r_precision() == pytest.approx(1.0, TOL)

    # Test with sample prediction data
    evaluator = SparkRankingEvaluation(df_true, df_pred)
    # Expected value calculation:
    # User 1: R=3 relevant items (1, 2, 3). Top 3 predictions: (1, 0.8), (5, 0.6), (2, 0.4). Relevant in top 3: (1, 2). R-Prec = 2/3
    # User 2: R=2 relevant items (1, 4). Top 2 predictions: (1, 0.9), (4, 0.7). Relevant in top 2: (1, 4). R-Prec = 2/2 = 1.0
    # User 3: R=1 relevant item (2). Top 1 prediction: (2, 0.7). Relevant in top 1: (2). R-Prec = 1/1 = 1.0
    # Mean R-Precision = (2/3 + 1.0 + 1.0) / 3 = (0.6666... + 1 + 1) / 3 = 2.6666... / 3 = 0.8888...
    expected_r_precision = (2/3 + 1.0 + 1.0) / 3
    assert evaluator.r_precision() == pytest.approx(expected_r_precision, TOL)
Collaborator

this is great, can you add it to the other PR: #2219
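
The expected value in the test comments can be reproduced by hand. This is a small pure-Python sketch, assuming R-Precision for a user is the fraction of their R relevant items that appear in their top R predictions. The `r_precision` helper below is illustrative only and independent of `SparkRankingEvaluation`:

```python
def r_precision(relevant, ranked):
    """Mean R-Precision over users.

    relevant: dict of user -> set of relevant items
    ranked:   dict of user -> list of predicted items, best first
    """
    per_user = []
    for user, rel in relevant.items():
        r = len(rel)
        if r == 0:
            continue  # users with no relevant items are skipped (an assumption)
        top_r = ranked.get(user, [])[:r]
        per_user.append(len(rel.intersection(top_r)) / r)
    return sum(per_user) / len(per_user) if per_user else 0.0


# Data taken from the comments in the test above.
relevant = {1: {1, 2, 3}, 2: {1, 4}, 3: {2}}
ranked = {1: [1, 5, 2], 2: [1, 4], 3: [2]}

mean_rp = r_precision(relevant, ranked)  # (2/3 + 1 + 1) / 3 = 8/9
```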

@@ -356,6 +358,72 @@ def map_at_k(self):
"""
return self._metrics.meanAveragePrecisionAt(self.k)

def r_precision(self):
Collaborator

could you please separate this into the other PR #2219

@@ -0,0 +1 @@

Collaborator

it would be great to have the notebook accompanying this code

@miguelgfierro
Collaborator

@anargyri @SimonYansenZhao @loomlike can you please review?

@demoncoder-crypto
Author

Thank you so much, @miguelgfierro. I have pushed the Jupyter notebook code; somehow it had remained in my local branch or stash. After all the reviews are complete, I will address everything and push all the code together.

@miguelgfierro
Collaborator

Can you push again, @demoncoder-crypto? I only see an empty .ipynb

@demoncoder-crypto
Author

Thanks for the comments, @miguelgfierro; I am working on the changes. I was extremely busy over the past 2 weeks for personal reasons, and I am so sorry for the late reply. I will start working on this and get it resolved in 2 days.

@demoncoder-crypto
Author

@miguelgfierro There are a lot of errors occurring here. Some are linter errors, and for others I am not sure why they are occurring. I have tried 3 times to push the Jupyter notebook with the code, but somehow only an empty notebook is being pushed. If possible, can I open a new pull request where I can make all of the changes and push a clean commit?

@miguelgfierro
Collaborator

Yes, feel free to create a new PR. Please remember to branch off staging and then open the PR against staging. Main is our production branch, and only core developers can open PRs there; staging is for development.

@demoncoder-crypto
Author

Understood, I am really sorry about that. I will fix it. I am currently testing the Jupyter notebook and will make the changes on this one very soon; for now I am working on the other issue, which I will update in 1-2 hours. Thanks for the support.

@demoncoder-crypto
Author

I have implemented the changes in a new branch, closing this one
