Problem statement
Merlin models should allow you to train models with different libraries (TF/PT/lightfm/xgboost/implicit, etc.) and then compare them in a consistent, apples-to-apples way. Right now each library has its own evaluation metrics, which aren't directly comparable across frameworks. During the paper experiments, for instance, we had to implement custom metric code for implicit to match the output from TF. Customers also want to be able to compare against their own metrics.
Goals
- Customers are able to compare models across libraries.
- Coverage of metrics: Precision, NDCG, MAP, AUC
- Support for both retrieval and ranking models
- Used only during the final evaluation process, not during training (training-time evaluation is a non-goal)
- Ability to slice on key features from user/item
- Example
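To make the target metric set concrete, here is a minimal sketch of per-user Precision@k, NDCG@k, and AP@k with binary relevance. The function names and signatures are illustrative, not the eventual Merlin API, and NumPy stands in for the cuDF/cupy implementation:

```python
import numpy as np

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k recommended items that are relevant."""
    topk = ranked_ids[:k]
    return len(set(topk) & set(relevant_ids)) / k

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: discounted gain over ideal discounted gain."""
    relevant = set(relevant_ids)
    gains = [1.0 / np.log2(i + 2)
             for i, item in enumerate(ranked_ids[:k]) if item in relevant]
    ideal = [1.0 / np.log2(i + 2) for i in range(min(len(relevant), k))]
    return sum(gains) / sum(ideal) if ideal else 0.0

def average_precision_at_k(ranked_ids, relevant_ids, k):
    """AP@k: mean of precision at each rank where a relevant item appears."""
    relevant = set(relevant_ids)
    hits, score = 0, 0.0
    for i, item in enumerate(ranked_ids[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)
    return score / min(len(relevant), k) if relevant else 0.0
```

Having one reference implementation like this, rather than each framework's own variant, is exactly what the single-implementation constraint below asks for.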
Constraints
- Scoring should happen over all items, not just negative samples, so that ranking metrics are directly comparable
- Only one implementation of these metrics, so that the calculations are consistent. This implies converting each framework's outputs into a common representation.
- We want to be able to compare across multiple frameworks.
- Pre-requisites:
- [RMP] Support Offline Batch processing of Recs Generation Pipelines #419
- Support for external libraries and systems level evaluation
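A small illustration of why the full-catalog scoring constraint matters: a true item's rank against sampled negatives can never look worse than its rank against the whole catalog, and is usually optimistic. All values below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

n_items = 1000
scores = rng.normal(size=n_items)        # model scores for every item in the catalog
true_item = 42
scores[true_item] = scores.max() - 0.1   # true item scores near, but not at, the top

# Rank of the true item against the FULL catalog (1 = best).
full_rank = int((scores > scores[true_item]).sum()) + 1

# Rank against 100 sampled negatives only -- a subset, so systematically optimistic.
neg = rng.choice(np.delete(np.arange(n_items), true_item), size=100, replace=False)
sampled_rank = int((scores[neg] > scores[true_item]).sum()) + 1

assert sampled_rank <= full_rank  # sampled ranks never look worse than full ranks
```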
Starting Point
- Input is an ordered list of recommendations that gets scored/sliced
- Evaluation is computed from a list of predictions
- cuDF implementation
- Integration with visualization tooling (Marc to add options)
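A rough sketch of this starting point, including the goal of slicing on a user feature, using pandas as a stand-in for cuDF (whose DataFrame API largely mirrors pandas). The frame layout, column names, and `hit_rate_at_k` helper are hypothetical:

```python
import pandas as pd

# Hypothetical evaluation frame: one row per (user, recommended item), ordered by rank.
preds = pd.DataFrame({
    "user_id":      [1, 1, 1, 2, 2, 2],
    "item_id":      [10, 11, 12, 10, 13, 14],
    "rank":         [1, 2, 3, 1, 2, 3],
    "relevant":     [0, 1, 0, 1, 0, 0],  # ground-truth interaction
    "user_segment": ["new", "new", "new", "returning", "returning", "returning"],
})

def hit_rate_at_k(df, k):
    # A user counts as a hit if any relevant item appears in their top-k.
    top = df[df["rank"] <= k]
    return top.groupby("user_id")["relevant"].max().mean()

overall = hit_rate_at_k(preds, k=2)           # both users hit within top-2 -> 1.0
by_segment = (preds.groupby("user_segment")   # slice the metric by a user feature
                   .apply(lambda g: hit_rate_at_k(g, k=1)))
```

Because the input is already an ordered list of scored recommendations, any framework that can emit such a frame can be evaluated and sliced the same way.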
Notes
- Gabriel has a POC of evaluation framework that takes a cudf dataframe with predictions and computes using cupy popular top-k metrics (recall, precision, mrr, map, ndcg) - code: https://github.com/rapidsai/recsys/tree/main/benchmark_recsys/code/evaluation
- Marc built a system like this at Spotify which converted to specific framework metrics
Tasks
(Groundwork for cross-framework evaluation)
- Create a mechanism for transferring data across frameworks
- Add Merlin Array classes core#102
- Design document
- PoC for Cross framework model evaluation metrics
- Prepare presentation ( TBC ) and collect feedback from team
( Enter the goal here )
- Establish a standard set of metrics (to be used in conjunction with cross-framework data transfer)
- Create an example for cross-framework evaluation metrics
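The data-transfer groundwork could start from a conversion shim like the one below: it normalizes arrays from different frameworks into one representation by trying the standard interchange protocols (DLPack, the NumPy array interface) before framework-specific fallbacks. The dispatch order and fallbacks are illustrative, not the Merlin Array design from core#102:

```python
import numpy as np

def to_numpy(arr):
    """Normalize an array from any framework to a NumPy ndarray.

    Sketch of what a Merlin Array-style conversion layer might do:
    try zero-copy interchange protocols first, then framework-specific escapes.
    """
    if isinstance(arr, np.ndarray):
        return arr
    if hasattr(arr, "__dlpack__"):          # DLPack: PyTorch, TF, CuPy, JAX, ...
        return np.from_dlpack(arr)
    if hasattr(arr, "__array_interface__") or hasattr(arr, "__array__"):
        return np.asarray(arr)              # NumPy array-interface protocol
    if hasattr(arr, "numpy"):               # e.g. eager tf.Tensor / torch.Tensor
        return arr.numpy()
    raise TypeError(f"don't know how to convert {type(arr)!r}")
```

With every framework's predictions funneled through one such conversion point, the single metric implementation above can score them all identically.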