Evaluation Metrics seem to be over-counting/ inflating the counts of true positives?

Hi Kevin,

Thanks for your contribution to the ABSA task. I just wanted to bring your attention to the following code block in your utils.py file within the InstructABSA folder. Seems that because each matched prediction isn't removed from the pred_val list, in the case where gt_val contains repeated instances ['food', 'food'], but pred_val contains only one instance ['food'], the model is considered to predict all instances correctly, despite missing out the second 'food'? 

> 
      def get_metrics(self, y_true, y_pred, is_triplet_extraction=False):
        total_pred = 0
        total_gt = 0
        tp = 0
        if not is_triplet_extraction:
            for gt, pred in zip(y_true, y_pred):
                gt_list = gt.split(', ')
                pred_list = pred.split(', ')
                total_pred+=len(pred_list)
                total_gt+=len(gt_list)
                for gt_val in gt_list:
                    for pred_val in pred_list:
                        if pred_val in gt_val or gt_val in pred_val:
                            tp+=1
                            break

        else:
            for gt, pred in zip(y_true, y_pred):
                gt_list = gt.split(', ')
                pred_list = pred.split(', ')
                total_pred+=len(pred_list)
                total_gt+=len(gt_list)
                for gt_val in gt_list:
                    gt_asp = gt_val.split(':')[0]

                    try:
                        gt_op = gt_val.split(':')[1]
                    except:
                        continue

                    try:
                        gt_sent = gt_val.split(':')[2]
                    except:
                        continue

                    for pred_val in pred_list:
                        pr_asp = pred_val.split(':')[0]

                        try:
                            pr_op = pred_val.split(':')[1]
                        except:
                            continue



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Evaluation Metrics seem to be over-counting/ inflating the counts of true positives? #28

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Evaluation Metrics seem to be over-counting/ inflating the counts of true positives? #28

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions