Thank you for the latest contribution, "Manual Evaluation Matters: Reviewing Test Protocols of Distantly Supervised Relation Extraction" -- having a manual test set significantly improves our understanding of DRE models. I have a few questions regarding the paper's experiments:
Q1: Is it possible to provide the pre-trained checkpoints for BERT+sent/bag+AVG models?
Q2: Regarding evaluation, it is mentioned in paper:
Bag-level manual evaluation: We take our human-labeled test data for bag-level evaluation. Since annotated data are at the sentence level, we construct bag-level annotations in the following way: for each bag, if one sentence in the bag has a human-labeled relation, this bag is labeled with this relation; if no sentence in the bag is annotated with any relation, this bag is labeled as N/A.
Can you elaborate on this further? Is this the same as the current eval part of the BagRELoader code? Unfortunately, I cannot find 'anno_relation_list' in the manually created test set; does this require additional pre-processing?
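To make sure I am reading the quoted procedure correctly, here is a minimal sketch of how I would build the bag-level labels from the sentence-level annotations (the field names such as anno_relation and the (head, tail) bag key are my own assumptions about the data format, not the official schema):

```python
from collections import defaultdict

def build_bag_labels(sentences):
    """Sketch of the quoted rule: a bag takes every human-labeled relation
    found among its sentences; if none of its sentences is annotated,
    the bag is labeled NA. Field names are assumptions, not the repo schema."""
    bags = defaultdict(list)
    for sent in sentences:
        key = (sent["h"]["id"], sent["t"]["id"])  # bag = (head entity, tail entity)
        bags[key].append(sent)

    bag_labels = {}
    for key, sents in bags.items():
        relations = {s["anno_relation"] for s in sents
                     if s.get("anno_relation", "NA") != "NA"}
        bag_labels[key] = sorted(relations) if relations else ["NA"]
    return bag_labels
```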
Q3: At evaluation (valid, test) time, should the bag_size parameter be set to 0 (so that all sentences in the bag are considered, as reported in the paper -- although this is not handled in the current BagRE framework) and entpair_as_bag set to True?
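For clarity, this is the sampling behavior I am referring to with bag_size (only an illustration of my reading of the parameter, not the actual BagRELoader implementation):

```python
import random

def select_bag_sentences(bag, bag_size=0):
    """bag_size > 0: sample/pad the bag to a fixed number of sentences,
    which is convenient for batched training; bag_size = 0: keep every
    sentence in the bag, which is what I assume the paper does at test time."""
    if bag_size <= 0:
        return list(bag)                      # use all sentences
    if len(bag) >= bag_size:
        return random.sample(bag, bag_size)   # subsample large bags
    # pad small bags by resampling with replacement
    return list(bag) + random.choices(bag, k=bag_size - len(bag))
```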
Q4: Can you provide the scores on the NYT10m val set for the models reported in Table 4 of the paper? Do you also plan to release P@K metrics and PR curves for the models in Table 4?
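To be explicit about the metrics I am asking for, this is roughly how I would compute them from bag-level predictions (a sketch using scikit-learn; the score/label layout is my assumption):

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

def bag_level_metrics(scores, labels, ks=(100, 200, 300)):
    """scores: (num_bags, num_relations) predicted probabilities and
    labels: (num_bags, num_relations) binary ground truth, both with the
    NA column removed -- this layout is my assumption, not the repo format."""
    flat_scores = scores.ravel()
    flat_labels = labels.ravel()

    # PR curve and its area (the "AUC" usually reported for DSRE).
    precision, recall, _ = precision_recall_curve(flat_labels, flat_scores)
    pr_auc = auc(recall, precision)

    # P@K: precision among the K highest-scoring predictions.
    order = np.argsort(-flat_scores)
    p_at_k = {k: flat_labels[order[:k]].mean() for k in ks}
    return pr_auc, p_at_k
```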
Q5: Is BERT+sent-level training performed with MultiLabelSentenceRE or with the plain SentenceRE framework?
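To spell out what I mean by the two options, here is the conceptual difference as I understand it (a sketch of the training objectives only, not the actual framework classes):

```python
import torch.nn as nn

# Single-label sentence-level RE (SentenceRE-style): each sentence gets exactly
# one relation, so training uses a softmax over relations with cross-entropy.
single_label_loss = nn.CrossEntropyLoss()   # logits: (B, R), target: (B,)

# Multi-label sentence-level RE (MultiLabelSentenceRE-style): a sentence may
# express several relations, so training uses per-relation sigmoids with BCE.
multi_label_loss = nn.BCEWithLogitsLoss()   # logits: (B, R), target: (B, R) in {0, 1}
```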
Thank you in advance!
@gaotianyu1350 Thanks for the great work. I have the same questions. @suamin Did you find answers to your questions? As for NYT10m, I trained BERT with the sentence-level framework and then tested it with the bag-level framework and the multi-label framework separately. The results show that bag-level testing (60.6, 35.32) is better than multi-label testing (58.39, 31.98). However, I still cannot reproduce the results reported in the paper.
@HenryPaik1 Thanks for your input. I have not been able to find answers to the questions, and I still struggle to reproduce the paper's numbers. For BERT+sent+AVG, I get AUC=55.45, macro-F1=21.12 on val and AUC=47.49, macro-F1=11.23 on test with bag-level evaluation.