Evaluation Dataset in "Response Generation" - "end-to-end models" part  #130

@nqchieutb01

Does this part use MultiWOZ 2.2 or MultiWOZ 2.0 as the benchmark dataset?
I'm confused: the results for RewardNet, Mars, and KRLS in your table match the ones in their original papers, but those papers all report results on the MultiWOZ 2.0 dataset. Moreover, the authors of the TOATOD paper report their combined score on the MultiWOZ 2.2 dataset.
Is there a mistake? Can you explain this inconsistency?
Thanks!
