I downloaded TableLlama and tested it on the provided dataset. However, the model's performance on out-of-domain datasets is subpar. For example:
On the FEVEROUS dataset, TableLlama consistently outputs `refuted` and `entailed`, even though the instructions clearly state that the output should be `not enough info`, `supports`, or `refutes`.
On the ToTTo dataset, where the input is in HTML format, TableLlama generates outputs in a strange format, such as `2018\u201319 JGP Final Junior <col_header> Level </col_header> 1 126.26 <col_header> FS </col_header> 1 190.63 <col_header> Total </col_header>`, which does not match the format of the ground truth: "At the 2018\u201319 JGP in junior-level, in the Final event, Mishina and Galliamov had a combined total of 190.63 points and a free program of 126.26 points."
Is there any preprocessing required before evaluating the model on these out-of-domain datasets?
For FEVEROUS, we map 'refuted' to 'refutes' and 'entailed' to 'supports' for evaluation. But you are correct that TableLlama currently can't handle the 'not enough info' cases.
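The mapping described above can be sketched as a small normalization step applied to the model's predictions before scoring. This is a minimal sketch of the idea, not the repo's actual evaluation script; the function name and exact string handling are assumptions.

```python
# Hedged sketch: normalize TableLlama's FEVEROUS output labels to the
# gold label set before computing accuracy. The label strings come from
# this thread; the helper itself is hypothetical.
LABEL_MAP = {
    "refuted": "refutes",
    "entailed": "supports",
}

def normalize_prediction(pred: str) -> str:
    """Map TableLlama's output labels to FEVEROUS gold labels.

    Note: the model produces no mapping for the 'not enough info'
    class, so those cases remain unmatched and will score as wrong.
    """
    pred = pred.strip().lower()
    return LABEL_MAP.get(pred, pred)

print(normalize_prediction("refuted"))   # -> refutes
print(normalize_prediction("entailed"))  # -> supports
```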
For ToTTo, TableLlama's prediction should be as follows if you use the prompt correctly:
```json
{
    "instruction": "This is a highlighted cells description task. The goal of this task is to generate the language description given table cells.",
    "input_seg": "<page_title> Aleksandr Galliamov </page_title> <section_title> With Mishina </section_title> <table> <cell> 2018\u201319 JGP Final <col_header> Event </col_header> </cell> <cell> Junior <col_header> Level </col_header> </cell> <cell> 1 126.26 <col_header> FS </col_header> </cell> <cell> 1 190.63 <col_header> Total </col_header> </cell> </table>",
    "question": "Please generate one natural language description to describe the given highlighted table cells.",
    "output": "At the 2018\u201319 JGP in junior-level, in the Final event, Mishina and Galliamov had a combined total of 190.63 points and free program of 126.26 points.",
    "predict": "Aleksandr Galliamov won the Junior Grand Prix Final with a total score of 1 190.63.</s>"
}
```
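At inference time, the fields above would be assembled into a single model prompt. A minimal sketch, assuming a simple newline-joined template; TableLlama's actual prompt construction may differ, so check the repo's inference script:

```python
def build_prompt(example: dict) -> str:
    """Join the instruction, serialized table, and question into one prompt.

    Hypothetical template for illustration only; the real template used
    by TableLlama may add role markers or different separators.
    """
    return "\n\n".join(
        [example["instruction"], example["input_seg"], example["question"]]
    )

sample = {
    "instruction": "This is a highlighted cells description task.",
    "input_seg": "<page_title> Aleksandr Galliamov </page_title> "
                 "<table> <cell> 1 190.63 <col_header> Total </col_header> </cell> </table>",
    "question": "Please generate one natural language description "
                "to describe the given highlighted table cells.",
}
print(build_prompt(sample))
```

The key point from the thread is that the table must be serialized with the `<cell>` / `<col_header>` markup shown in `input_seg`, not fed as raw HTML.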