I'm new to NLP and am currently reading papers such as SimPO that use AlpacaEval 2 for evaluation. I have two questions:
1. Is gpt-4-1106-preview the default judge model in AlpacaEval 2?
   Many recent papers (e.g., SimPO) seem to rely on GPT-4 for evaluation. Is it specifically the gpt-4-1106-preview version, or another variant?
2. If gpt-4-1106-preview is unavailable, what are the alternatives?
   For fairness and reproducibility, which models do researchers typically use instead? (A rough sketch of how I imagine the judge would be swapped is below.)
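For context, here is a minimal sketch of how I understand the judge/annotator is selected, based on my reading of the configs shipped in this repo. The `evaluate` entry point and the config names `weighted_alpaca_eval_gpt4_turbo` and `alpaca_eval_gpt4` are my assumptions, so please correct me if the actual API or defaults differ:

```python
# Sketch only, not verified against the current alpaca_eval release.
from alpaca_eval import evaluate

# My understanding: the AlpacaEval 2.0 default annotator config is
# "weighted_alpaca_eval_gpt4_turbo", which I believe is backed by gpt-4-1106-preview.
evaluate(
    model_outputs="example/outputs.json",
    annotators_config="weighted_alpaca_eval_gpt4_turbo",
)

# Hypothetical fallback if gpt-4-1106-preview access is unavailable:
# point annotators_config at a different judge config from evaluators_configs/.
evaluate(
    model_outputs="example/outputs.json",
    annotators_config="alpaca_eval_gpt4",  # assumption: another shipped config name
)
```

If swapping the annotator config like this is the intended way to change the judge, my main concern is whether results from a different judge remain comparable to the numbers reported on the official leaderboard.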
Would appreciate any insights or references to papers addressing this! Thanks!