Issue: Missing Public Artifacts Block Full Nejumi Leaderboard Runs #346

@lhl

Hey @olachinkei - per our chat last week, I'm trying to run the Nejumi leaderboard against our Shisa V2 models, but so far I've run into a number of blockers.

I've cloned the repo into my own fork and followed along w/ the README.

I've also mirrored the artifacts to my own W&B project, eg:

uv run wandb artifact get <llm-leaderboard/nejumi-leaderboard4/<name>:<version>>
uv run python scripts/data_uploader/upload_dataset.py -e augmxnt -p nejumi-leaderboard4 \
  -n <name> -d <local path> -m "Mirrored from llm-leaderboard/nejumi-leaderboard4"
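For mirroring more than one artifact, the two commands above can be generated per artifact. This is only a rough sketch: `mirror_commands` and `example_dataset` are hypothetical names, the `--root` download directory is my own choice, and I'm assuming `-d` accepts the downloaded directory. It only prints the commands, it doesn't run anything:

```python
import shlex

# Public source project named in this issue.
SOURCE_PROJECT = "llm-leaderboard/nejumi-leaderboard4"


def mirror_commands(name, version, entity, project, root="artifacts"):
    """Build the download and re-upload commands for one artifact (dry run)."""
    # Assumption: download into <root>/<name> so the upload step knows the path.
    local_path = f"{root}/{name}"
    get_cmd = [
        "uv", "run", "wandb", "artifact", "get",
        "--root", local_path,
        f"{SOURCE_PROJECT}/{name}:{version}",
    ]
    upload_cmd = [
        "uv", "run", "python", "scripts/data_uploader/upload_dataset.py",
        "-e", entity,
        "-p", project,
        "-n", name,
        "-d", local_path,
        "-m", f"Mirrored from {SOURCE_PROJECT}",
    ]
    return [get_cmd, upload_cmd]


if __name__ == "__main__":
    # "example_dataset" is a placeholder, not a real artifact name.
    for cmd in mirror_commands("example_dataset", "production",
                               "augmxnt", "nejumi-leaderboard4"):
        print(shlex.join(cmd))
```

The printed lines can be reviewed and pasted into a shell once the artifact names and versions are filled in.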

However, it looks like I'm blocked because these artifacts are private-only:

jbbq.artifacts_path = llm-leaderboard/nejumi-leaderboard4-private/jbbq:production
toxicity.artifact_path = llm-leaderboard/nejumi-leaderboard4-private/toxicity_dataset_full:production
toxicity.judge_prompts_path = llm-leaderboard/nejumi-leaderboard4-private/toxicity_judge_prompts:production
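To catch these before starting a run, something like the following could flag config entries that point at a `-private` project. It's a sketch under assumptions: `find_private_artifacts` is a hypothetical helper, the `example.public_path` entry is made up for contrast, and the `-private` suffix check is just a heuristic based on the three paths above (real access depends on the W&B project's visibility settings):

```python
def find_private_artifacts(paths):
    """Return the entries whose W&B project name ends in '-private'.

    `paths` maps a config key to an 'entity/project/artifact:alias' string.
    """
    private = {}
    for key, path in paths.items():
        parts = path.split("/")
        if len(parts) != 3:
            # Not in entity/project/artifact form; skip rather than guess.
            continue
        _entity, project, _artifact = parts
        if project.endswith("-private"):
            private[key] = path
    return private


if __name__ == "__main__":
    config_paths = {
        # The three entries reported in this issue.
        "jbbq.artifacts_path":
            "llm-leaderboard/nejumi-leaderboard4-private/jbbq:production",
        "toxicity.artifact_path":
            "llm-leaderboard/nejumi-leaderboard4-private/toxicity_dataset_full:production",
        "toxicity.judge_prompts_path":
            "llm-leaderboard/nejumi-leaderboard4-private/toxicity_judge_prompts:production",
        # Hypothetical public entry, for contrast only.
        "example.public_path":
            "llm-leaderboard/nejumi-leaderboard4/example_dataset:production",
    }
    for key, path in find_private_artifacts(config_paths).items():
        print(f"{key} -> {path}")
```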

There's already an open issue, #257, on one of the datasets (but more are affected!). I looked at #243 as well, but the issue doesn't seem to be just permissions; the datasets themselves aren't publicly available?

I know replicability has been an issue for quite a while (#43) - let me know if I should contact the support email directly or if there's someone better to coordinate w/. Happy to help validate and get this working for third parties, across documentation, datasets, and script reproducibility (eg updating the data_uploader).

Is it possible to just make the Nejumi artifacts versioned and publicly accessible from an official nejumi4 account? That might be best from a replicability perspective.

Also, what do model submissions look like? I'll be running on our V2 and V2.1 models and might be interested in submitting them. Some models, like our 405B, are resource intensive/hard to run, so I don't mind submitting some if that helps.
