Restructure opengpt-x tasks #86

Merged
merged 41 commits into OpenGPTX:master on Jun 27, 2023
Conversation

katrinklug

Restructuring the OpenGPT-X tasks so they live in their own folder. This gives us a better overview of the tasks and makes it easier to apply upstream changes.

@KlaudiaTH
Collaborator

Reminder: Don't forget to move the tasks on branches that are not merged yet

ghstgs and others added 23 commits June 9, 2023 12:12
…patibility) ; --limit can now be a fraction of the dataset size
@KlaudiaTH
Collaborator

KlaudiaTH commented Jun 20, 2023

@katrinklug
I can't merge the PR since there are build errors caused by missing dependencies (e.g. importlib-resources). These need to be fixed first. Also, one of the build checks (flake8) failed because of formatting; it probably checks PEP 8 rules, so please format the code accordingly (running the formatting tool black should be enough).

@KlaudiaTH
Collaborator

@katrinklug #87
(I have implemented a patch for this.)

Collaborator

@katrinklug There seem to be inconsistencies with some of the arguments. For example, you included no_tokenizer_check for the "hf" and "gpt2" models but missed adding it to the other model types such as "hf-causal", "hf-causal-experimental", or "gpt3". I have added a temporary fix in ...thellmann1/workdir/lm_eval_setup/opengptx/lm-evaluation-harness/lm_eval/evaluator.py (from line 67), but arguments of this type definitely belong in the model args and need to be handled differently:

    # this is just a temporary fix: build the extra arguments here instead of
    # passing them through the model_args string
    if isinstance(model, str):
        if model_args is None:
            model_args = ""
        additional_args = {
            "batch_size": batch_size,
            "device": device,
        }
        # no_tokenizer_check is currently only wired up for these two model types
        if model in ("hf", "gpt2"):
            additional_args["no_tokenizer_check"] = no_tokenizer_check
        lm = lm_eval.models.get_model(model).create_from_arg_string(
            model_args,
            additional_args,
        )
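
For illustration, a minimal sketch of how this could look if the flag travelled with the model args instead. The names and values below are assumptions for the example, not part of this PR; it relies on the harness's usual comma-separated key=value parsing in create_from_arg_string, and the model class would still have to accept no_tokenizer_check in its constructor:

    # hypothetical usage: no_tokenizer_check passed via the model_args string,
    # so every model type receives it the same way instead of being special-cased
    import lm_eval.models

    model = "hf-causal"  # example model type, no longer special-cased
    model_args = "pretrained=gpt2,no_tokenizer_check=True"

    lm = lm_eval.models.get_model(model).create_from_arg_string(
        model_args,
        {"batch_size": 8, "device": "cuda:0"},  # runtime-only arguments
    )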

Collaborator

@katrinklug #87
I have implemented a patch for this. Can you please apply the patch?
You'll find it in lm_eval_setup/0001-Pass-trust_remote_code-from-model_args-to-AutoTokeni.patch on the lm_eval_setup branch.

e.g.
git am ~/path/to/evaluation/repo/lm_eval_setup/0001-Pass-trust_remote_code-from-model_args-to-AutoTokeni.patch
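For reference, a rough sketch of what the patch title suggests it does (this is not the actual patch contents; the model identifier below is a placeholder, and the example only uses the standard transformers AutoTokenizer API):

    # hypothetical illustration: forward trust_remote_code from the parsed
    # model_args to the tokenizer constructor
    from transformers import AutoTokenizer

    trust_remote_code = True  # would come from model_args in the harness
    tokenizer = AutoTokenizer.from_pretrained(
        "org/model-id",  # placeholder model identifier
        trust_remote_code=trust_remote_code,
    )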

KlaudiaTH merged commit 5a330e9 into OpenGPTX:master on Jun 27, 2023
3 participants