
Commit a7da456

Rename the conflicting environment variable LOGLEVEL to LM_EVAL_LOG_LEVEL (#3418)
* fix log level env variable name
* change to `LMEVAL_LOG_LEVEL`
---------
Co-authored-by: Baber <[email protected]>
1 parent c1d2747 commit a7da456

5 files changed: +19 -202 lines changed
docs/CONTRIBUTING.md

Lines changed: 4 additions & 0 deletions
@@ -33,6 +33,10 @@ We use [pytest](https://docs.pytest.org/en/latest/) for running unit tests. All
 python -m pytest --showlocals -s -vv -n=auto --ignore=tests/models/test_openvino.py
 ```
 
+## Verbose logging
+
+You can enable verbose logging with the environment variable `LMEVAL_LOG_LEVEL="debug"`.
+
 ## Contributor License Agreement
 
 We ask that new contributors agree to a Contributor License Agreement affirming that EleutherAI has the rights to use your contribution to our library.
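
For scripted runs, the `LMEVAL_LOG_LEVEL` variable documented in the Verbose logging section above can be set for a single child process rather than exported shell-wide. The following is a minimal sketch, not part of this commit: `env_with_debug_logging` is a hypothetical helper name, and the child command is a stand-in that only echoes the variable so the example runs without lm_eval installed.

```python
import os
import subprocess
import sys


def env_with_debug_logging(base=None):
    """Return a copy of the environment with LMEVAL_LOG_LEVEL set to DEBUG."""
    # Copy so the parent process environment is left untouched.
    env = dict(os.environ if base is None else base)
    env["LMEVAL_LOG_LEVEL"] = "DEBUG"
    return env


# Stand-in child process (an assumption for illustration): it prints the
# variable it received instead of running the test suite or lm_eval.
child = [sys.executable, "-c", "import os; print(os.environ['LMEVAL_LOG_LEVEL'])"]
result = subprocess.run(
    child,
    env=env_with_debug_logging(),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

In practice the stand-in command would be replaced by the command you actually want to run with verbose logging, such as the pytest invocation shown earlier in the guide.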

docs/new_task_guide.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ and rename the folders and YAML file(s) as desired.
 All data downloading and management is handled through the HuggingFace (**HF**) [`datasets`](https://github.com/huggingface/datasets) API. So, the first thing you should do is check to see if your task's dataset is already provided in their catalog [here](https://huggingface.co/datasets). If it's not in there, please consider adding it to their Hub to make it accessible to a wider user base by following their [new dataset guide](https://github.com/huggingface/datasets/blob/main/ADD_NEW_DATASET.md)
 .
 > [!TIP]
-> To test your task, we recommend using verbose logging using `export LOGLEVEL = DEBUG` in your shell before running the evaluation script. This will help you debug any issues that may arise.
+> To test your task, we recommend using verbose logging using `export LMEVAL_LOG_LEVEL="DEBUG"` in your shell before running the evaluation script. This will help you debug any issues that may arise.
 Once you have a HuggingFace dataset prepared for your task, we want to assign our new YAML to use this dataset:
 
 ```yaml

examples/lm-eval-overview.ipynb

Lines changed: 12 additions & 199 deletions
@@ -314,61 +314,12 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": null,
 "metadata": {
 "id": "LOUHK7PtQfq4"
 },
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"2023-11-29:11:54:55,156 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
-"2023-11-29 11:54:55.942051: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
-"2023-11-29 11:54:55.942108: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
-"2023-11-29 11:54:55.942142: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
-"2023-11-29 11:54:57.066802: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
-"2023-11-29:11:55:00,954 INFO [__main__.py:132] Verbosity set to INFO\n",
-"2023-11-29:11:55:11,038 WARNING [__main__.py:138] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.\n",
-"2023-11-29:11:55:11,038 INFO [__main__.py:143] Including path: ./\n",
-"2023-11-29:11:55:11,046 INFO [__main__.py:205] Selected Tasks: ['demo_boolq']\n",
-"2023-11-29:11:55:11,047 WARNING [evaluator.py:93] generation_kwargs specified through cli, these settings will be used over set parameters in yaml tasks.\n",
-"2023-11-29:11:55:11,110 INFO [huggingface.py:120] Using device 'cuda'\n",
-"config.json: 100% 571/571 [00:00<00:00, 2.87MB/s]\n",
-"model.safetensors: 100% 5.68G/5.68G [00:32<00:00, 173MB/s]\n",
-"tokenizer_config.json: 100% 396/396 [00:00<00:00, 2.06MB/s]\n",
-"tokenizer.json: 100% 2.11M/2.11M [00:00<00:00, 11.6MB/s]\n",
-"special_tokens_map.json: 100% 99.0/99.0 [00:00<00:00, 555kB/s]\n",
-"2023-11-29:11:56:18,658 WARNING [task.py:614] [Task: demo_boolq] metric acc is defined, but aggregation is not. using default aggregation=mean\n",
-"2023-11-29:11:56:18,658 WARNING [task.py:626] [Task: demo_boolq] metric acc is defined, but higher_is_better is not. using default higher_is_better=True\n",
-"Downloading builder script: 100% 30.7k/30.7k [00:00<00:00, 59.0MB/s]\n",
-"Downloading metadata: 100% 38.7k/38.7k [00:00<00:00, 651kB/s]\n",
-"Downloading readme: 100% 14.8k/14.8k [00:00<00:00, 37.3MB/s]\n",
-"Downloading data: 100% 4.12M/4.12M [00:00<00:00, 55.1MB/s]\n",
-"Generating train split: 100% 9427/9427 [00:00<00:00, 15630.89 examples/s]\n",
-"Generating validation split: 100% 3270/3270 [00:00<00:00, 20002.56 examples/s]\n",
-"Generating test split: 100% 3245/3245 [00:00<00:00, 20866.19 examples/s]\n",
-"2023-11-29:11:56:22,315 INFO [task.py:355] Building contexts for task on rank 0...\n",
-"2023-11-29:11:56:22,322 INFO [evaluator.py:319] Running loglikelihood requests\n",
-"100% 20/20 [00:04<00:00, 4.37it/s]\n",
-"fatal: not a git repository (or any of the parent directories): .git\n",
-"hf (pretrained=EleutherAI/pythia-2.8b), gen_kwargs: (), limit: 10.0, num_fewshot: None, batch_size: 1\n",
-"| Tasks |Version|Filter|n-shot|Metric|Value| |Stderr|\n",
-"|----------|-------|------|-----:|------|----:|---|-----:|\n",
-"|demo_boolq|Yaml |none | 0|acc | 1|± | 0|\n",
-"\n"
-]
-}
-],
-"source": [
-"%env LOGLEVEL=DEBUG\n",
-"!lm_eval \\\n",
-" --model hf \\\n",
-" --model_args pretrained=EleutherAI/pythia-2.8b \\\n",
-" --include_path ./ \\\n",
-" --tasks demo_boolq \\\n",
-" --limit 10"
-]
+"outputs": [],
+"source": "%env LMEVAL_LOG_LEVEL=DEBUG\n!lm_eval \\\n --model hf \\\n --model_args pretrained=EleutherAI/pythia-2.8b \\\n --include_path ./ \\\n --tasks demo_boolq \\\n --limit 10"
 },
 {
 "cell_type": "markdown",
@@ -415,64 +366,12 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": null,
 "metadata": {
 "id": "XceRKCuuDtbn"
 },
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"2023-11-29:11:56:33,016 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
-"2023-11-29 11:56:33.852995: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
-"2023-11-29 11:56:33.853050: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
-"2023-11-29 11:56:33.853087: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
-"2023-11-29 11:56:35.129047: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
-"2023-11-29:11:56:38,546 INFO [__main__.py:132] Verbosity set to INFO\n",
-"2023-11-29:11:56:47,509 WARNING [__main__.py:138] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.\n",
-"2023-11-29:11:56:47,509 INFO [__main__.py:143] Including path: ./\n",
-"2023-11-29:11:56:47,517 INFO [__main__.py:205] Selected Tasks: ['yes_or_no_tasks']\n",
-"2023-11-29:11:56:47,520 WARNING [evaluator.py:93] generation_kwargs specified through cli, these settings will be used over set parameters in yaml tasks.\n",
-"2023-11-29:11:56:47,550 INFO [huggingface.py:120] Using device 'cuda'\n",
-"2023-11-29:11:57:08,743 WARNING [task.py:614] [Task: demo_cola] metric acc is defined, but aggregation is not. using default aggregation=mean\n",
-"2023-11-29:11:57:08,743 WARNING [task.py:626] [Task: demo_cola] metric acc is defined, but higher_is_better is not. using default higher_is_better=True\n",
-"Downloading builder script: 100% 28.8k/28.8k [00:00<00:00, 52.7MB/s]\n",
-"Downloading metadata: 100% 28.7k/28.7k [00:00<00:00, 51.9MB/s]\n",
-"Downloading readme: 100% 27.9k/27.9k [00:00<00:00, 48.0MB/s]\n",
-"Downloading data: 100% 377k/377k [00:00<00:00, 12.0MB/s]\n",
-"Generating train split: 100% 8551/8551 [00:00<00:00, 19744.58 examples/s]\n",
-"Generating validation split: 100% 1043/1043 [00:00<00:00, 27057.01 examples/s]\n",
-"Generating test split: 100% 1063/1063 [00:00<00:00, 22705.17 examples/s]\n",
-"2023-11-29:11:57:11,698 INFO [task.py:355] Building contexts for task on rank 0...\n",
-"2023-11-29:11:57:11,704 INFO [evaluator.py:319] Running loglikelihood requests\n",
-"100% 20/20 [00:03<00:00, 5.15it/s]\n",
-"fatal: not a git repository (or any of the parent directories): .git\n",
-"hf (pretrained=EleutherAI/pythia-2.8b), gen_kwargs: (), limit: 10.0, num_fewshot: None, batch_size: 1\n",
-"| Tasks |Version|Filter|n-shot|Metric|Value| |Stderr|\n",
-"|---------------|-------|------|-----:|------|----:|---|-----:|\n",
-"|yes_or_no_tasks|N/A |none | 0|acc | 0.7|± |0.1528|\n",
-"| - demo_cola |Yaml |none | 0|acc | 0.7|± |0.1528|\n",
-"\n",
-"| Groups |Version|Filter|n-shot|Metric|Value| |Stderr|\n",
-"|---------------|-------|------|-----:|------|----:|---|-----:|\n",
-"|yes_or_no_tasks|N/A |none | 0|acc | 0.7|± |0.1528|\n",
-"\n"
-]
-}
-],
-"source": [
-"# !accelerate launch --no_python\n",
-"%env LOGLEVEL=DEBUG\n",
-"!lm_eval \\\n",
-" --model hf \\\n",
-" --model_args pretrained=EleutherAI/pythia-2.8b \\\n",
-" --include_path ./ \\\n",
-" --tasks yes_or_no_tasks \\\n",
-" --limit 10 \\\n",
-" --output output/yes_or_no_tasks/ \\\n",
-" --log_samples"
-]
+"outputs": [],
+"source": "# !accelerate launch --no_python\n%env LMEVAL_LOG_LEVEL=DEBUG\n!lm_eval \\\n --model hf \\\n --model_args pretrained=EleutherAI/pythia-2.8b \\\n --include_path ./ \\\n --tasks yes_or_no_tasks \\\n --limit 10 \\\n --output output/yes_or_no_tasks/ \\\n --log_samples"
 },
 {
 "cell_type": "markdown",
@@ -520,59 +419,12 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": null,
 "metadata": {
 "id": "jyKOfCsKb-xy"
 },
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"2023-11-29:11:57:23,598 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
-"2023-11-29 11:57:24.719750: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
-"2023-11-29 11:57:24.719806: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
-"2023-11-29 11:57:24.719847: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
-"2023-11-29 11:57:26.656125: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
-"2023-11-29:11:57:31,563 INFO [__main__.py:132] Verbosity set to INFO\n",
-"2023-11-29:11:57:40,541 WARNING [__main__.py:138] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.\n",
-"2023-11-29:11:57:40,541 INFO [__main__.py:143] Including path: ./\n",
-"2023-11-29:11:57:40,558 INFO [__main__.py:205] Selected Tasks: ['demo_mmlu_high_school_geography']\n",
-"2023-11-29:11:57:40,559 WARNING [evaluator.py:93] generation_kwargs specified through cli, these settings will be used over set parameters in yaml tasks.\n",
-"2023-11-29:11:57:40,589 INFO [huggingface.py:120] Using device 'cuda'\n",
-"Downloading builder script: 100% 5.84k/5.84k [00:00<00:00, 17.7MB/s]\n",
-"Downloading metadata: 100% 106k/106k [00:00<00:00, 892kB/s] \n",
-"Downloading readme: 100% 39.7k/39.7k [00:00<00:00, 631kB/s]\n",
-"Downloading data: 100% 166M/166M [00:01<00:00, 89.0MB/s]\n",
-"Generating auxiliary_train split: 100% 99842/99842 [00:07<00:00, 12536.83 examples/s]\n",
-"Generating test split: 100% 198/198 [00:00<00:00, 1439.20 examples/s]\n",
-"Generating validation split: 100% 22/22 [00:00<00:00, 4181.76 examples/s]\n",
-"Generating dev split: 100% 5/5 [00:00<00:00, 36.25 examples/s]\n",
-"2023-11-29:11:58:09,798 INFO [task.py:355] Building contexts for task on rank 0...\n",
-"2023-11-29:11:58:09,822 INFO [evaluator.py:319] Running loglikelihood requests\n",
-"100% 40/40 [00:05<00:00, 7.86it/s]\n",
-"fatal: not a git repository (or any of the parent directories): .git\n",
-"hf (pretrained=EleutherAI/pythia-2.8b), gen_kwargs: (), limit: 10.0, num_fewshot: None, batch_size: 1\n",
-"| Tasks |Version|Filter|n-shot| Metric |Value| |Stderr|\n",
-"|-------------------------------|-------|------|-----:|--------|----:|---|-----:|\n",
-"|demo_mmlu_high_school_geography|Yaml |none | 0|acc | 0.3|± |0.1528|\n",
-"| | |none | 0|acc_norm| 0.3|± |0.1528|\n",
-"\n"
-]
-}
-],
-"source": [
-"# !accelerate launch --no_python\n",
-"%env LOGLEVEL=DEBUG\n",
-"!lm_eval \\\n",
-" --model hf \\\n",
-" --model_args pretrained=EleutherAI/pythia-2.8b \\\n",
-" --include_path ./ \\\n",
-" --tasks demo_mmlu_high_school_geography \\\n",
-" --limit 10 \\\n",
-" --output output/mmlu_high_school_geography/ \\\n",
-" --log_samples"
-]
+"outputs": [],
+"source": "# !accelerate launch --no_python\n%env LMEVAL_LOG_LEVEL=DEBUG\n!lm_eval \\\n --model hf \\\n --model_args pretrained=EleutherAI/pythia-2.8b \\\n --include_path ./ \\\n --tasks demo_mmlu_high_school_geography \\\n --limit 10 \\\n --output output/mmlu_high_school_geography/ \\\n --log_samples"
 },
 {
 "cell_type": "markdown",
@@ -605,51 +457,12 @@
 },
 {
 "cell_type": "code",
-"execution_count": 10,
+"execution_count": null,
 "metadata": {
 "id": "-_CVnDirdy7j"
 },
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"2023-11-29:11:58:21,284 INFO [utils.py:160] NumExpr defaulting to 2 threads.\n",
-"2023-11-29 11:58:22.850159: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
-"2023-11-29 11:58:22.850219: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
-"2023-11-29 11:58:22.850254: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
-"2023-11-29 11:58:24.948103: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
-"2023-11-29:11:58:28,460 INFO [__main__.py:132] Verbosity set to INFO\n",
-"2023-11-29:11:58:37,935 WARNING [__main__.py:138] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.\n",
-"2023-11-29:11:58:37,935 INFO [__main__.py:143] Including path: ./\n",
-"2023-11-29:11:58:37,969 INFO [__main__.py:205] Selected Tasks: ['demo_mmlu_high_school_geography_continuation']\n",
-"2023-11-29:11:58:37,972 WARNING [evaluator.py:93] generation_kwargs specified through cli, these settings will be used over set parameters in yaml tasks.\n",
-"2023-11-29:11:58:38,008 INFO [huggingface.py:120] Using device 'cuda'\n",
-"2023-11-29:11:58:59,758 INFO [task.py:355] Building contexts for task on rank 0...\n",
-"2023-11-29:11:58:59,777 INFO [evaluator.py:319] Running loglikelihood requests\n",
-"100% 40/40 [00:02<00:00, 16.23it/s]\n",
-"fatal: not a git repository (or any of the parent directories): .git\n",
-"hf (pretrained=EleutherAI/pythia-2.8b), gen_kwargs: (), limit: 10.0, num_fewshot: None, batch_size: 1\n",
-"| Tasks |Version|Filter|n-shot| Metric |Value| |Stderr|\n",
-"|--------------------------------------------|-------|------|-----:|--------|----:|---|-----:|\n",
-"|demo_mmlu_high_school_geography_continuation|Yaml |none | 0|acc | 0.1|± |0.1000|\n",
-"| | |none | 0|acc_norm| 0.2|± |0.1333|\n",
-"\n"
-]
-}
-],
-"source": [
-"# !accelerate launch --no_python\n",
-"%env LOGLEVEL=DEBUG\n",
-"!lm_eval \\\n",
-" --model hf \\\n",
-" --model_args pretrained=EleutherAI/pythia-2.8b \\\n",
-" --include_path ./ \\\n",
-" --tasks demo_mmlu_high_school_geography_continuation \\\n",
-" --limit 10 \\\n",
-" --output output/mmlu_high_school_geography_continuation/ \\\n",
-" --log_samples"
-]
+"outputs": [],
+"source": "# !accelerate launch --no_python\n%env LMEVAL_LOG_LEVEL=DEBUG\n!lm_eval \\\n --model hf \\\n --model_args pretrained=EleutherAI/pythia-2.8b \\\n --include_path ./ \\\n --tasks demo_mmlu_high_school_geography_continuation \\\n --limit 10 \\\n --output output/mmlu_high_school_geography_continuation/ \\\n --log_samples"
 },
 {
 "cell_type": "markdown",

lm_eval/__main__.py

Lines changed: 1 addition & 1 deletion
@@ -231,7 +231,7 @@ def setup_parser() -> argparse.ArgumentParser:
         type=str.upper,
         default=None,
         metavar="CRITICAL|ERROR|WARNING|INFO|DEBUG",
-        help="(Deprecated) Controls logging verbosity level. Use the `LOGLEVEL` environment variable instead. Set to DEBUG for detailed output when testing or adding new task configurations.",
+        help="(Deprecated) Controls logging verbosity level. Use the `LMEVAL_LOG_LEVEL` environment variable instead. Set to DEBUG for detailed output when testing or adding new task configurations.",
     )
     parser.add_argument(
         "--wandb_args",

lm_eval/utils.py

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ def format(self, record):
         datefmt="%Y-%m-%d:%H:%M:%S",
     )
 
-    log_level = os.environ.get("LOGLEVEL", verbosity) or verbosity
+    log_level = os.environ.get("LMEVAL_LOG_LEVEL", verbosity) or verbosity
 
     level_map = {
         "DEBUG": logging.DEBUG,

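The lookup changed in `lm_eval/utils.py` can be read as: take `LMEVAL_LOG_LEVEL` if it is set and non-empty, otherwise fall back to `verbosity`. The sketch below illustrates that behavior in isolation. It is not the library's code: `resolve_log_level` and its `env` parameter are hypothetical names, and both the full level map and the INFO default for unknown names are assumptions, since the hunk shows only the first map entry.

```python
import logging
import os


def resolve_log_level(verbosity=logging.INFO, env=None):
    """Hypothetical helper mirroring the env-var lookup shown in the hunk."""
    env = os.environ if env is None else env
    # `.get(..., verbosity)` covers an unset variable; `or verbosity`
    # additionally covers a variable that is set to an empty string.
    raw = env.get("LMEVAL_LOG_LEVEL", verbosity) or verbosity
    if isinstance(raw, int):
        # Sketch-only choice: pass numeric levels through unchanged.
        return raw
    # Assumed map; the diff context shows only the "DEBUG" entry.
    level_map = {
        "DEBUG": logging.DEBUG,
        "INFO": logging.INFO,
        "WARNING": logging.WARNING,
        "ERROR": logging.ERROR,
        "CRITICAL": logging.CRITICAL,
    }
    # Assumption: unrecognized names fall back to INFO.
    return level_map.get(str(raw).upper(), logging.INFO)


# The new name is honored regardless of case.
print(resolve_log_level(env={"LMEVAL_LOG_LEVEL": "debug"}) == logging.DEBUG)
# The old generic name is no longer consulted.
print(resolve_log_level(env={"LOGLEVEL": "DEBUG"}) == logging.INFO)
```

Because the old `LOGLEVEL` name is ignored after this change, a shell that still exports only `LOGLEVEL=DEBUG` would get the fallback level in this sketch rather than debug output.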