
Error when running scripts/run_epr.sh #6

Open
Tizzzzy opened this issue Jul 26, 2024 · 1 comment
Tizzzzy commented Jul 26, 2024

Hi,
I followed every instruction in the README, except that my torch version is 1.13.1. My CUDA version is 12.2, and I installed torch with this command: conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Then, when I ran run_epr.sh, I got the following error. The full output is very long, so below is only part of it. If you need more detail, I would be happy to provide it.

(icl) [conda] [lh599@corfu:icl-ceil]$ /research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/icl-ceil/scripts/run_epr.sh
bm25_retriever.py:94: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path="configs", config_name="bm25_retriever")
[2024-07-29 11:21:17,148][__main__][INFO] - {'output_file': 'output/epr/qnli/EleutherAI/gpt-neo-2.7B/retrieved.json', 'num_candidates': 50, 'num_ice': 1, 'task_name': 'qnli', 'query_field': 'a', 'dataset_split': 'train', 'ds_size': 44000, 'index_reader': {'_target_': 'src.dataset_readers.index_dsr.IndexDatasetReader', 'task_name': '${task_name}', 'model_name': 'bert-base-uncased', 'field': 'a', 'dataset_split': 'train', 'dataset_path': 'index_data/qnli/index_dataset.json', 'ds_size': None}}
Downloading and preparing dataset None/ax to /research/cbim/vast/lh599/.cache/huggingface/datasets/parquet/ax-2087346d15759eef/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7...
Downloading data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 347.79it/s]
Extracting data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 33.06it/s]
Generating train split: 0 examples [00:00, ? examples/s][2024-07-29 11:21:19,127][datasets.packaged_modules.parquet.parquet][ERROR] - Failed to read file '/research/cbim/vast/lh599/.cache/huggingface/datasets/downloads/34b0e169567d6cc580fb18f6593c0f0d3d4ade7ca85314eaa32071da0c48b253' with error <class 'ValueError'>: Couldn't cast
sentence: string
label: int64
idx: int32
-- schema metadata --
huggingface: '{"info": {"features": {"sentence": {"dtype": "string", "_ty' + 136
to
{'premise': Value(dtype='string', id=None), 'hypothesis': Value(dtype='string', id=None), 'label': ClassLabel(names=['entailment', 'neutral', 'contradiction'], id=None), 'idx': Value(dtype='int32', id=None)}
because column names don't match
Error executing job with overrides: ['output_file=output/epr/qnli/EleutherAI/gpt-neo-2.7B/retrieved.json', 'num_candidates=50', 'num_ice=1', 'task_name=qnli', 'index_reader.dataset_path=index_data/qnli/index_dataset.json', 'dataset_split=train', 'ds_size=44000', 'query_field=a', 'index_reader.field=a']
Traceback (most recent call last):
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/datasets/builder.py", line 1879, in _prepare_split_single
    for _, table in generator:
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/datasets/packaged_modules/parquet/parquet.py", line 82, in _generate_tables
    yield f"{file_idx}_{batch_idx}", self._cast_table(pa_table)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/datasets/packaged_modules/parquet/parquet.py", line 61, in _cast_table
    pa_table = table_cast(pa_table, self.info.features.arrow_schema)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/datasets/table.py", line 2324, in table_cast
    return cast_table_to_schema(table, schema)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/datasets/table.py", line 2282, in cast_table_to_schema
    raise ValueError(f"Couldn't cast\n{table.schema}\nto\n{features}\nbecause column names don't match")
ValueError: Couldn't cast
sentence: string
label: int64
idx: int32
-- schema metadata --
huggingface: '{"info": {"features": {"sentence": {"dtype": "string", "_ty' + 136
to
{'premise': Value(dtype='string', id=None), 'hypothesis': Value(dtype='string', id=None), 'label': ClassLabel(names=['entailment', 'neutral', 'contradiction'], id=None), 'idx': Value(dtype='int32', id=None)}
because column names don't match

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 92, in _call_target
    return _target_(*args, **kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/icl-ceil/src/dataset_readers/index_dsr.py", line 33, in __init__
    super().__init__(**kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/icl-ceil/src/dataset_readers/base_dsr.py", line 40, in __init__
    self.init_dataset(task_name, field, dataset_path, dataset_split, ds_size)
  File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/icl-ceil/src/dataset_readers/base_dsr.py", line 46, in init_dataset
    ds_size=ds_size)
  File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/icl-ceil/src/dataset_readers/dataset_wrappers/__init__.py", line 5, in get_dataset_wrapper
    return importlib.import_module('src.dataset_readers.dataset_wrappers.{}'.format(name)).DatasetWrapper(**kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/icl-ceil/src/dataset_readers/dataset_wrappers/base_dsw.py", line 25, in __init__
    self.dataset = load_dataset(self.hf_dataset, self.hf_dataset_name)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/datasets/load.py", line 1815, in load_dataset
    storage_options=storage_options,
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/datasets/builder.py", line 913, in download_and_prepare
    **download_and_prepare_kwargs,
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/datasets/builder.py", line 1004, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/datasets/builder.py", line 1768, in _prepare_split
    gen_kwargs=gen_kwargs, job_id=job_id, **_prepare_split_args
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/datasets/builder.py", line 1912, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "bm25_retriever.py", line 102, in <module>
    main()
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/main.py", line 99, in decorated_main
    config_name=config_name,
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/_internal/utils.py", line 401, in _run_hydra
    overrides=overrides,
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/_internal/utils.py", line 458, in _run_app
    lambda: hydra.run(
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/_internal/utils.py", line 461, in <lambda>
    overrides=overrides,
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "bm25_retriever.py", line 98, in main
    find(cfg)
  File "bm25_retriever.py", line 67, in find
    knn_finder = BM25Finder(cfg)
  File "bm25_retriever.py", line 23, in __init__
    self.index_dataset = hu.instantiate(cfg.index_reader).dataset_wrapper
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 227, in instantiate
    config, *args, recursive=_recursive_, convert=_convert_, partial=_partial_
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 347, in instantiate_node
    return _call_target(_target_, partial, args, kwargs, full_key)
  File "/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 97, in _call_target
    raise InstantiationException(msg) from e
hydra.errors.InstantiationException: Error in call to target 'src.dataset_readers.index_dsr.IndexDatasetReader':
DatasetGenerationError('An error occurred while generating the dataset')
full_key: index_reader
[11:21:20] WARNING  The following values were not passed to `accelerate launch` and had defaults used instead:                                           launch.py:913
                            `--num_machines` was set to a value of `1`
                            `--mixed_precision` was set to a value of `'no'`
                            `--dynamo_backend` was set to a value of `'no'`
                    To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/research/cbim/medical/lh599/research/ruijiang/Dong/demonstration_selection/icl-ceil/inferencer.py:158: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path="configs", config_name="inferencer")
scorer.py:108: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path="configs", config_name="scorer")
/research/cbim/medical/lh599/research/ruijiang/miniconda/envs/icl/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'scorer': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
[2024-07-29 11:21:23,372][__main__][INFO] - {'model_config': {'model_type': 'hf', 'model': {'_target_': 'transformers.AutoModelForCausalLM.from_pretrained', 'pretrained_model_name_or_path': '${model_name}'}, 'generation_kwargs': {'temperature': 0, 'max_new_tokens': 300}}, 'model_name': 'EleutherAI/gpt-neo-2.7B', 'task_name': 'qnli', 'output_file': 'output/epr/qnli/EleutherAI/gpt-neo-2.7B/scored.json', 'batch_size': 8, 'dataset_reader': {'_target_': 'src.dataset_readers.scoring_dsr.ScorerDatasetReader', 'dataset_path': 'output/epr/qnli/EleutherAI/gpt-neo-2.7B/retrieved.json', 'dataset_split': None, 'ds_size': None, 'task_name': '${task_name}', 'model_name': '${model_name}', 'n_tokens': 1600, 'field': 'gen_a', 'index_reader': '${index_reader}'}, 'index_reader': {'_target_': 'src.dataset_readers.index_dsr.IndexDatasetReader', 'task_name': '${task_name}', 'model_name': '${model_name}', 'field': 'qa', 'dataset_path': 'index_data/qnli/index_dataset.json', 'dataset_split': None, 'ds_size': None}}
Downloading and preparing dataset None/ax to /research/cbim/vast/lh599/.cache/huggingface/datasets/parquet/ax-2087346d15759eef/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7...
Downloading data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 353.69it/s]
Extracting data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 32.58it/s]
Generating train split: 0 examples [00:00, ? examples/s][2024-07-29 11:21:25,090][datasets.packaged_modules.parquet.parquet][ERROR] - Failed to read file '/research/cbim/vast/lh599/.cache/huggingface/datasets/downloads/34b0e169567d6cc580fb18f6593c0f0d3d4ade7ca85314eaa32071da0c48b253' with error <class 'ValueError'>: Couldn't cast
sentence: string
label: int64
idx: int32
-- schema metadata --
huggingface: '{"info": {"features": {"sentence": {"dtype": "string", "_ty' + 136
to
{'premise': Value(dtype='string', id=None), 'hypothesis': Value(dtype='string', id=None), 'label': ClassLabel(names=['entailment', 'neutral', 'contradiction'], id=None), 'idx': Value(dtype='int32', id=None)}
because column names don't match
Error executing job with overrides: ['task_name=qnli', 'output_file=output/epr/qnli/EleutherAI/gpt-neo-2.7B/scored.json', 'batch_size=8', 'model_name=EleutherAI/gpt-neo-2.7B', 'dataset_reader.dataset_path=output/epr/qnli/EleutherAI/gpt-neo-2.7B/retrieved.json', 'dataset_reader.n_tokens=1600', 'index_reader.dataset_path=index_data/qnli/index_dataset.json']

Can you please take a look?

@jiacheng-ye
Contributor

Hi, sorry for the late reply. It seems to be a data-loading issue. You can check whether the qnli dataset was correctly downloaded from Hugging Face.
