ValueError: Processor was not found, please check and update your model file. #8724
Liam-Merouin asked this question in Q&A · Unanswered · 0 replies
Running tokenizer on dataset (num_proc=16): 0%| | 0/5101 [00:00<?, ? examples/s]
[rank0]: multiprocess.pool.RemoteTraceback:
[rank0]: """
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/local/miniconda3/lib/python3.10/site-packages/multiprocess/pool.py", line 125, in worker
[rank0]: result = (True, func(*args, **kwds))
[rank0]: File "/usr/local/miniconda3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 680, in _write_generator_to_queue
[rank0]: for i, result in enumerate(func(**kwargs)):
[rank0]: File "/usr/local/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3516, in _map_single
[rank0]: for i, batch in iter_outputs(shard_iterable):
[rank0]: File "/usr/local/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3466, in iter_outputs
[rank0]: yield i, apply_function(example, i, offset=offset)
[rank0]: File "/usr/local/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3389, in apply_function
[rank0]: processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
[rank0]: File "/export/App/training_platform/PinoModel/src/llamafactory/data/processor/supervised.py", line 99, in preprocess_dataset
[rank0]: input_ids, labels = self._encode_data_example(
[rank0]: File "/export/App/training_platform/PinoModel/src/llamafactory/data/processor/supervised.py", line 43, in _encode_data_example
[rank0]: messages = self.template.mm_plugin.process_messages(prompt + response, images, videos, audios, self.processor)
[rank0]: File "/export/App/training_platform/PinoModel/src/llamafactory/data/mm_plugin.py", line 637, in process_messages
[rank0]: self._validate_input(processor, images, videos, audios)
[rank0]: File "/export/App/training_platform/PinoModel/src/llamafactory/data/mm_plugin.py", line 176, in _validate_input
[rank0]: raise ValueError("Processor was not found, please check and update your model file.")
[rank0]: ValueError: Processor was not found, please check and update your model file.
[rank0]: """
[rank0]: The above exception was the direct cause of the following exception:
[rank0]: Traceback (most recent call last):
[rank0]: File "/export/App/training_platform/PinoModel/src/llamafactory/launcher.py", line 23, in
[rank0]: launch()
[rank0]: File "/export/App/training_platform/PinoModel/src/llamafactory/launcher.py", line 19, in launch
[rank0]: run_exp()
[rank0]: File "/export/App/training_platform/PinoModel/src/llamafactory/train/tuner.py", line 110, in run_exp
[rank0]: _training_function(config={"args": args, "callbacks": callbacks})
[rank0]: File "/export/App/training_platform/PinoModel/src/llamafactory/train/tuner.py", line 72, in _training_function
[rank0]: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]: File "/export/App/training_platform/PinoModel/src/llamafactory/train/sft/workflow.py", line 51, in run_sft
[rank0]: dataset_module = get_dataset(template, model_args, data_args, training_args, stage="sft", **tokenizer_module)
[rank0]: File "/export/App/training_platform/PinoModel/src/llamafactory/data/loader.py", line 315, in get_dataset
[rank0]: dataset = _get_preprocessed_dataset(
[rank0]: File "/export/App/training_platform/PinoModel/src/llamafactory/data/loader.py", line 256, in _get_preprocessed_dataset
[rank0]: dataset = dataset.map(
[rank0]: File "/usr/local/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 557, in wrapper
[rank0]: out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
[rank0]: File "/usr/local/miniconda3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3166, in map
[rank0]: for rank, done, content in iflatmap_unordered(
[rank0]: File "/usr/local/miniconda3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 720, in iflatmap_unordered
[rank0]: [async_result.get(timeout=0.05) for async_result in async_results]
[rank0]: File "/usr/local/miniconda3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 720, in
[rank0]: [async_result.get(timeout=0.05) for async_result in async_results]
[rank0]: File "/usr/local/miniconda3/lib/python3.10/site-packages/multiprocess/pool.py", line 774, in get
[rank0]: raise self._value
[rank0]: ValueError: Processor was not found, please check and update your model file.
Running tokenizer on dataset (num_proc=16): 0%| | 0/5101 [00:00<?, ? examples/s]
[rank0]:[W723 18:01:58.518254603 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
mayulin-liam-ba8f202c-master-0:1126:1693 [0] NCCL INFO [Service thread] Connection closed by localRank 0
mayulin-liam-ba8f202c-master-0:1126:1892 [0] NCCL INFO comm 0xfdfc7f0 rank 0 nranks 4 cudaDev 0 busId 9a000 - Abort COMPLETE
W0723 18:02:00.439000 1061 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1127 closing signal SIGTERM
W0723 18:02:00.439000 1061 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1128 closing signal SIGTERM
W0723 18:02:00.440000 1061 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1129 closing signal SIGTERM
E0723 18:02:02.858000 1061 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 1126) of binary: /usr/local/miniconda3/bin/python
Traceback (most recent call last):
File "/usr/local/miniconda3/bin/torchrun", line 8, in
sys.exit(main())
File "/usr/local/miniconda3/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 355, in wrapper
return f(*args, **kwargs)
File "/usr/local/miniconda3/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
run(args)
File "/usr/local/miniconda3/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
elastic_launch(
File "/usr/local/miniconda3/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/miniconda3/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
/export/App/training_platform/PinoModel/src/llamafactory/launcher.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2025-07-23_18:02:00
host : mayulin-liam-ba8f202c-master-0
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 1126)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Traceback (most recent call last):
File "/usr/local/miniconda3/bin/llamafactory-cli", line 8, in
sys.exit(main())
File "/export/App/training_platform/PinoModel/src/llamafactory/cli.py", line 130, in main
process = subprocess.run(
File "/usr/local/miniconda3/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['torchrun', '--nnodes', '1', '--node_rank', '0', '--nproc_per_node', '4', '--master_addr', 'localhost', '--master_port', '23456', '/export/App/training_platform/PinoModel/src/llamafactory/launcher.py', 'examples/train_lora/intern3vl_lora_sft_78B_cate_0723.yaml']' returned non-zero exit status 1.
/export/App/training_platform/PinoModel/.kube/mayulin-liam-ba8f202c/.mayulin-liam-ba8f202c-master-0.sh: line 12: /bin/hadoop: No such file or directory
world_size: 1 rank: 0 dist_url: None
master addr: localhost, master port 23456
[W723 18:02:07.150707478 socket.cpp:759] [c10d] The client socket has failed to connect to [localhost]:23456 (errno: 99 - Cannot assign requested address).
[W723 18:02:07.150822602 socket.cpp:759] [c10d] The client socket has failed to connect to [localhost]:23456 (errno: 99 - Cannot assign requested address).
[W723 18:02:07.152449280 socket.cpp:759] [c10d] The client socket has failed to connect to [localhost]:23456 (errno: 99 - Cannot assign requested address).
[1]Running basic DDP example on rank 0-1/4
[3]Running basic DDP example on rank 0-3/4
[2]Running basic DDP example on rank 0-2/4
[0]Running basic DDP example on rank 0-0/4
/export/App/training_platform/PinoModel/.kube/mayulin-liam-ba8f202c/.mayulin-liam-ba8f202c-master-0.sh: line 3: kill: (2030) - No such process
Has anyone else run into this problem when fine-tuning InternVL3-78B with the LLaMA-Factory framework? How did you solve it?
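For context, the traceback shows the error being raised from `src/llamafactory/data/mm_plugin.py` (`_validate_input`) during multimodal preprocessing, which suggests the template's mm_plugin is not receiving a processor object at all. A quick check to narrow this down (only a diagnostic sketch, assuming the standard `transformers` `AutoProcessor` / `AutoTokenizer` API; the model path below is a placeholder for the local InternVL3-78B checkpoint referenced in the yaml) is to verify that a processor can be loaded from the model directory in the first place:

```python
# Diagnostic sketch only: check whether a multimodal processor can be loaded
# from the local checkpoint directory. The path is a placeholder, not the
# actual path from the training config.
from transformers import AutoProcessor, AutoTokenizer

model_path = "/path/to/InternVL3-78B"  # placeholder

try:
    # If this fails or only returns a plain tokenizer, LLaMA-Factory presumably
    # has no processor to pass along, which would match the
    # "Processor was not found" error raised in _validate_input above.
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
    print("processor:", type(processor).__name__)
except Exception as exc:
    print("failed to load processor:", exc)

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
print("tokenizer:", type(tokenizer).__name__)
```

If the processor does load here, it might also be worth double-checking that the `template` in the yaml is the multimodal template intended for InternVL3, since the error originates from the template's mm_plugin.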