On running the samples I am getting the error below. I want to generate code context/documentation in plain language for a given piece of Java code. For that purpose, is CodeLlama the better choice, or Llama?
(myenv) [10:52]:[mehparmar@py029:codellama-main]$ torchrun --nproc_per_node 1 example_infilling.py \
> --ckpt_dir CodeLlama-7b/ \
> --tokenizer_path CodeLlama-7b/tokenizer.model \
> --max_seq_len 192 --max_batch_size 4
[W socket.cpp:464] [c10d] The server socket cannot be initialized on [::]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:697] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:697] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Traceback (most recent call last):
  File "example_infilling.py", line 79, in <module>
    fire.Fire(main)
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "example_infilling.py", line 18, in main
    generator = Llama.build(
  File "/vol/etl_jupyterdata1/home/github/public/Sreeramm/codellama-main/llama/generation.py", line 97, in build
    assert len(checkpoints) > 0, f"no checkpoint files found in {ckpt_dir}"
AssertionError: no checkpoint files found in CodeLlama-7b/
[2024-03-16 10:54:20,433] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 75378) of binary: /home/mehparmar/.conda/envs/myenv/bin/python
Traceback (most recent call last):
  File "/home/mehparmar/.conda/envs/myenv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/mehparmar/.conda/envs/myenv/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_infilling.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-03-16_10:54:20
host : py029.lvs.abc.com
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 75378)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
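
The assertion is raised inside Llama.build in llama/generation.py, which scans the directory passed as --ckpt_dir for checkpoint shards and fails if it finds none. Below is a minimal sanity-check sketch to run before launching torchrun; it assumes the standard layout produced by the repo's download script (a single consolidated.00.pth shard for the 7B model, with params.json and tokenizer.model alongside), so treat those file names as assumptions if your download differs.

    from pathlib import Path

    ckpt_dir = Path("CodeLlama-7b")

    # Llama.build globs the checkpoint directory for *.pth shards;
    # the assertion in the traceback fires when this list is empty.
    checkpoints = sorted(ckpt_dir.glob("*.pth"))

    print(f"found {len(checkpoints)} checkpoint shard(s) in {ckpt_dir}/")
    for ckpt in checkpoints:
        print("  ", ckpt.name)

    # For the 7B model you would expect one shard here (assumed name:
    # consolidated.00.pth), plus params.json and tokenizer.model in the
    # same directory after running the download script.

If the listing comes back empty, the usual causes are that the weights were never downloaded, were downloaded into a different directory, or that the relative path CodeLlama-7b/ is being resolved from a different working directory than the one torchrun is launched in.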