RuntimeError: Given groups=1, weight of size [1024, 3, 14, 14], expected input[1024, 1, 224, 224] to have 3 channels, but got 1 channels instead

Which file do I need to change to solve this issue? I am working with Video-LLaVA, but I think this is an issue across all LLaVA models. Could an admin suggest where I should look to resolve this channel mismatch?

Here is the error message:

```
Adding LoRA adapters...
total data 10988
Formatting inputs...Skip in lazy mode
  0%|                                                                                                  | 0/86 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/hmbadal/AQA/ABC/Video-LLaVA/videollava/train/train_mem.py", line 13, in <module>
    train()
  File "/home/hmbadal/AQA/ABC/Video-LLaVA/videollava/train/train.py", line 1078, in train
    trainer.train()
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/transformers/trainer.py", line 2654, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in compute_loss
    outputs = model(**inputs)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
    output.reraise()
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hmbadal/AQA/ABC/Video-LLaVA/videollava/model/language_model/llava_llama.py", line 79, in forward
    ) = self.prepare_inputs_labels_for_multimodal(
  File "/home/hmbadal/AQA/ABC/Video-LLaVA/videollava/model/llava_arch.py", line 207, in prepare_inputs_labels_for_multimodal
    video_features_minibatch = self.encode_videos(videos_minibatch)  # fake list [mini_b, t, l, c]
  File "/home/hmbadal/AQA/ABC/Video-LLaVA/videollava/model/llava_arch.py", line 144, in encode_videos
    video_features = self.get_model().get_video_tower()(videos)  # [mini_b, t, n, c]
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/hmbadal/AQA/ABC/Video-LLaVA/videollava/model/multimodal_encoder/languagebind/__init__.py", line 227, in forward
    video_forward_outs = self.video_tower(videos.to(device=self.device, dtype=self.dtype), output_hidden_states=True)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hmbadal/AQA/ABC/Video-LLaVA/videollava/model/multimodal_encoder/languagebind/video/modeling_video.py", line 646, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 195, in forward
    patch_embeds = self.patch_embedding(pixel_values)  # shape = [*, width, grid, grid]
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/hmbadal/anaconda3/envs/badalbhai/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [1024, 3, 14, 14], expected input[1024, 1, 224, 224] to have 3 channels, but got 1 channels instead

```
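
To make the final line of the traceback easier to read, here is a minimal sketch of the shape rule that a 2D convolution applies to its input. It is not PyTorch's code and not anything from the Video-LLaVA repo; the helper names `expected_in_channels` and `describe_conv_input` are made up for illustration. It only assumes the usual layouts: weight as `[out_channels, in_channels / groups, kH, kW]` and input as `[N, C, H, W]`.

```python
def expected_in_channels(weight_shape, groups=1):
    """Channels a conv layer expects, given its weight shape.

    Assumes the weight layout (out_channels, in_channels // groups, kH, kW).
    """
    return weight_shape[1] * groups


def describe_conv_input(weight_shape, input_shape, groups=1):
    """Compare an (N, C, H, W) input shape against the conv weight shape."""
    _, channels, _, _ = input_shape
    want = expected_in_channels(weight_shape, groups)
    if channels == want:
        return f"ok: input has {channels} channels"
    return (
        f"mismatch: conv expects {want} channels, "
        f"input {list(input_shape)} has {channels}"
    )


# Shapes copied from the error message above.
patch_embedding_weight = (1024, 3, 14, 14)
pixel_values = (1024, 1, 224, 224)

print(describe_conv_input(patch_embedding_weight, pixel_values))
```

So the patch embedding wants 3 channels in dimension 1, and the tensor that reaches it has 1 there. Going by the traceback, printing `videos.shape` just before the `self.video_tower(...)` call in `languagebind/__init__.py` would be one place to check which axis actually holds the channels; whether the cause is grayscale frames or a wrong axis order in the data pipeline is not something the log alone shows.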

RuntimeError: Given groups=1, weight of size [1024, 3, 14, 14], expected input[1024, 1, 224, 224] to have 3 channels, but got 1 channels instead #220
