
Video decoding errors both on CPU / CUDA backends. #592

Open
BernhardGlueck opened this issue Mar 24, 2025 · 5 comments
Comments

@BernhardGlueck

🐛 Describe the bug

After digging into this for a few days, I still was not able to fix it:

Environment:
Fedora 41, NVIDIA driver 570 with CUDA support (torch itself works fine for training with CUDA)

Python 3.12.8
Torch: 2.6.0+cu126
TorchVision: 0.21.0
TorchCodec: 0.2.1+cu126
FFmpeg: 7.1.1 with CUDA (cuvid, nvenc, nvdec) support

Dataset: UCF 101

Minimal Code:

import torch
from torchcodec.decoders import VideoDecoder

video_path = "v_UnevenBars_g18_c04.avi"  # placeholder; any UCF101 clip

device = torch.device('cpu')
# device = torch.device('cuda:0')

decoder = VideoDecoder(video_path, device=str(device), seek_mode="approximate")

for frame in decoder:
    print('Just for testing')

I get the following errors:

Using CPU device:
RuntimeError: Requested next frame while there are no more frames left to decode.

Using CUDA device:
RuntimeError: CUDA error: initialization error, (on core.add_video_stream)

The videos (UCF101) decode fine with ffmpeg directly.
This also worked fine with a previous version of torchcodec,
and I have the same issues on Ubuntu.

Any ideas what's going on?

Versions

Fedora 41, NVIDIA driver 570 with CUDA support (torch itself works fine for training with CUDA)

Python 3.12.8
Torch: 2.6.0+cu126
TorchVision: 0.21.0
TorchCodec: 0.2.1+cu126
FFmpeg: 7.1.1 with CUDA (cuvid, nvenc, nvdec) support

(collect_env.py crashes)

@NicolasHug
Member

Hi @BernhardGlueck , thanks for the report.

Also This worked fine with a previous version,

Can you share on which previous version of torchcodec this was working?

Also, to help with reproduction, can you share the name/path of one of the UCF101 videos that this fails on? Ideally, a full minimal reproducing example would be very helpful!

Thanks

@BernhardGlueck
Author

BernhardGlueck commented Mar 24, 2025

So I am trying to investigate a bit further myself...
I think we're talking about two separate problems; let's look at the CPU side first:

import torch.cuda
from torchcodec.decoders import VideoDecoder

print(torch.cuda.is_available())

device = torch.device('cpu')

decoder = VideoDecoder("v_UnevenBars_g18_c04.avi", device=str(device))
for frame in decoder:
    print(frame.data.shape)

This works fine.

import random

import torch
from torch.utils.data import Dataset, DataLoader
from torchcodec.decoders import VideoDecoder

# ClassMapping and get_videos_and_classes are project helpers (not shown here)


class VideoDataset(Dataset[tuple[torch.Tensor, torch.Tensor, torch.Tensor]]):
    def __init__(self,
                 root_dir: str,
                 class_mapping: ClassMapping,
                 fps: int,
                 max_frames: int,
                 device: torch.device,
                 max_items: int | None = None,
                 dtype: torch.dtype = torch.float,
                 video_extensions=(".mp4", ".avi", ".mov", ".mkv")):
        self.root_dir = root_dir
        self.class_mapping = class_mapping
        self.samples = get_videos_and_classes(root_dir, video_extensions)
        self.fps = fps
        self.device = device

        self.max_frames = max_frames
        self.max_items = max_items
        self.dtype = dtype

        if max_items is not None:
            self.samples = random.sample(self.samples, max_items)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        class_name, video_path = self.samples[index]

        try:
            decoder = VideoDecoder(video_path, device=str(self.device), seek_mode="approximate")
            source_fps = decoder.metadata.average_fps
            sampled_frames = self._sample_frames(decoder, source_fps, self.fps, self.max_frames)
            zero_frame = torch.zeros_like(sampled_frames[0])

            actual_frames = len(sampled_frames)

            while len(sampled_frames) < self.max_frames:
                sampled_frames.append(zero_frame)

            mask = torch.arange(self.max_frames) < actual_frames
            label = self.class_mapping.one_hot(class_name)

            video_tensor = torch.stack(sampled_frames, dim=0)

            return video_tensor.to(dtype=self.dtype), mask, label.to(dtype=self.dtype)
        except Exception as e:
            print(f"Failed: {video_path}, {e}")
            raise e



    def _sample_frames(self, decoder: VideoDecoder, source_fps: float, target_fps: float, max_frames: int):
        step = source_fps / target_fps  # Compute step size
        sampled_frames = []

        current_index = 0  # Tracks the current frame index
        next_frame_to_sample = 0  # The next frame index to keep

        for frame in decoder:
            if current_index >= next_frame_to_sample:
                # Keep this frame
                sampled_frames.append(frame)

                if len(sampled_frames) == max_frames:
                    break

                next_frame_to_sample += step  # Update the next frame index to sample

            current_index += 1  # Always increment the frame counter

        return sampled_frames
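The stride logic in _sample_frames can be isolated as a pure-index computation, which makes it easier to test without a decoder. The helper below (sampled_indices is my name, not part of the project) mirrors the loop above on frame indices only:

```python
def sampled_indices(num_frames: int, source_fps: float,
                    target_fps: float, max_frames: int) -> list[int]:
    """Mirror of the _sample_frames stride logic, on indices only."""
    step = source_fps / target_fps   # e.g. 30 fps -> 10 fps gives step 3.0
    next_to_sample = 0.0
    out: list[int] = []
    for current in range(num_frames):
        if current >= next_to_sample:
            out.append(current)          # keep this frame
            if len(out) == max_frames:
                break
            next_to_sample += step       # advance the sampling target
    return out

# A 10-frame clip downsampled from 30 fps to 10 fps keeps every third frame:
print(sampled_indices(10, 30, 10, 5))  # [0, 3, 6, 9]
```

Working on indices also opens the door to batch APIs such as torchcodec's get_frames_at, instead of iterating frame by frame.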

data_loader = DataLoader(
    datasets[split],
    batch_size=5,
    shuffle=False,
    num_workers=1,
    pin_memory=True
)

This fails when iterating the dataloader, inside the _sample_frames loop, on the exact same video file (reproducible):
"Failed: /run/media/bglueck/Data/datasets/ucf_split/train/UnevenBars/v_UnevenBars_g18_c04.avi, Requested next frame while there are no more frames left to decode."

I attached the full project code files for your convenience (it's just a toy project and I am only testing the data loading right now).

Archive.tar.gz

@NicolasHug
Member

I think the difference between the two code snippets above is the use of the "approximate" mode.

Can you try approximate mode outside of the VideoDataset logic and confirm whether you're seeing the same issue?

@BernhardGlueck
Author

Yes, that was it... thank you. I was under the impression that it should give me a performance improvement.
So that fixes the CPU decoding; the CUDA side is still an issue.

In the basic sample:

import torch.cuda
from torchcodec.decoders import VideoDecoder

print(torch.cuda.is_available())

device = torch.device('cuda:0')
decoder = VideoDecoder("/run/media/bglueck/Data/datasets/ucf_split/train/UnevenBars/v_UnevenBars_g18_c04.avi", device=str(device), seek_mode='exact')
for frame in decoder:
    print(frame.data.shape)

This works perfectly...

But in the training code:

def _create_dataloader(self, split: str) -> DataLoader:
    device = self._get_module().device

    # device = torch.device("cpu")
    datasets = create_video_datasets(self.root_path, self.fps, self.max_frames, device,
                                     self.max_items, self.dtype, self.video_extensions)

    print(str(device))
    print(str(self.dtype))

    return DataLoader(
        datasets[split],
        batch_size=self.batch_size,
        shuffle=False,
        num_workers=1,
        pin_memory=False
    )

This works now when I override the device to CPU, but if I leave it as above (it prints cuda:0 and float32), it fails as before (it now works on CPU, since I removed the approximate seek mode):
RuntimeError: CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

@scotts
Contributor

scotts commented Mar 25, 2025

Yes that was it... thank you, i was under the impression that should give me a performance improvement.

In some instances, yes. But approximate mode also relies entirely on the video's metadata for seeking. Approximate mode assumes a constant frame rate and accurate metadata. If either requirement fails, then you'll run into problems. See https://pytorch.org/torchcodec/stable/generated_examples/approximate_mode.html#which-mode-should-i-use for more.
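To illustrate the constant-frame-rate assumption: approximate mode maps a timestamp to a frame index using the average frame rate from the video's metadata. The function below is an illustrative sketch of that mapping, not torchcodec's actual implementation:

```python
def approx_index(pts_seconds: float, average_fps: float) -> int:
    # Approximate mode assumes frame i sits at time i / average_fps.
    return int(pts_seconds * average_fps)

# With a truly constant frame rate the mapping is exact:
print(approx_index(0.5, 30.0))  # 15

# With a variable frame rate, the real frame at t=0.5s may be at a
# different index, and the metadata-based guess can point past the
# last decodable frame, producing errors like
# "Requested next frame while there are no more frames left to decode."
```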

Regarding the problem with CUDA, we don't have access to the rest of your code and environment. For further help, please narrow the problem down to a chunk of code you can show us in its entirety that exhibits the behavior.
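One possible cause worth ruling out, assuming the dataset constructs CUDA decoders inside DataLoader workers: on Linux the default multiprocessing start method is fork, and a forked child inherits the parent's already-initialized CUDA context, which CUDA then rejects with exactly an "initialization error". DataLoader's multiprocessing_context parameter accepts a spawn context; the sketch below only demonstrates selecting the context, and the DataLoader usage shown in comments is illustrative:

```python
import multiprocessing as mp

# "spawn" starts a fresh interpreter instead of fork()-copying the
# parent, so each worker initializes its own CUDA context from scratch.
spawn_ctx = mp.get_context("spawn")
print(spawn_ctx.get_start_method())  # spawn

# Illustrative usage (not executed here):
#   loader = DataLoader(dataset, num_workers=1,
#                       multiprocessing_context=spawn_ctx)
```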
