Easier way to download raw data #62

Open
BennoKrojer opened this issue Oct 29, 2024 · 1 comment
@BennoKrojer

Hello,

Is there any chance the dataset could be made more easily accessible as a local download? Specifically, just the raw videos and captions, without the difficult TFRecord parsing.

Thanks,
Benno

@BennoKrojer
Author

I figured out how to export the data for now, but the captions do not match the videos and seem to be randomly assigned when I run this script:

import tensorflow_datasets as tfds
import cv2
import numpy as np
import os
import json

def decode_inst(inst):
    """Utility to decode encoded language instruction"""
    return bytes(inst[np.where(inst != 0)].tolist()).decode("utf-8")

def save_video(frames, video_path, fps=5):  # Lower FPS for slower playback
    height, width, _ = frames[0].shape
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    video_writer = cv2.VideoWriter(video_path, fourcc, fps, (width, height))

    for frame in frames:
        video_writer.write(frame)

    video_writer.release()

def process_dataset(dataset_path, output_dir, instructions_file):
    builder = tfds.builder_from_directory(dataset_path)
    episode_ds = builder.as_dataset(split='train')

    os.makedirs(output_dir, exist_ok=True)

    video_count = 0
    instructions = {}

    for episode in episode_ds:
        frames = []
        instruction = None

        for step in episode['steps'].as_numpy_iterator():
            frames.append(step['observation']['rgb'])
            if instruction is None:
                instruction = decode_inst(step['observation']['instruction'])

        video_path = os.path.join(output_dir, f'video_{video_count}.mp4')
        save_video(frames, video_path)
        instructions[video_count] = instruction
        print(f'Saved video {video_count} to {video_path}')
        video_count += 1

        # Save instructions to a JSON file
        with open(instructions_file, 'w') as f:
            json.dump(instructions, f, indent=4)
        print(f'Saved instructions to {instructions_file}')

def main():
    # Process language_table
    # process_dataset(
    #     'gs://gresearch/robotics/language_table/0.0.1/',
    #     './language_table_videos',
    #     './language_table_instructions.json'
    # )

    # Process language_table_sim
    process_dataset(
        'gs://gresearch/robotics/language_table_sim/0.0.1/',
        './language_table_sim_videos',
        './language_table_sim_instructions.json'
    )

if __name__ == '__main__':
    main()

Do you have any guess why the videos and captions do not match?
The captions are sensible (i.e., well-formed language), but they do not describe the videos they are paired with:

{
    "0": "move the blue moon in between the yellow pentagon and green cube",
    "1": "move the blue cube into the bottom of the red pentagon",
...
}
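
One possible cause worth ruling out, not confirmed anywhere in this thread: `cv2.VideoWriter.write` expects frames in BGR channel order, while the `rgb` observation is presumably stored as RGB (an assumption based on the key name). Writing RGB frames unchanged swaps red and blue on playback, so a "blue" block in the caption could appear red in the saved video. A minimal sketch of the conversion, using a hypothetical helper `rgb_to_bgr` and only NumPy:

```python
import numpy as np

def rgb_to_bgr(frame: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 frame (RGB -> BGR).

    Hypothetical helper; assumes the input really is in RGB order.
    """
    # Reversing the last axis yields a view with negative strides;
    # ascontiguousarray copies it into a plain C-ordered array.
    return np.ascontiguousarray(frame[..., ::-1])

# A 2x2 pure-red frame in RGB order.
red_rgb = np.zeros((2, 2, 3), dtype=np.uint8)
red_rgb[..., 0] = 255

# After conversion the red value sits in the last (BGR "R") channel.
red_bgr = rgb_to_bgr(red_rgb)
```

If this is the cause, calling `video_writer.write(rgb_to_bgr(frame))` inside `save_video` should make the block colors agree with the captions; `cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)` performs the same conversion.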
