Life-lessons

A dataset of first-person monologue videos, transcripts, and annotations about "life lessons" in various domains, intended for multi-modal language analysis and modeling.

Metadata

The metadata files are included in the folder metadata.

Download videos/transcripts

Run the script dl_videos.py in the folder dl_youtube (adapted from the video-processing repo):

cd dl_youtube
python dl_videos.py --video_folder PATH_TO_OUTPUT

Step 1. Body key point extraction

Run the script extract_body_kp.py in the folder processing to process multiple videos in a batch:

python extract_body_kp.py --input_dir YOUR_INPUT_PATH --output_dir YOUR_OUTPUT_PATH

To process a single video file instead, replace --input_dir with --input and --output_dir with --output.

Use the --annotate argument to generate an output video with all 33 key points annotated. For example, after you run

python extract_body_kp.py --input foo.mp4 --annotate

A video file named foo_annotated.mp4 will then be saved to the same folder, with the detected key points drawn on each frame.

The main output file is in .pkl format: a pickled object of type List[Tuple]. Each tuple has two items, (frame_index: int, poses: np.ndarray). frame_index is the index of a video frame (from 0 to the total number of frames minus 1) in which a valid human body was detected; frames with no detected body are skipped. poses is a NumPy array of shape (33, 3), where 33 is the number of key points and 3 corresponds to the x, y, and z coordinates.
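
Since the .pkl file is just a pickled list, it can be inspected with a few lines of Python. A minimal sketch, assuming the output was saved as foo_poses.pkl (the actual file name depends on how you ran the script); the landmark ordering follows MediaPipe Pose, where index 0 is the nose:

```python
import pickle

import numpy as np

# "foo_poses.pkl" is an example name; use whatever path extract_body_kp.py wrote.
with open("foo_poses.pkl", "rb") as f:
    frames = pickle.load(f)  # List[Tuple[int, np.ndarray]]

print(f"frames with a detected body: {len(frames)}")

for frame_index, poses in frames[:5]:
    assert poses.shape == (33, 3)   # 33 key points, each with x, y, z
    nose = poses[0]                 # landmark 0 is the nose in MediaPipe Pose
    print(f"frame {frame_index}: nose at ({nose[0]:.3f}, {nose[1]:.3f}, {nose[2]:.3f})")
```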

For a detailed explanation of the key point coordinate format, refer to the MediaPipe Pose API documentation.

Step 2. Tokenize hand gestures (continuous position => discrete label)

Based on the key point coordinates from Step 1, the hand gesture in each frame is tokenized into a discrete label using its spatial distribution.
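
The repository's actual tokenization code lives in the processing scripts and is not spelled out here, so the snippet below is only a hedged illustration of the general idea, not the repo's method: bin a wrist position (relative to the shoulders) into a coarse spatial grid and use the cell index as the discrete label. The landmark indices follow the MediaPipe Pose convention (11/12 for the shoulders, 16 for the right wrist); the function name and grid scheme are hypothetical.

```python
import numpy as np

# MediaPipe Pose landmark indices.
L_SHOULDER, R_SHOULDER = 11, 12
R_WRIST = 16

def tokenize_frame(poses: np.ndarray, bins: int = 3) -> int:
    """Map one frame's (33, 3) key points to a discrete gesture label (hypothetical scheme)."""
    # Express the right-wrist position relative to the shoulder midpoint,
    # normalised by shoulder width, so the token is roughly scale-invariant.
    shoulder_mid = (poses[L_SHOULDER, :2] + poses[R_SHOULDER, :2]) / 2
    shoulder_width = np.linalg.norm(poses[L_SHOULDER, :2] - poses[R_SHOULDER, :2]) + 1e-6
    rel = (poses[R_WRIST, :2] - shoulder_mid) / shoulder_width

    # Quantise the (x, y) offset into a bins x bins grid; the cell index is the token.
    cell = np.clip((rel + 1.0) / 2.0 * bins, 0, bins - 1).astype(int)
    return int(cell[1] * bins + cell[0])

# Example: turn the Step 1 output into a gesture-token sequence.
# tokens = [(idx, tokenize_frame(poses)) for idx, poses in frames]
```

In practice, clustering wrist trajectories (for example with k-means) would likely give more meaningful tokens than a fixed grid; the grid version simply makes the continuous-position-to-discrete-label mapping concrete.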

Step 3. Sample gestures (word sequence => gesture sequence)
