Video Models Lack Termination Awareness: Post-Task Completion Instability
🐛 Potential Problem Report
Video generation models often correctly complete the reasoning task (e.g., sliding puzzle, object subtraction), but then continue generating unnecessary frames after the goal state has already been reached. This leads to visual instability, the appearance of extraneous elements, and divergence from the expected final state.
This suggests that current video models lack termination awareness: they do not know when the task is finished.
🔍 Observed Behavior
Symptoms
- Model completes the required reasoning steps correctly
- 2–5 seconds of extra content are generated after the solution
- Extra elements appear that are not part of the target final state
- Final frames flicker or become visually unstable
- The model does not recognize that the goal has already been achieved
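The symptoms above could be quantified roughly as follows. This is a minimal sketch, assuming frames have already been decoded into flat lists of grayscale values in [0, 1] and that a reference image of the goal state is available. `frame_distance`, `post_completion_stats`, the `tol` threshold, and the `fps` argument are hypothetical names and parameters chosen for illustration, not part of any existing tooling.

```python
from typing import Optional, Sequence

Frame = Sequence[float]  # assumption: flattened grayscale pixels in [0, 1]


def frame_distance(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two same-sized frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def post_completion_stats(
    frames: Sequence[Frame], goal: Frame, fps: float, tol: float = 0.05
) -> Optional[dict]:
    """Locate the first frame matching the goal and measure what follows.

    Returns None if no frame comes within `tol` of the goal image.
    """
    dists = [frame_distance(f, goal) for f in frames]
    first_hit = next((i for i, d in enumerate(dists) if d <= tol), None)
    if first_hit is None:
        return None
    return {
        # index of the earliest frame that matches the goal state
        "first_goal_frame": first_hit,
        # duration of content generated after the goal was first reached
        "extra_seconds": (len(frames) - 1 - first_hit) / fps,
        # largest deviation from the goal after it was first reached
        "max_drift_after_goal": max(dists[first_hit:]),
        # whether last-frame judgment would still see the goal state
        "ends_on_goal": dists[-1] <= tol,
    }
```

A large `extra_seconds` combined with `ends_on_goal` being false would correspond to the behavior described here: the task was solved, but the clip drifted away from the solution afterwards.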
Examples
1. Sliding Puzzle
Prompt:
"Complete this sliding puzzle by moving the numbered tiles to their correct positions. Only 1 move is needed. Slide the tile(s) horizontally or vertically into the empty space. Keep the camera view fixed and maintain the grid structure unchanged."
https://github.com/user-attachments/assets/f8bc9dc6-7f0a-4b7b-a1f1-f5ccc5654c7a (Veo3; prompted via the web version)
Prompt:
"Complete this sliding puzzle in exactly 1 move. Move one tile per move horizontally or vertically into the empty space. Do not make extra moves. Keep the camera view fixed and maintain the grid structure unchanged."
https://github.com/user-attachments/assets/7fe62b0d-9b04-4b30-af5b-c8c7401840f2 (Sora2; prompted via the web version, which only supports landscape or portrait)
A very interesting case: the model explains the process like a teacher, yet still exhibits termination issues:
https://github.com/user-attachments/assets/8b19e839-6138-4d95-856a-513369277c00 (Sora2; prompted via the web version)
2. Tangram
Prompt:
"Move the blue triangle piece into the empty space to complete the square tangram. Keep all other pieces fixed."
https://github.com/user-attachments/assets/8bc783fe-10b3-4768-8697-e683fa62ac8a (Veo3; prompted via the web version)
3. Object Subtraction
Prompt:
"Remove the 2 leftmost objects. Keep all other objects in their exact positions."
https://github.com/user-attachments/assets/d13e9b47-c7e5-481e-b6de-9167e9824414 (Veo3; prompted via the web version)
https://github.com/user-attachments/assets/0ecaad40-fd0e-48d6-b3d1-c4485612920e (Sora2; prompted via the web version)
🤔 Potential Causes
- No explicit goal-state recognition mechanism
- Fixed-duration generation independent of task completion
- Missing prompt signals instructing models to stop
- Limited temporal reasoning for understanding what “completion” means
📊 Impact
If the model does not hold the final frame once the task is complete and instead keeps generating frames, it becomes much harder to determine whether it actually solved the task, and last-frame judgment becomes unreliable as an evaluation method.
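To illustrate why last-frame judgment is fragile here, the sketch below contrasts it with a judgment that accepts the goal state anywhere in the clip, provided it is held for a few consecutive frames. It assumes each frame has already been reduced to a symbolic state (for the sliding puzzle, a tuple of tile numbers with 0 as the empty cell) by some hypothetical upstream extractor. `judge_last_frame`, `judge_held_goal`, and `min_hold` are illustrative names, not an existing evaluation API.

```python
from typing import Hashable, Sequence


def judge_last_frame(states: Sequence[Hashable], goal: Hashable) -> bool:
    """Last-frame judgment: pass only if the clip ends on the goal state."""
    return bool(states) and states[-1] == goal


def judge_held_goal(
    states: Sequence[Hashable], goal: Hashable, min_hold: int = 3
) -> bool:
    """Pass if the goal is held for `min_hold` consecutive frames anywhere."""
    run = 0
    for s in states:
        run = run + 1 if s == goal else 0
        if run >= min_hold:
            return True
    return False


# Hypothetical clip: the puzzle is solved, then the model makes an extra move.
start = (1, 2, 3, 4, 5, 6, 7, 0, 8)
goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
drift = (1, 2, 3, 4, 5, 0, 7, 8, 6)  # tile 6 slid into the empty cell
clip = [start] * 2 + [goal] * 4 + [drift] * 3

print(judge_last_frame(clip, goal))  # False
print(judge_held_goal(clip, goal))   # True
```

The two judgments disagree on exactly the kind of clip reported in this issue. Which one is appropriate depends on whether the benchmark wants to credit reaching the goal or also require the model to stop there.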