Open
Description
When running the Sokoban game, if the LLM issues an invalid move (for example, a `push up` command when there is no box above the worker), the task run appears to crash and terminate.
Is it possible to handle this error, e.g., by ignoring the invalid move and asking the LLM to respond again?
Here are the last few lines of the log file from a failed run. In this case, the LLM intends to move right but outputs "push up", which seemingly causes the error.
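A minimal sketch of the requested behavior, assuming the harness could be given two hypothetical callables: `ask_llm` (sends a prompt and returns the raw reply) and `is_legal` (checks an action against the current board). These names, `parse_move`, `get_valid_move`, and the `max_retries` parameter are illustrative only and do not come from the project. The `move: [action]` format and the eight action strings are taken from the prompt in the log.

```python
import re
from typing import Callable, Optional

# The eight actions listed under "Available Actions" in the prompt.
VALID_ACTIONS = {
    "up", "down", "left", "right",
    "push up", "push down", "push left", "push right",
}


def parse_move(response: str) -> Optional[str]:
    """Extract the action from a 'move: [action]' line; None if missing or unknown."""
    match = re.search(r"^\s*move:\s*(.+?)\s*$", response, re.IGNORECASE | re.MULTILINE)
    if match is None:
        return None
    # Tolerate replies like 'move: [push up]' or 'move: "left"'.
    action = match.group(1).strip().strip('"[]').lower()
    return action if action in VALID_ACTIONS else None


def get_valid_move(
    ask_llm: Callable[[str], str],      # hypothetical: prompt -> raw LLM reply
    is_legal: Callable[[str], bool],    # hypothetical: action -> legal on current board?
    prompt: str,
    max_retries: int = 3,
) -> Optional[str]:
    """Re-query the LLM until it returns a legal action instead of crashing the run."""
    current_prompt = prompt
    for _ in range(max_retries + 1):
        action = parse_move(ask_llm(current_prompt))
        if action is not None and is_legal(action):
            return action
        # Explain the rejection and ask again with the original prompt.
        current_prompt = (
            prompt
            + "\n\nYour previous answer did not contain a legal action for the "
            "current board. Choose a different action."
        )
    return None  # caller decides: skip the turn, take a no-op, or end the episode
```

Returning `None` after the retry budget is spent keeps the decision about what to do next (skip the turn or end the episode) with the caller rather than raising an exception.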
```
## Task
Analyze the current board state (provided above), your previous action/thought AND reflection. Decide the single best action for the worker. Your goal is to push all boxes onto the designated dock locations.
**Rules:**
- You can **move** Up, Down, Left, Right onto empty floor or docks.
- You can **push** a box Up, Down, Left, Right if the space beyond it is empty.
- Avoid deadlocks (pushing boxes into corners unnecessarily).
**Instructions:**
1. Review the current board and your last action/thought.
2. Determine the next best action: `up`, `down`, `left`, `right` to **move**, OR `push up`, `push down`, `push left`, `push right` to **push** a box.
3. Briefly explain your reasoning.
Your response format MUST be:
thought: [Your reasoning about the game state and planned action. Refer to the CURRENT BOARD STATE below. Clearly state if moving the player or pushing a box. Consider potential deadlocks.]
move: [action]
Available Actions: "up", "down", "left", "right", "push up", "push down", "push left", "push right".
**Example:**
thought: Moving right to get behind the box.
move: push up
------------------------ END FINAL USER PROMPT ------------------------
Output (stderr):
UserWarning: Using SDL2 binaries from pysdl2-dll 2.32.0
X Error of failed request: BadRROutput (invalid Output parameter)
Major opcode of failed request: 139 (RANDR)
Minor opcode of failed request: 9 (RRGetOutputInfo)
Serial number of failed request: 437
Current serial number in output stream: 437
----- All game runs complete -----
Run: Game='sokoban', Harness='True' - FAILED
Summary: 0 successful runs, 1 failed runs.
Some runs failed. Check the logs in the 'logs' directory.
----- Main script finished. Check 'logs/' for details. -----
```
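The **Rules** section of the prompt can be turned into a legality check that rejects an action like `push up` with no box above before it is applied. The sketch below assumes the board is a list of strings in the standard XSB Sokoban text encoding (`#` wall, space floor, `.` dock, `$` box, `*` box on dock, `@` worker, `+` worker on dock); the game's actual internal representation may differ, and `is_action_legal` is a hypothetical name.

```python
from typing import List, Tuple

# Assumed board encoding (standard XSB text format, not necessarily the game's own).
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
FREE = {" ", "."}   # empty floor or dock
BOXES = {"$", "*"}  # box, or box already on a dock


def find_worker(board: List[str]) -> Tuple[int, int]:
    for r, row in enumerate(board):
        for c, ch in enumerate(row):
            if ch in "@+":
                return r, c
    raise ValueError("no worker on the board")


def cell(board: List[str], r: int, c: int) -> str:
    # Treat anything outside the grid as a wall.
    if 0 <= r < len(board) and 0 <= c < len(board[r]):
        return board[r][c]
    return "#"


def is_action_legal(board: List[str], action: str) -> bool:
    """Moves need a free target cell; pushes need a box with free space beyond it."""
    is_push = action.startswith("push ")
    direction = action[len("push "):] if is_push else action
    if direction not in DIRECTIONS:
        return False
    dr, dc = DIRECTIONS[direction]
    r, c = find_worker(board)
    target = cell(board, r + dr, c + dc)
    if is_push:
        beyond = cell(board, r + 2 * dr, c + 2 * dc)
        return target in BOXES and beyond in FREE
    return target in FREE
```

This follows the prompt's rules strictly: a plain `up` into a box is rejected because the prompt reserves box movement for the `push` actions.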